Welcome again to the Data Connections blog! My last Blog post explored the not-as-simple-as-it-sounds question of where data comes from. If you have not already read it, you may want to start there before reading this entry. In this entry, we’ll take the next step and start to understand how data is connected by looking at what I like to call the four-legged data stool.
Throughout our lives, there are connections. Oftentimes the connections are hard to spot if we take a narrow view. This became clear to me, not only with data, but also as I pursued my passion for running.
Like many runners, I have suffered injuries over the years. A few years ago, I had a foot injury that led to surgery. Over time, I finally got back to where I was running comfortably with little or no pain in my foot. However, I then started to feel new pain in my back and neck. “How can this be?” I thought. “My foot is fine, and I just don’t get why I now have this new, unrelated pain!” So, I spoke to my running mentor and good friend Coach Mike to get some advice. He told me he noticed that I changed my stride after the injury and that the new motion was likely the cause of the back and neck pain. “It’s all connected,” he said. “Our bodies are an intricate machine and if you change any part of how you are moving, you are very likely to see a reaction in another (seemingly unrelated) part of that machine.” This is a great insight and I share it with you because it is also applicable to data.
The connections between data elements are significant but, as with my recognition of the source of my neck and back pain, the connections between these elements can be hard to spot. This is especially true in the case of data since its elements have often been organizationally siloed. Siloing can obscure the connection points.
In this Blog post, we will begin with a basic overview of the four legs of the data stool (privacy, cybersecurity, artificial intelligence, and data governance) and look at how each of these legs is connected to the other legs. This will set us up for a further exploration of each of these elements and the connections between them in upcoming Blog posts.
Privacy: From a public awareness and corporate compliance standpoint, many of us first started to think about data privacy when the European Union (EU) passed the Data Protection Directive in 1995. This was replaced by the much-publicized General Data Protection Regulation (GDPR), which took effect in 2018. In both cases, multinational organizations doing any business in the EU were forced to think about compliance with privacy regulations related to personal data in a serious way, often for the first time.
By contrast, the US has no comprehensive federal privacy law and instead has an often-confusing patchwork of over 50 state and local privacy laws. To ensure compliance with these many laws, organizations have created a Privacy Office (potential data silo #1) and Chief Privacy Officer (CPO) to manage their privacy policy and programs.
Cybersecurity: Another key touchpoint in data awareness, cybersecurity, resulted from the rise of cybercrimes and theft of data. This theft is a big deal. Cybersecurity Ventures estimates cybercrimes will cost the world economy $10.5 trillion annually by 2025.[ii] The concern over cybercrime is heightened by the privacy laws we just discussed and their requirement for protection of personal data.
The impact of cybercrimes goes far beyond personal data, though, as we have seen cyber criminals hold all the data of an entity hostage until a ransom is paid. Fear of the loss of productivity and the reputational damage caused by being the victim of a cybercrime is significant. Who wants to entrust their business or personal data to a company with weak (actual or perceived) data protection? This has led to state and federal laws requiring organizations to create an information security program to protect data. In a high-risk industry such as financial services, the Gramm-Leach-Bliley Act requires companies that offer consumers financial products or services to explain their information-sharing practices to their customers and to take steps to safeguard sensitive data.[iii]
These cybersecurity laws typically also require that some form of timely notice be given to individuals whose data has been compromised in a cyberattack so they can attempt to respond to the loss.[iv] To address protection of data and compliance with these regulations, organizations have created a Cybersecurity team (potential data silo #2), typically reporting to the Chief Information Officer (CIO).
Artificial Intelligence (AI): The next data awareness moment for most organizations has come from the constant drumbeat about the use of AI. AI is very much a data issue, as:
AI = an algorithm[v] + (lots of) data.
Who hasn’t heard about how ChatGPT or DALL-E is going to take our jobs… or potentially take over our world in the distant or not-so-distant future? As with privacy and cybersecurity, the concerns over usage of AI are now driving discussion about regulating AI. The recently enacted EU AI Act,[vi] which takes a risk-based approach to regulating the creation of AI, is the most prominent example. Under this approach, the highest risk uses of AI (e.g., assessing the risk of an individual committing criminal offenses) are either prohibited completely or get the highest regulatory coverage. Lower risk uses, as self-identified by the creator of the AI, like providing telephone response services for customer inquiries to retailers, get little or no regulatory coverage… for now.
We are still in the early days of AI governance. That said, most companies are not creating AI but rather figuring out how to acquire it and use it productively, which also carries some compliance obligations. So we add another group to our organizational data analysis: the team looking at an organization’s AI usage is typically the Procurement team (potential data silo #3), led by the Chief Procurement Officer (CPrO).
Data Governance: Lastly in focus and priority for many organizations is data governance. This leg of the data stool is possibly the least exciting for most people, but it is arguably the most important. Data governance makes all data usage come together better, but by itself it is not a headline grabber.
Data governance can be many things, but I will focus on using data governance to set standards for how data is created and brought into an organization. Data governance typically includes rules, procedures, and processes for data within an organization to ensure that: 1) it is acquired from reputable, quality sources; 2) it is tagged with metadata[vii] so it can be found for compliance and re-use[viii] and can be subjected to lifecycle management[ix]; 3) it can be segmented and protected and, in the case of a cyberattack, you can actually quantify what has been lost; 4) it is related to your business needs and not embarrassing to you (e.g., it does not contain pornography, hate speech, or advocacy of violence); and 5) it is treated like any other valuable asset of your organization and not just left to chance. Speaking of connections, we discussed this last item (#5) in Blog Post #3.
Data governance is typically a data process managed by the Data office within an organization (potential data silo #4) under the Chief Data Officer (CDO).
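For readers who like to see ideas in code, here is a minimal sketch of what central tracking of data could look like: one catalog entry per dataset, a check that catches a repeat purchase of data the organization already holds, and simple lifecycle queries for renewals and deletions. Everything here (the class names, the field names, the 30-day renewal window) is a hypothetical assumption for illustration, not a description of any particular governance product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatasetRecord:
    """One catalog entry. All field names are illustrative assumptions."""
    name: str
    source: str
    contains_personal_data: bool
    contract_renewal: date  # date the data contract comes up for renewal
    delete_after: date      # date by which the data must be deleted
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """A hypothetical central register of the datasets an organization holds."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}

    def register(self, record: DatasetRecord) -> bool:
        """Add a record. Return False if the same dataset from the same
        source is already held (for example, a duplicate purchase)."""
        key = f"{record.source.lower()}::{record.name.lower()}"
        if key in self._records:
            return False
        self._records[key] = record
        return True

    def find_by_tag(self, tag: str) -> list[DatasetRecord]:
        """Find datasets by metadata tag, for compliance checks or re-use."""
        return [r for r in self._records.values() if tag in r.tags]

    def personal_data(self) -> list[DatasetRecord]:
        """List datasets holding personal data, e.g. to scope a breach."""
        return [r for r in self._records.values() if r.contains_personal_data]

    def due_for_renewal(self, today: date, within_days: int = 30) -> list[DatasetRecord]:
        """Datasets whose contract renews within the next `within_days` days."""
        return [
            r for r in self._records.values()
            if 0 <= (r.contract_renewal - today).days <= within_days
        ]

    def due_for_deletion(self, today: date) -> list[DatasetRecord]:
        """Datasets whose deletion date has arrived or passed."""
        return [r for r in self._records.values() if r.delete_after <= today]
```

In practice, an intake process would call `register` before any new data is bought or brought in, and a scheduled job would run the two lifecycle queries to send reminders. A real catalog would also need access controls and an audit trail, which this sketch leaves out.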
It is not necessarily a bad thing to have separate groups responsible for the different elements of data; however, they must recognize the connections between data and closely coordinate. Where organizations are large and different organizational silos have different measurements, priorities, and leaders, this coordination can be difficult. The CPO, CIO, CDO and CPrO may not all come together in an organization chart under a single leader below the Chief Executive Officer. This separation can make it difficult to see and nimbly deal with risks associated with the constantly changing data environment as well as to properly manage data as an organizational asset.
Connections: When an organization coordinates all its data processes, it is at its strongest. The data governance process can be the gatekeeper for the organization’s data compliance requirements. Data governance keeps the bad data out and the crown jewel data in (like personal information, customer lists, trade secrets, etc.) by reviewing inbound and outbound data so data flow is intentional. Data management, under the governance umbrella, also enables tagging data with metadata and indexing data so all access is controlled and so it can be tracked and found. This helps to simplify cybersecurity, which relies on understanding and protecting the organization’s important data and limiting/controlling data access. It is not much of a leap to see how both cybersecurity and data governance promote, connect to, and go hand-in-hand with privacy by enabling protection of the personally identifiable data that is entrusted to an organization by its employees, its suppliers, and its customers.
For AI, the connections with other data elements are also critical. If you are developing AI, you use data governance to review inbound data to ensure you are getting good data[x] from trusted sources to feed the AI product you are creating. Cybersecurity helps you ensure your work is not shut down by hackers seeking to contaminate your data (more to come on this in a future Blog post) or hold hostage the data critical to your product development.
Even if you are only using AI as a customer, your data governance process should be reviewing the AI providers’ data sources to ensure they are consistent with your own data policies before AI is implemented. Your cybersecurity team should likewise be reviewing the AI providers’ security procedures to ensure you are not creating holes in your security when using that third party AI as a tool.
As Coach Mike told me, “it’s all connected.” The next time you feel a pain in your data process, make sure organizational silos are not keeping you from seeing the data in your organization in a holistic way. The problems you experience in your AI usage, for example, might be tied to issues with your data intake process. Taking a broader view will enable your organization to maximize the value of data and help minimize risk.
Now that we have an idea of why data is important, how it can be used for good (or not), where data comes from, and how data is connected, we are ready to dive deeper into the four legs of the data stool. But first, in my next Blog entry, we’ll divert briefly to talk about some interesting insights I gained at a recent symposium on AI regulation.
I hope you’ll be back as we continue to explore data connections!
Here’s that data stool one more time with all the “legs” filled in. [i]
_________________________________________________________________
[i] If you have already read my Blog Post #2, you’ll understand the challenges of creating the perfect AI image. So, in this post I have 2 images of the data stool, since I could not get Microsoft Image Creator to come up with the single image I really wanted!
[iii] https://www.ftc.gov/business-guidance/privacy-security/gramm-leach-bliley-act
[iv] See, for example, the NY State SHIELD Act: https://ag.ny.gov/resources/organizations/data-breach-reporting/shield-act
[v] An algorithm, per Wikipedia, is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. https://en.wikipedia.org/wiki/Algorithm
[vi] Here is a helpful summary of the EU AI Act: https://artificialintelligenceact.eu/high-level-summary/
[vii] Metadata is data about data. Metadata can be included with a dataset to index the dataset and describe the data contained in that dataset. It can make use of the data easier. https://en.wikipedia.org/wiki/Metadata
[viii] I have encountered instances of organizations purchasing the same data multiple times because there is no central tracking of data and data purchases in the organization -- so no one was aware that they already had the data.
[ix] Once you understand and track the data you bring into your organization, you can implement automated reminders to renew applicable contracts for the data, to delete the data when it is no longer needed (or when it is contractually mandated), and even to check on appropriate usage of the data. Automated lifecycle management is a major benefit for data compliance teams.
[x] “Good data” includes data that has been reviewed before intake to ensure it meets your organization’s data policies and procedures and that has been screened to ensure it will not embarrass you.
_______________________________________________________________________________
If you have questions about your data and your legal compliance programs for data, Mortinger & Mortinger LLC can help! Contact me directly at: steve@mortingerlaw.com
Mortinger & Mortinger LLC: When Experience is Important and Cost Matters
Data and Intellectual Property Law Services