Happy New Year to you all… and yes we are back again in 2025 with another edition of the Data Connections Blog!
If you have ever tried to visit a zoo with 10 unruly children, you probably have some sense of what chaos looks like. “I’m hungry.” “I need to go to the bathroom.” “Timmy is hitting me.” “I wanna see the penguins.” You know the drill. It is hard, if not impossible, to get everyone going in the same direction and to keep from losing anyone, all while dealing with endless side trips to the bathroom, skinned knees, and fatigue-induced crankiness (not to mention the children). So, imagine trying to wrangle the world’s estimated 200,000,000,000,000,000,000,000 unruly bytes (200 ZB) of data, each with its own personality[1].
Of course, like children, data is not a monolith. There is well-behaved data (structured) and there is unruly data (unstructured). Structured data comes in a standardized format that makes it easy to store, manage and use (e.g., contact lists, CRM systems and data in other spreadsheet-style formats). By contrast, unstructured data is information in many different forms that doesn’t follow conventional data rules or formats in how it is stored (e.g., Microsoft Word documents, social media sites). It is hard to find and hard to use. Unfortunately, there is far more unstructured data in the world than there is structured data… and unstructured data is difficult to pin down.
I once had a C-suite executive insist that “we know where ALL our data is” (full stop). That is a nice thought and may be true at some highly abstracted level… but no one who works with data would realistically believe that you can know exactly where each byte of data is in an organization of any size.
The prevalence of unstructured data, coupled with the fact that many data users simply have not realized the critical importance of controlling data, is why data governance is the fourth leg of the data stool I talked about way back in Blog Entry #6. While I characterize data governance as a “leg” of the data stool, it is really the glue that holds the rest of the legs together as we’ll discuss.
Most organizations use only about 12% of the data that they have and would like to use more… but simply don’t know how[2]. In this, and upcoming blog entries, I will address that. We’ll talk about the importance of data governance, what it covers and what implementing a data governance plan can achieve for your organization.
If you’re in the “I don’t even know where to start” phase, that’s fine. The first step is to assign someone(s) to be responsible for governing the data.
In a larger organization, that would look something like this:

A data focal point for the organization is critical. In the graphic above, and in most larger organizations, that role is filled by the Chief Data Officer (CDO) and their staff. Some sort of Governing Body, typically composed of the CDO and other senior executives, would then create the data rules and data definitions for the organization (more on this below). The Steering Committee is a working group of data users that has the task of guiding implementation of the data rules. The real “feet on the ground” are the Data Stewards, individuals working within the different parts of the organization to educate people on the data rules and to help implement them in day-to-day scenarios.
Does this sound like a lot of structure? Is it all necessary? Yes, and maybe. Let me try to explain why a centralized data team is so important for an organization. To start, if data structures are disaggregated, silos start to grow following the separate structures within an organization.
Data silos in different parts of an organization can cause at least 3 major issues: 1) lack of communication about data; 2) inefficient use of data; and 3) lack of control over data. Lack of communication is the foundational issue: it contributes to the other two and can have a negative, cascading impact across the organization.
We’ll start with the communication issues caused by data silos. In one of my earliest discussions with corporate data clients (from across many divisions of a company), I learned how data communication can go astray. We had a lengthy, and seemingly productive, discussion and came to agreement on many of our data goals. When I later saw a summary of the meeting, it didn’t fit my recollection at all. I soon realized that throughout the meeting we had used the same words but with different meanings to the participants from different groups. We had to go back and reopen the discussion, and our productivity diminished significantly.
At the most basic level, you need organization-wide data definitions. A very basic example: when you refer to “temperature” in data fields, you need to know whether that means Celsius or Fahrenheit across the organization. Similarly, when you refer to time, you need to know whether that means military time (one 24-hour cycle per day) or regular time (two 12-hour cycles per day). While these may seem trivial, imagine the impact on a medical organization that builds its processes and patient care around values like patient temperatures.
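For the technically inclined, here is a minimal Python sketch of what a shared data definition might look like in practice. Everything in it is an assumption made for illustration: the `FieldDefinition` class, the `patient_temperature` field name, and the choice of Celsius as the organization-wide unit are hypothetical, not a prescribed standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    """One entry in a hypothetical organization-wide data dictionary."""
    name: str
    unit: str
    description: str


# Assumed convention for this sketch: temperatures are stored in Celsius.
PATIENT_TEMPERATURE = FieldDefinition(
    name="patient_temperature",
    unit="celsius",
    description="Body temperature recorded at intake",
)


def normalize_temperature(value: float, unit: str) -> float:
    """Convert a reading to the unit named in PATIENT_TEMPERATURE."""
    unit = unit.strip().lower()
    if unit == PATIENT_TEMPERATURE.unit:
        celsius = value
    elif unit == "fahrenheit":
        celsius = (value - 32.0) * 5.0 / 9.0
    else:
        # Refuse to guess: an unlabeled number is exactly the problem.
        raise ValueError(f"Unknown temperature unit: {unit!r}")
    return round(celsius, 1)


print(normalize_temperature(98.6, "Fahrenheit"))
print(normalize_temperature(37.0, "celsius"))
```

The design choice worth noticing is that the function raises an error rather than guessing when the unit is missing or unfamiliar. A number without an agreed definition is precisely what causes the miscommunication described above.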
Next, silos can also cause inefficiency. In my own household, I have ventured out to buy lightbulbs only to learn that another member of my family had already purchased said lightbulbs but had not mentioned it… then put them somewhere I didn’t know to look (Dang!). This may, as I am told by my wonderful wife and law partner, result at least in part from my lack of patience for searching all over the house for them![3]
Likewise, in my clients’ organizations, lack of communication and coordination among groups (i.e., siloing) has led to the same data set being acquired multiple times by different parts of the organization. They had no way to know that the data they wanted was already in their organization, because there was no cross-organization database or other common means of communication. That is a much more costly (and eminently fixable) problem!
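To make the idea of a cross-organization database concrete, here is a small Python sketch of a shared dataset catalog that a group could check before acquiring anything new. The `DatasetCatalog` class, its method names, and the sample dataset and group names are all hypothetical; a real catalog would live in a shared system rather than in memory.

```python
from typing import Dict, Optional


class DatasetCatalog:
    """Hypothetical in-memory record of which group already holds which dataset."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Ignore case and extra whitespace so near-identical names match.
        return " ".join(name.lower().split())

    def find_owner(self, name: str) -> Optional[str]:
        """Return the group that holds the dataset, or None if nobody does."""
        return self._owners.get(self._key(name))

    def register(self, name: str, group: str) -> bool:
        """Record an acquisition; return False if the dataset is already held."""
        key = self._key(name)
        if key in self._owners:
            return False
        self._owners[key] = group
        return True


catalog = DatasetCatalog()
catalog.register("Regional Sales 2024", "Marketing")
print(catalog.find_owner("regional  sales 2024"))
```

With something like this in place, a second group looking for the same data gets pointed to the group that already holds it instead of buying it again.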
Finally, lack of central coordination causes lack of control. In many organizations, anyone with a corporate credit card can go online (to GitHub or otherwise) and buy a dataset or, more frequently, download one for free. If that is the case, how do you know whether that dataset meets your organization’s requirements for format, quality, content, etc.? More importantly, how can you understand any organizational obligations that come along with possession or use of the data?
A classic case is downloading a dataset with terms that prohibit commercial use. Then, when the data is fully embedded in some about-to-be-launched product, the unfortunate restriction must be negotiated… with the dataset owner then having ALL the leverage.
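One way to catch that kind of restriction early is to record a dataset’s terms at intake and check them against the intended use. The Python sketch below is only an illustration under stated assumptions: the `DatasetTerms` record, the `intake_issues` function, and the sample dataset and license names are hypothetical, and a real review would involve reading the actual license text (ideally with your lawyer).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetTerms:
    """Hypothetical summary of the terms attached to a downloaded dataset."""
    dataset: str
    license_name: str
    commercial_use_allowed: bool


def intake_issues(terms: DatasetTerms, for_commercial_product: bool) -> List[str]:
    """Return a list of problems to resolve before the dataset is used."""
    issues: List[str] = []
    if for_commercial_product and not terms.commercial_use_allowed:
        issues.append(
            f"{terms.dataset}: '{terms.license_name}' prohibits commercial use"
        )
    return issues


# Hypothetical example: a free download with research-only terms.
street_photos = DatasetTerms(
    dataset="street-photos",
    license_name="research-only license",
    commercial_use_allowed=False,
)

for issue in intake_issues(street_photos, for_commercial_product=True):
    print(issue)
```

The point is timing: the same question costs far less to answer on the day of download than on the eve of a product launch.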
Possibly worse is when users download unruly datasets that are not consistent with your organization’s operational purpose and that could come back to embarrass you. Pornography, profanity, advocacy of violence, and hate speech are just a few examples. While these may have free speech protections outside the corporate firewall, you may/should restrict them in the workplace for legitimate reasons – if only you had some centralized way of doing that.
This seems like a particularly good time for a cliffhanger! So, check back soon and we will dive into how to wrangle all that unruly data. Like a parenting class, a solid data governance approach is not magic or instantaneous, but it will give you the skills to better manage unruly data and to get the maximum value from that data in return.
I hope you’ll be back next time as we continue to explore these issues and learn more about data connections!
If you have questions about your data and your legal compliance programs for data, Mortinger & Mortinger LLC can help! Contact me directly at: steve@mortingerlaw.com

Footnotes:
[1] Cybercrime Magazine online, February 2024 (last accessed 1-21-25): https://cybersecurityventures.com/the-world-will-store-200-zettabytes-of-data-by-2025/
[2] Forrester Research Study, 2012.
[3] Editor’s note: It may also result, at least in part, from said editor’s lack of consistency regarding light bulb storage.
Unruly children may be easier. Having a chief data officer seems like a good starting point, even in our home.
I couldn’t agree more!