In my first blog entry, “How I Learned to Stop Worrying and Love Data - And How You Can Too,” I talked about who I am and why I think data is important. In this entry, we will explore what data is, why we need to take data seriously, and how we can start to sort through some of the hype around data. Reading a blog, no matter how good it is, will not by itself qualify you as a data expert. My goal is to provide a solid foundation to help you better understand what the experts tell you and to push back when people make bold but not quite right assertions about data.
Let’s start with some data basics first. What is data? It might be easier to ask: What isn’t data? Data is all text, numbers, music, photos, video, etc. It is all the content and information that is created and stored around us. As I am writing this blog entry, I am creating more data. Data can be categorized as structured or unstructured, depending largely on whether it is in a standardized format (e.g., columns and rows organized by common attributes, as in a spreadsheet) that makes it more readily accessible. Structured data is more usable and thus more valuable, though there is far less of it than unstructured data, because structured data has to be curated by someone and that takes a lot of work.
You might be thinking… there must be a lot of data out there. You are right! Current estimates put the total at about 175 zettabytes of data, an increase from 1 zettabyte in 2010. Yes, I know zettabytes looks like a word I made up, but I assure you it is a real thing.
Not to overwhelm you, but in numbers, 175 zettabytes looks like this:
175,000,000,000,000,000,000,000 (that is 21 zeros!)
(Chart Source: Statista, 2023 from a 2018 IDC Study)
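If you would like to check the zero count above for yourself, here is a minimal Python sketch. It assumes the standard SI definition of a zettabyte (10 to the 21st power bytes); the variable names are my own and purely illustrative.

```python
# One zettabyte is 10**21 bytes (SI prefix "zetta").
ZETTABYTE = 10**21

# 175 zettabytes expressed in bytes.
total_bytes = 175 * ZETTABYTE

# Format with thousands separators so it is readable.
formatted = f"{total_bytes:,}"
print(formatted)

# Count how many zeros appear in the written-out number.
zero_count = formatted.count("0")
print(zero_count)
```

Python handles integers of any size, so there is no risk of the number overflowing.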
So, we have an idea about what data is and we know that there is a lot of it. What is next?
If you have attended any conferences on data recently, you have likely heard excited declarations that “Data is the new gold!” or “Data is the new oil!” Yes, it is true that there is great value in what we can do with data. But while this analogy to certain natural resources is correct in some ways, it is also not quite right. Let’s consider why.
You can use data about past activities to predict future events. For example, data on weather in Ohio, historical crop yields by date, historical commodity pricing, etc. can be used (ingested) by an Artificial Intelligence (AI) system to yield a prediction that will help farmers in Ohio decide what crops to plant and on what dates. This is an exercise in transforming historical data to predict future results. With advanced computing and great quantities of data, these predictions can be made at a scale and speed, and with a degree of accuracy, that were not possible in the past. So, data is clearly valuable as an element of production, like a natural resource.
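For readers who like to see the idea in miniature, here is a rough Python sketch of turning historical data into a prediction. It is not how a real AI system works at scale; it simply fits a straight line to a handful of numbers. The rainfall and yield figures are made up for illustration (they are not real Ohio data), and the function names are hypothetical.

```python
# Hypothetical, made-up numbers for illustration only (NOT real Ohio data):
# growing-season rainfall in inches and crop yield in bushels per acre.
rainfall = [18.0, 20.0, 22.0, 24.0, 26.0]
crop_yield = [150.0, 158.0, 166.0, 174.0, 182.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Learn" the relationship from the historical data.
slope, intercept = fit_line(rainfall, crop_yield)


def predict_yield(rain_inches):
    """Predict yield for a rainfall value using the fitted line."""
    return slope * rain_inches + intercept


# Predict the yield for a season with 23 inches of rain.
print(round(predict_yield(23.0), 1))
```

A real system would draw on far more data (weather, pricing, planting dates) and far more sophisticated models, but the basic move is the same: learn a pattern from the past and apply it to a future case.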
It is also true that exclusive access to certain data, like exclusive access to some natural resources, can be much more profitable than if the resource were shared. Say the manufacturer of an electronic printer gets exclusive feedback/data from that product, by way of a “phone home” feature, telling the manufacturer when the printer needs or is about to need ink. The manufacturer can and does (constantly!) use this data to offer ink refills or replacements to the users of the printer before any competitors have a chance to offer alternatives. This is not a matter of limiting access to avoid depleting the data, but it does show that exclusive access to a virtual resource like data can lead to market advantage. That can be as beneficial as having exclusive access to more tangible resources (like oil or gold).
However, the natural resource analogy does not fully work with data. Resources like gold and oil are available in finite quantities and are depleted by usage. You and I cannot have the same gallon of gasoline in the tanks of our separate vehicles, nor can we simultaneously wear my gold wedding band. By contrast, data is a non-rivalrous resource.* You and I can both use the same data set sequentially or even simultaneously and derive great value from it without using it up or degrading the value of that data set for another user. To put it more directly, with data you can have your cake and eat it too! In that way, data is unlike, and arguably far better than, most traditional natural resources.
People who have data but who are not in a data business may be resistant to and even fearful about selling or sharing their data. With all the confusing hype in the industry, these data holders worry that they will deplete their dataset or degrade its value if they allow access. As a result, they hold onto their data, waiting for it to appreciate in value or at least until they can get more clarity as to its value. That is like putting your money under a mattress. A dataset is generally most valuable when it is current (e.g., information on COVID-19 outbreaks in January of 2020 becomes less valuable as each month or year passes).
Access to and use of data, if done properly (more to come on this), is one of our greatest resources for solving the world’s problems. As a start, we need better data education to help unlock the value of data through sharing and utilization and to cut through the hype. Lack of data education and awareness may actually end up having the greatest negative impact on data value for both the data holder and society.
Next time you hear “data is the new gold” or “data is the new oil”, hopefully you’ll be better positioned to consider how true that really is. Hope you all have a great International Love Your Data Week!
* Your word of the day! Non-rivalrous means that when one person uses a resource, it does not prevent others from also using it.
If you have questions about your data and your legal compliance programs for data, Mortinger & Mortinger LLC can help! Contact me directly at: steve@mortingerlaw.com