Friday, February 2, 2018

Data Is the New Oil, Maybe

From Digitopoly, January 17:

Is your data really oil?
[with Ajay Agrawal and Avi Goldfarb, originally published in HBR Online under the title “Is your company’s data actually valuable in the AI era?”, 17 Jan 2018. Their book, Prediction Machines, is coming out in April 2018].

AI is coming. That is what we heard throughout 2017 and will likely continue to hear throughout this year. For established businesses that are not Google or Facebook, a natural question to ask is: What have we got that is going to allow us to survive this transition?

In our experience, when business leaders ask this with respect to AI, the answer they are given is “data.” This view is confirmed by the business press. There are hundreds of articles claiming that “data is the new oil” — by which they mean it is a fuel that will drive the AI economy.

If that is the case, then your company can consider itself lucky. You collected all this data, and then it turned out you were sitting on an oil reserve when AI happened to show up. But when you have that sort of luck, it is probably a good idea to ask “Are we really that lucky?”

The “data is oil” analogy does have some truth to it. Like internal combustion engines with oil, AI needs data to run. AI takes in raw data and converts it into something useful for decision making. Want to know the weather tomorrow? Let’s use data on past weather. Want to know yogurt sales next week? Let’s use data on past yogurt sales. AIs are prediction machines driven by data.

But does AI need your data? There is a tendency these days to see all data as potentially valuable for AI, but that isn’t really the case. Yes, data, like oil, is used day-to-day to operate your prediction machine. But the data you are sitting on now is likely not that data. Instead, the data you have now, which your company accumulated over time, is the type of data used to build the prediction machine — not operate it.

The data you have now is training data. You use that data as input to train an algorithm. And you use that algorithm to generate predictions to inform actions.
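To make the build-versus-operate distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sales figures are invented, the names `training_data`, `train`, and `predict` are hypothetical, and a one-variable least-squares line stands in for whatever algorithm a real prediction machine would use.

```python
# Hypothetical historical records: (daily high temperature in °C, yogurt units sold).
# These numbers are invented for illustration; they are not from the article.
training_data = [(10, 120), (15, 145), (20, 170), (25, 195), (30, 220)]


def train(records):
    """Build step: fit sales = slope * temperature + intercept by least squares."""
    n = len(records)
    mean_x = sum(x for x, _ in records) / n
    mean_y = sum(y for _, y in records) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in records)
    variance = sum((x - mean_x) ** 2 for x, _ in records)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, temperature):
    """Operate step: turn one fresh input into a sales prediction."""
    slope, intercept = model
    return slope * temperature + intercept


# The accumulated history is used once, to build the model.
model = train(training_data)

# Day-to-day operation needs new input data (tomorrow's forecast temperature),
# not the historical records.
forecast = predict(model, 22)
print(forecast)
```

Note that once `model` exists, `predict` never touches `training_data` again. In this toy setup, the history has done its job after the call to `train`.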

So, yes, that does mean your data is valuable. But it does not mean your business can survive the storm. Once your data is used to train a prediction machine, it is devalued. It is not useful anymore for that sort of prediction. And there are only so many predictions your data will be useful for. To continue the oil analogy, data can be burned. It is somewhat lost after use. Scientists know this. They spend years collecting data, but once it has produced research findings, it sits unused in a file drawer or on a back-up disk. Your business may be sitting on an oil well, but it’s finite. It doesn’t guarantee you more in the AI economy than perhaps a more favorable liquidation value.

Even to the extent that your data could be valuable, your ability to capture that value may be limited. How many other sources of comparable data exist? If you are one of many yogurt vendors, then your database containing the past 10 years of yogurt sales and related data (price, temperature, sales of related products like ice cream) will have less market value than if you are the only owner of that type of data. In other words, just as with oil, the greater the number of other suppliers of your type of data, the less value you can capture from your training data. The value of your training data is further influenced by the value generated through enhanced prediction accuracy. Your training data is more valuable if enhanced prediction accuracy can increase yogurt sales by $100 million rather than only $10 million....MORE