Sunday, August 4, 2013

"Big Data Takes Center Stage"

I was first exposed to the idea of Big Data in late 1999 by some guys who figured they could make a business out of aggregating personal consumption information and selling the results to marketers. This was not a new idea, but their added wrinkle was paying the consumer for the info.

They needed to raise some money to market to the marketers and buy some hardware. Then two things happened. First, I realized the company would be creating the science on the fly, and because that is inefficient, the computers it would probably need would be right at home at Lawrence Livermore or Sandia (a quick look at the Top500 list from those days shows their machines at #1 and #2 in the world; in the June 2013 list the Chinese have the top spot).

Second, the NASDAQ bubble burst in March 2000 and folks pulled in their horns (and credit lines) so fast that seasoned, VC-backed IPOs fell from 537 in 1999 to 91 in 2001. If relatively mature companies couldn't exit, there was no chance for an A or B round.

Anyhoo, enough history. Here's a guy who more than likely sat in meetings with the crew that built LLNL's 1999-vintage supercomputer, the ASCI Blue-Pacific SST, an IBM SP 604e.

I may owe FT Alphaville's Izabella Kaminska a hat tip on this, but I can't for the life of me remember the link.

From Irving Wladawsky-Berger:
A few recent articles have expressed concerns that big data may be at the peak of inflated expectations in the so-called hype cycle for emerging technologies, and will soon start falling into the trough of disillusionment.  This is not uncommon in the early stages of a disruptive technology.  The key question is whether the technology will keep falling through the trough and soon be forgotten, or whether it will eventually move on toward the slope of enlightenment on its way to a long life in the plateau of productivity.  How can you tell which it is going to be?

In my experience, a disruptive technology will succeed if it can keep attracting serious researchers and analysts, who will, over time, cut through the hype and bring discipline to its development and marketing, coming up with solutions to the many technical obstacles any new innovation encounters, sorting through its unrealistic promises and reframing the scope and timelines of its objectives.  The Internet recovered from the hype that led to the dot-com bubble and has gone on to a highly successful future.  Cloud computing is now going through a similar period of serious evaluation and development.  So is big data.

In The Rise of Big Data: How It's Changing the Way We Think About the World, an article just published in Foreign Affairs, Economist editor Kenneth Cukier and Oxford professor Viktor Mayer-Schönberger do a very nice job of articulating why “big data marks the moment when the information society finally fulfills the promise implied by its name.”  The article is adapted from their book Big Data: A Revolution That Will Transform How We Live, Work, and Think, published in March 2013.

Cukier and Mayer-Schönberger explain that big data has risen rapidly to the center-stage position it now occupies for the simple reason that there is so much more digital information floating around than ever before.  In 2000, only one-quarter of the world’s stored information was digital and therefore subject to search and analysis.  Since then, the amount of digital data has been doubling roughly every three years, so by now only two percent of all stored information is not digital.
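As a back-of-the-envelope check on that arithmetic, here is a minimal sketch in Python; the starting share, the three-year doubling and the flat analog stock are simplifying assumptions of mine, not figures from the authors:

    def digital_share(years, start_share=0.25, doubling_years=3.0):
        """Fraction of stored information that is digital after `years`."""
        # Assumes the digital stock doubles every `doubling_years` while the
        # analog stock stays flat (my simplification, not the authors').
        digital = start_share * 2 ** (years / doubling_years)
        analog = 1.0 - start_share  # held constant in this sketch
        return digital / (digital + analog)

    for y in (0, 3, 6, 9, 13):
        print(f"after {y:2d} years: {digital_share(y):.0%} digital")

Under that deliberately crude flat-analog assumption the digital share reaches roughly 87 percent after 13 years; getting to the authors' 98 percent figure additionally requires the analog stock to shrink, or a somewhat faster doubling than the rough three-year figure.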

Big data could not possibly have come into being without the digital revolution, which, thanks to Moore’s Law, has made it possible to drastically lower the costs of storing and analyzing those oceans of information.  The Web has also made it much easier to collect data, as has the explosive growth of mobile devices and smart sensors.  “But, at its heart,” the authors write, “big data is only the latest step in humanity’s quest to understand and quantify the world.”  Datafication is the term they use to describe the ability to capture as data many aspects of the world that have never been quantified before.

I totally agree with their view that big data should be framed not only as part of the digital and Internet revolution of the past few decades, but also as part of the scientific revolution of the past few centuries.  At the 2013 MIT Sloan CIO Symposium this past May, MIT professor Erik Brynjolfsson made a similar point at the panel he moderated on The Reality of Big Data, observing that throughout history, new tools beget revolutions.

Scientific revolutions are launched when new tools make possible all kinds of new measurements and observations.  Early in the 17th century, Galileo made major improvements to the recently invented telescope, which enabled him to make discoveries that radically changed our whole view of the universe.  Over the centuries we’ve seen that new tools, measurements and discoveries precede major scientific breakthroughs in physics, chemistry, biology and other disciplines.

Our new big data tools have the potential to usher in an information-based scientific revolution.  And just as the telescope, the microscope, spectrometers and DNA sequencers led to the creation of new scientific disciplines, data science is now rapidly emerging as the academic companion to big data.  One of the most exciting parts of data science is that it can be applied to just about any domain of knowledge, given our newfound ability to gather valuable data on almost any topic, including healthcare, finance, management and the social sciences.  But, like all scientific revolutions, this will take time.

According to Cukier and Mayer-Schönberger, datafication requires three profound changes in how we deal with data.  The first is what they call n=all, that is, collecting and using lots of data rather than settling for small samples, as statisticians have done until now.  “The way people handled the problem of capturing information in the past was through sampling.  When collecting data was costly and processing it was difficult and time consuming, the sample was a savior.  Modern sampling is based on the idea that, within a certain margin of error, one can infer something about the total population from a small subset, as long as the sample is chosen at random.”...MUCH MORE
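To make that sampling logic concrete, here is a minimal sketch in Python (the synthetic population, its parameters and the sample sizes are arbitrary choices of mine, purely illustrative), contrasting random-sample estimates and their margins of error with the exact n=all computation:

    import random
    import statistics

    random.seed(42)

    # A synthetic "population" of one million values (illustrative only).
    population = [random.gauss(100, 15) for _ in range(1_000_000)]

    true_mean = statistics.fmean(population)  # the n=all answer

    for n in (100, 1_000, 10_000):
        sample = random.sample(population, n)
        estimate = statistics.fmean(sample)
        # The standard error of the mean shrinks like 1 / sqrt(n).
        margin = 1.96 * statistics.stdev(sample) / n ** 0.5
        print(f"n={n:>6,}: estimate = {estimate:7.3f} +/- {margin:.3f} (95%), "
              f"n=all mean = {true_mean:.3f}")

Each tenfold increase in sample size buys only about a threefold reduction in the margin of error, which is why, once collecting and processing data became cheap, simply using all of it started to look attractive.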