Thursday, June 16, 2016

‘Big data is people!’

Sticking with the data theme* that seems to be emerging this week.
I wish I could take credit for the headline, but it comes from Aeon magazine, as does the rest of this post:

The sum of our clickstreams is not an objective measure of who we are, but a personal portrait of our hopes and desires 
 All those happy moments, recorded. Photo by Harald Sund/Getty
We live in what is sometimes called the ‘petabyte era’, and this pronouncement has provoked much discussion of the sheer size of data stores being created, as well as their rapid growth. Claims circulate along the lines of: ‘Every day, we create 2.5 quintillion bytes of data – so much that 90 per cent of the data in the world today has been created in the last two years alone.’ This particular statistic comes from IBM’s website under the topic: ‘What is Big Data?’ but similar ones appear regularly in the popular media. The idea has impact. Among other things, it is used to initiate a conversation in which an IBM representative, via a pop-up entreaty, offers big-data services. Merely defining big data, it seems, generates more opportunities for big data.

And the process continues. Ever more urgently in the press, in business and in scholarly journals the question arises of what is unique about big data. Often the definitions are strangely circular. In 2013, a writer for the Columbia Journalism Review described big data as ‘a catchall label that describes the new way of understanding the world through the analysis of vast amounts of data’, a statement that amounts to: big data is big… and it’s made of data. Others talk about its transformational properties. In Wired magazine, the tech evangelist Chris Anderson claimed the ‘end of theory’ had been reached. So much data now exists that it is unnecessary to build a hypothesis to test scientifically. The data can, if properly handled and analysed, ‘speak for themselves’. Many resort to definitions that stress the ‘three Vs’: a data set is ‘big data’ if it qualifies as huge in volume, high in velocity, and diverse in variety. The three Vs occasionally pick up a fourth, veracity, which can be interpreted in a number of ways. At the least, it evokes the striving to capture entire populations, which opens up new frontiers of possibility.

What is often forgotten, or temporarily put aside, in such excited discussions is how much of this newly created stuff is made of and out of personal data, the almost literal mining of subjectivity. In fact, the now common ‘three Vs’ were coined in 2001 by the industry analyst Doug Laney to describe key problems in data management, but they’ve become reinterpreted as the very definition of big data’s nearly infinite sense of applicability and precision.

When introducing the topic of big data in a class I teach at Harvard, I often mention the Charlton Heston movie Soylent Green, set in a sci-fi dystopian future of 2022, in which pollution, overpopulation and assisted suicide are the norm. Rations take the form of the eponymous soylent-green tablets, purportedly made of high-energy plankton, spewed from an assembly line and destined to feed the have-nots. Heston’s investigation inevitably reveals the foodstuff’s true ingredients, and such is the ubiquity of the film’s famous tagline marking his discovery that I don’t think spoiler alert applies: Soylent green is people! 

Likewise, I like to argue, if in a different register: ‘Big data is people.’...
The author is an associate professor of the history of science at Harvard University. Her latest book is Database of Dreams: The Lost Quest to Catalog Humanity (2015).
*McKinsey: Monetizing Freely Available Data Worth $3.2-$5.4Trillion per Year