Friday, July 3, 2020

Charles Darwin's Cousin and the Beginnings of Big Data

From Delancey Place:
Today's selection -- from A Brief History of Everyone Who Ever Lived by Adam Rutherford.
The beginnings of big data in the 1800s:
"After reading his cousin's [Charles Darwin's] masterwork, [Francis] Galton began pondering whether humankind could be improved by selective breeding. Darwin was a focused scientist compared to Galton, though that title did not exist until 1834. The somewhat arbitrary subject areas of science that we cling to in school today were not so rigid back then, and most dabbled in multiple fields. Darwin was preoccupied with other living things as well as his pigeons, particularly worms, carnivorous plants, and barnacles, though he was also driven by geology, which was critical to the development of his evolutionary thinking.
Galton, by comparison, was more a polymath, and made not insignificant contributions to a whole range of fields. His myriad gifts to the world included the first newspaper weather map, the scientific basis of fingerprint analysis for forensics, a dizzying number of statistical techniques, many the underpinnings of all statistics used today, foundational work on the psychology of synesthesia, a vented hat to help cool the head while thinking hard; and much else over his long and distinguished career. He also gave us the word eugenics, more of which later, and the phrase nature versus nurture, which has plagued geneticists ever since, as this whole book I hope makes amply clear. He devised a new way to cut cakes, which was published in Nature, the journal from which both the structure of DNA and the first human genome would break out of the lab and enter the public consciousness. ...

"Galton did a couple of years of medical training in Birmingham and London, but went on to pursue mathematics, and the world of numbers would be the prime determinant of his intellectual legacy.
"Over his long and varied career, one thing was consistent among Galton's traits: He coveted data. He measured. It was in the statistics that he developed, and in his unquenchable thirst for measuring human characteristics, that he tried to formalize and lock down human differences. In Chapter 4 and elsewhere in this book, we explored the new business of genetic ancestry, where for around £100 and a froth of spit in a test tube, one of many companies will draw a sketch of your DNA. The results are, to my mind, of inconsequential interest to an individual, but in collecting these samples the companies behind them, notably 23andMe, are amassing colossal datasets of human genomes in numbers that far outstrip ones available to academic scientific research.

"Galton had done it all before. He recognized the power of large collections of measurements -- we call it 'big data' nowadays -- and cannily also recognized our own fascination with ourselves, and willingness to reach into our purses to satisfy those egos....