A repost from August 2018 that will lead into some current [2024] challenges.
Coming right up
Original post:
This piece makes a good introduction to a looming problem at the intersection of big data and biology. Back with that in a bit.
From AgFunder:
Editor’s Note: Joseph Byrum, a geneticist, is chief data scientist at Principal Financial Group and a regular contributor to AgFunderNews. Full disclosure: Byrum has no connection with Guinness or any other beermaker besides having a pint on occasion. Connect with him on Twitter @ByrumJoseph and read his other articles here.
Amazon, Google, Microsoft, and every other firm heavily invested in big data all owe their success to beer. The pioneering work of Irish brewers a century ago made today’s big data and artificial intelligence gold rush possible. And the advances in genetics, technology, and mathematics may well be the key to humanity’s survival.
Back in 1899, William Sealy Gosset would have no idea how important his work would prove to be. He turned up on the doorstep of the Dublin headquarters of Guinness one day in 1899, hoping to secure an apprenticeship. As a newly graduated Oxford University chemist, Gosset was a natural. He began what turned out to be a 38-year career devoted to perfecting a pint of stout. Brewing, agriculture, and the entire field of statistics would never be the same again.
Gosset began with a singularly practical mission: finding a way to achieve a consistently high-quality beer at a lower overall cost. This was no small task at the world’s largest brewery, which at the time pumped out 100 million gallons of beer annually.
The basics of beer are simple. The flavor and aroma come from yeast, malt, and hops, the ingredients responsible for the difference between a Belgian ale or an American wheat, a Munich lager or an English stout. Guinness knew that it could use breeding techniques to improve its ingredients, but it would have to conduct tests to know how well or poorly newly-developed hybrid barley varieties performed.
Guinness took a practical approach to finding the best and the most economical variety. It would conduct experiments in a way that provided only enough certainty to make the brewery more profitable. This was heresy to the statisticians of the time, who insisted on maximizing the number of observations to reduce error to an absolute minimum. Conducting a lot of experiments, however, was expensive, as setting up testing fields was a labor-intensive process.
Gosset cut out as many of these costs as he could by extracting useful results from the smallest possible set of observations.
In an early experiment, Gosset wanted to use ingredients that would hit the desired alcohol content without triggering the higher taxes that Ireland imposed on stronger brews. So he tested the saccharine level of the malt extract and figured that customers and regulators alike would be fine if he could achieve his saccharine target with an accuracy of plus or minus five-tenths of a degree.
By making just two observations, he could get it right 80% of the time. If the number of observations increased to four, the results would dramatically improve so that he got it right 12 out of every 13 tries.
By contrast, to achieve near-perfect laboratory precision meant conducting experiments with 82 observations, a level of perfection that proved to be prohibitively expensive.
Gosset was comfortable with the level of confidence achieved with four observations, a sample size small enough to save big cash, using a statistical method that opened a new window on the use of mathematics to do more with less.
One of the biggest problems in testing the significance of experimental results in the fields where plants are grown is that environmental noise creeps into the results in a way that doesn’t happen in the chemist’s laboratory. Out in nature, weather, soil conditions, insects, disease and so many other sources of potential error skew the results.
Gosset’s theories of statistical significance and balanced experimental designs helped manage this problem. To keep Guinness competitors from realizing the value of statistics in brewing a better beer, his work was published under the pen name “Student.”
Ronald A. Fisher, a professional mathematician, absorbed and systematized Student’s theories in a way that has stood the test of time. Fisher’s statistical books are still widely used today and have only recently been supplanted as the foundation of modern agriculture.
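To make the sample-size arithmetic above concrete (80% with two observations, 12 in 13 with four, near-certainty with 82), here is a minimal Python sketch. It is not Gosset's own calculation: it assumes independent, normally distributed measurement errors with a known per-reading standard deviation, and it calibrates that standard deviation so that two observations land within half a degree 80% of the time. The function names `hit_probability` and `calibrate_sigma` are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def hit_probability(n, sigma, tolerance=0.5):
    """Chance that the mean of n readings lies within +/- tolerance of the truth.

    Assumes independent normal errors with known per-reading sd `sigma`
    (an assumption of this sketch, not a figure from the article).
    """
    z = tolerance * sqrt(n) / sigma  # tolerance in units of the standard error
    return 2 * STD_NORMAL.cdf(z) - 1


def calibrate_sigma(n, target_probability, tolerance=0.5):
    """Per-reading sd for which n readings hit the tolerance with the given chance."""
    z = STD_NORMAL.inv_cdf((1 + target_probability) / 2)
    return tolerance * sqrt(n) / z


# Calibrate on the article's first figure: two observations, right 80% of the time.
sigma = calibrate_sigma(2, 0.80)

for n in (2, 4, 82):
    print(f"{n:>2} observations: {hit_probability(n, sigma):.3f}")
```

Under these assumptions, four observations come out at roughly 93%, close to the 12-in-13 figure quoted above, while 82 observations are indistinguishable from certainty. The steep gain from two to four readings and the negligible gain beyond that is the economic point: most of the useful accuracy is bought with the first few observations.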
Only so much can be done when data are stored in hand-written journals or boxes jammed to the brim with typewritten sheets of paper. There is simply no way to perform complex simulation and analysis without powerful computer hardware and software suites. Advances in computer hardware have allowed agricultural statistics to escape the limitations of paper and leap beyond Gosset and Fisher.
Understanding the genomes of yeast, barley, and hops allows for customizing the flavor and attributes of a beer. A geneticist’s job is to harness this understanding to take advantage of genetic variation. The selective breeding and genetic modification that took place at Guinness in the preceding century ensured the beer was more consistent. Today, we can do the same with unprecedented precision.
Update: "Unless researchers solve the looming data storage problem, biomedical science could stagnate"
Precision agriculture uses sophisticated sensors to gather real-time data about soil conditions, plant health, the weather, and every other factor vital to a successful yield. Sophisticated algorithms evaluate millions or trillions of possible genetic combinations to design better plants. Even more complex models are used to determine when, where and how these seeds will be planted for maximum growth, taking into account the measured conditions.
Agriculture informed by data analytics is all about growing more from less, embracing Gosset’s economizing quest for a better beer. Gosset figured out how to conduct a series of more efficient trials. Now we’ve figured out how to use data analytics to further slash the number of trials needed. ...MORE