Kira Radinsky
Technion–Israel Institute of Technology
Haifa, Israel
kirar@cs.technion.ac.il
Eric Horvitz
Microsoft Research
Redmond, WA, USA
horvitz@microsoft.com
ABSTRACT
We describe and evaluate methods for learning to forecast forthcoming events of interest from a corpus containing 22 years of news stories. We consider the examples of identifying significant increases in the likelihood of disease outbreaks, deaths, and riots in advance of the occurrence of these events in the world. We provide details of methods and studies, including the automated extraction and generalization of sequences of events from news corpora and multiple web resources. We evaluate the predictive power of the approach on real-world events withheld from the system.
1. INTRODUCTION
Mark Twain famously said that "the past does not repeat itself, but it rhymes." In the spirit of this reflection, we develop and test methods for leveraging large-scale digital histories captured from 22 years of news reports from the New York Times (NYT) archive to make real-time predictions about the likelihoods of future human and natural events of interest. We describe how we can learn to predict the future by generalizing sets of specific transitions in sequences of reported news events, extracted from a news archive spanning the years 1986–2008. In addition to the news corpora, we leverage data from freely available Web resources, including Wikipedia, FreeBase, OpenCyc, and GeoNames, via the LinkedData platform [6]. The goal is to build predictive models that generalize from specific sets of sequences of events to provide likelihoods of future outcomes, based on patterns of evidence observed in near-term newsfeeds. We propose the methods as a means of generating actionable forecasts in advance of the occurrence of target events in the world.
The methods we describe operate on newsfeeds and can provide large numbers of predictions. We demonstrate the predictive power of mining thousands of news stories to create classifiers for a range of prediction problems. We show as examples forecasts on three prediction challenges: proactive alerting on forthcoming disease outbreaks, deaths, and riots. These event classes are of interest because such predictions can direct attention to interventions that may change outcomes for the better. We compare the predictive power of the methods to several baselines and demonstrate precisions of forecasts in these domains ranging from 70% to 90% with a recall of 30% to 60%....MUCH MORE (10 page PDF)
HT: naked capitalism