From MIT's Technology Review:
A computer science professor uses textual analysis of articles from Yahoo Finance to beat the market.
The ability to predict the stock market is, as any Wall Street quantitative trader (or quant) will tell you, a license to print money. So it should be of no small interest to anyone who likes money that a new system that works in a radically different way than previous automated trading schemes appears to be able to beat Wall Street's best quantitative mutual funds at their own game.
It's called the Arizona Financial Text system, or AZFinText, and it works by ingesting large quantities of financial news stories (in initial tests, from Yahoo Finance) along with minute-by-minute stock price data, and then using the former to figure out how to predict the latter. Then it buys, or shorts, every stock it believes will move more than 1% of its current price in the next 20 minutes - and it never holds a stock for longer.
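The trading rule described above can be sketched in a few lines. This is only an illustration of the 1% threshold and the 20-minute holding period as the article states them, not the researchers' code: the names `decide`, `Signal`, `MOVE_THRESHOLD` and `HOLD_MINUTES` are hypothetical, and the predicted price is assumed to come from some separate model that is not shown here.

```python
from dataclasses import dataclass

# Both constants come from the article's description; the names are made up.
MOVE_THRESHOLD = 0.01   # act only on predicted moves larger than 1%
HOLD_MINUTES = 20       # a position is never held longer than this


@dataclass
class Signal:
    ticker: str
    action: str              # "buy", "short", or "skip"
    exit_after_minutes: int  # 0 when no position is opened


def decide(ticker: str, current_price: float, predicted_price: float) -> Signal:
    """Turn a 20-minute-ahead price prediction into a trade decision.

    predicted_price is assumed to be supplied by a separate prediction
    model (hypothetical here); this function only applies the threshold.
    """
    if current_price <= 0:
        raise ValueError("current_price must be positive")

    # Expected move as a fraction of the current price.
    expected_move = (predicted_price - current_price) / current_price

    if expected_move > MOVE_THRESHOLD:
        action = "buy"
    elif expected_move < -MOVE_THRESHOLD:
        action = "short"
    else:
        action = "skip"

    hold = HOLD_MINUTES if action != "skip" else 0
    return Signal(ticker, action, hold)
```

For example, `decide("XYZ", 100.0, 101.5)` yields a buy signal to be closed out after 20 minutes, while a prediction of 100.5 falls inside the 1% band and produces no trade. The ticker and prices are invented for illustration.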
The system was developed by Robert P. Schumaker of Iona College in New Rochelle and Hsinchun Chen of the University of Arizona, and was first described in a paper published early this year. Both researchers continue to experiment with and enhance the system - more on that below.
Using data from five non-consecutive weeks in 2005, a period chosen for its lack of unusual stock market activity, here's how AZFinText performed versus funds that traded in the same securities (which were all chosen from the S&P 500):
And here's how it performed compared to the top 10 quantitative mutual funds in the world, all of which draw from a much larger basket of securities, except of course for the included S&P 500 itself:
Software that analyzes textual financial information - quarterly reports, press releases, news articles - is nothing new. Researchers have been publishing on the subject since at least the mid-1990s.
However, previous approaches to this technique were hampered by poor performance (averaging little better than chance), by requirements for unreasonable amounts of computational horsepower, or both. Schumaker and Chen get around these issues by first radically shrinking the amount of text their system has to parse, boiling down all the financial articles the system ingests into words falling into specific categories of information....MORE