Saturday, April 18, 2026

"The End of Market Intelligence and the Last Analyst"

From Arpitrage, the substack of Professor Arpit Gupta, March 19:

The escalating arms race in text analysis, and whether you can simulate your customers

This is the fourth installment of my course summaries from teaching AI in Finance at NYU Stern (lecture slides here; previous summaries for weeks one, two, and three). This week focuses on market intelligence: the process of turning unstructured information into actionable investment decisions.

AI and LLMs are disrupting this sector by processing text at a scale and speed that fundamentally shift the core economics of business analysis. Previously, this was a labor-intensive process bottlenecked by human reading capacity. Now, some of the core analytic functions have become commodified due to the rapid pace of AI advances. At the same time, faster and cheaper information doesn’t always help people make better investment decisions if the bottleneck shifts elsewhere. AI also enables completely new forms of intelligence functions: in particular, in silico agent simulation. But are these information tools accurate?

So the key questions this week are: what is going on with the quality of the information we summarize or simulate, and does it help us take better actions? And, even bigger picture: where does the alpha go if everyone has access to AI tools?

The Arms Race in Textual Analysis
The history of text analysis in finance is a good illustration of the “bitter lesson” of scale economies combined with the “follow the price” principle from Session 1. Each generation of analysis tools commodifies one layer of analysis, pushing the alpha or edge further up the complexity stack.

The first generation was simple dictionary-based sentiment analysis. Tetlock’s classic 2007 paper counted words in one WSJ column using the Harvard psychosocial dictionary, estimated a simple pessimism factor, and showed it predicted Dow Jones returns. This was a big advance at the time, even though it built on a pretty simple measure. As we discussed back in Session 1, further advances from here developed finance-specific dictionaries (Loughran and McDonald) and moved beyond single words to n-grams and bag-of-words representations.
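The mechanics of this first generation are simple enough to sketch in a few lines. Below is a minimal illustration of dictionary-based pessimism scoring; the word list is a tiny made-up stand-in, not the actual Harvard IV-4 or Loughran-McDonald dictionary, and this is not Tetlock's exact procedure:

```python
import re

# Hypothetical negative-word list for illustration only; real dictionaries
# (Harvard IV-4, Loughran-McDonald) contain thousands of entries.
NEGATIVE_WORDS = {"loss", "decline", "risk", "weak", "fear", "crisis"}

def pessimism_score(text: str) -> float:
    """Fraction of tokens that appear in the negative-word dictionary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    negative = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return negative / len(tokens)

print(pessimism_score("Markets decline as a crisis weighs on stocks"))
```

A time series of such scores over a daily news column is the kind of input a pessimism factor can be estimated from, and is also where the approach's limits show up: "decline" in "decline in unemployment" is good news, which is exactly the context sensitivity later generations of tools address.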

Then we get to LLMs. Lopez-Lira and Tang showed that GPT-4 can classify news headlines for stock market impact with pretty high accuracy (capturing 90% of the hit rate for the initial reaction). The really interesting result, though, was that the Sharpe ratio of the LLM classification trading strategy declined steadily over time alongside rising LLM adoption. The information edge from reading headlines was apparently real, but got competed away and is now largely priced in....
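The backtest logic behind a strategy like this is easy to sketch: map each headline's LLM label to a position, apply it to the next day's return, and compute a Sharpe ratio. The labels and returns below are made-up illustrative numbers, not the Lopez-Lira and Tang data, and this is not their actual backtest code:

```python
import statistics

# Hypothetical daily LLM labels: +1 good news, -1 bad news, 0 unknown.
labels = [1, -1, 0, 1, 1, -1, 0, 1]
# Hypothetical next-day returns for the corresponding stocks.
next_ret = [0.012, -0.008, 0.004, 0.006, -0.003, 0.010, -0.002, 0.007]

# Go long on good news, short on bad news, flat on unknown.
strategy = [l * r for l, r in zip(labels, next_ret)]

mean, sd = statistics.mean(strategy), statistics.stdev(strategy)
sharpe = (mean / sd) * (252 ** 0.5)  # annualize a daily Sharpe ratio
print(round(sharpe, 2))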

....MUCH MORE