From Columbia Law School's CLS Blue Sky blog:
The annual report, like other regulatory filings, is more than a legal requirement; it provides an opportunity for public companies to communicate their financial health, promote their culture and brand, and engage with a full spectrum of stakeholders. How readers process all this information affects their perception of, and hence participation in, the business in significant ways. More and more companies are realizing that the target audience for disclosures is no longer just human analysts and investors, but also robots and algorithms that recommend which shares to buy and sell after processing information with machine learning tools and natural language processing toolkits.
This development was probably inevitable, given technological progress and the sheer volume of disclosure materials. In any event, companies that wish to communicate and engage with stakeholders need to adjust how they talk about their finances and brands and make forecasts in the age of AI. That means heeding the logic and techniques underlying the language and sentiment analysis made possible by large-scale machine-learning computation. An example of that sort of computation is a process that identifies positive, negative, and neutral opinions in, say, all disclosures by a company, a task beyond the processing capacity of human readers. While the literature is catching up to, and guiding, investors' use of machine learning and computational tools to extract qualitative information from disclosures and news, there has been no analysis of the feedback effect: how companies adjust the way they talk, knowing that machines are listening. Our new paper fills this void.
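To make the idea concrete, here is a minimal sketch of such a classifier as one might build it today; the FinBERT model and the example sentences are illustrative assumptions, not anything the paper prescribes:

```python
from transformers import pipeline

# Illustrative choice (an assumption, not the paper's method): a
# finance-tuned transformer that labels text positive/negative/neutral.
classifier = pipeline("text-classification", model="ProsusAI/finbert")

sentences = [
    "Operating margin expanded for the third consecutive quarter.",
    "The company recorded a goodwill impairment charge.",
    "The annual meeting will be held in May.",
]
for s in sentences:
    print(classifier(s)[0])  # e.g. {'label': 'positive', 'score': 0.95}
```

Run over every sentence of every filing a company has ever made, a loop like this scales to a corpus no human analyst could read end to end.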
We start with a diagnostic test that connects the expected extent of AI readership for a company's SEC filings on EDGAR (measured by Machine Downloads) with how machine-friendly its disclosure is (measured by Machine Readability). The first variable, Machine Downloads, is constructed from historical information by tracking IP addresses that conduct downloads in batches. We deem Machine Downloads a proxy for AI readership, both because a download request by a machine is a necessary condition for machine reading, and because the sheer volume of machine downloads makes it unlikely that human readers alone could process them. The second variable builds on the five elements identified by recent literature as affecting the ease with which a machine can parse, script, and synthesize a filing.
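For illustration, here is a minimal sketch of how a proxy like Machine Downloads could be built from EDGAR's public server logs; the file name, column names, and the 50-firm batch threshold are assumptions for exposition, not the paper's exact specification:

```python
import pandas as pd

# EDGAR publishes daily log files of filing requests.
# Assumed columns (illustrative): ip, date, cik, accession.
logs = pd.read_csv("edgar_log_sample.csv")

# Heuristic: an IP that requests filings from many distinct firms on the
# same day is almost certainly a crawler, not a human reader.
BATCH_THRESHOLD = 50  # illustrative cutoff
firms_per_ip_day = logs.groupby(["ip", "date"])["cik"].nunique()
machine_ip_days = firms_per_ip_day[firms_per_ip_day > BATCH_THRESHOLD]

# Flag each request whose (ip, date) pair is a batch downloader,
# then count machine downloads per filing.
logs = logs.set_index(["ip", "date"])
logs["is_machine"] = logs.index.isin(machine_ip_days.index)
machine_downloads = logs[logs["is_machine"]].groupby("accession").size()
print(machine_downloads.sort_values(ascending=False).head())
```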
We show that, in the cross-section of filings, a one standard deviation change in expected machine downloads is associated with a 0.24 standard deviation increase in the Machine Readability of the filing. In contrast, other (non-machine) downloads bear no meaningful correlation with machine readability, validating Machine Downloads as a proxy for machine readership. We further validate that Machine Downloads and Machine Readability are reasonable proxies (for the presence of machine readership and the ease of machine processing, respectively) by showing that trades in a company's shares happen more quickly after a filing becomes public when Machine Downloads is higher, with an even stronger interaction effect with Machine Readability. Such a result also demonstrates the real impact of machine processing on information dissemination.
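Read as a standardized OLS coefficient: both variables are rescaled to unit variance before estimation, so the slope is in standard-deviation units. A toy reproduction with synthetic data, where every name and value is a placeholder:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000

def z(x):
    """Standardize to mean zero, unit variance."""
    return (x - x.mean()) / x.std()

# Synthetic stand-ins: readability loads on downloads with slope ~0.24.
machine_downloads = rng.lognormal(mean=3.0, sigma=1.0, size=n)
machine_readability = 0.24 * z(machine_downloads) + rng.normal(size=n)

fit = sm.OLS(z(machine_readability),
             sm.add_constant(z(machine_downloads))).fit()
print(fit.params[1])  # close to 0.24 by construction in this toy example
```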
After establishing a positive association between a high AI reader base and more machine-friendly disclosure documents, we further explore how firms manage the "sentiment" and "tone" perceived by machines. It is well documented that corporate disclosures attempt to strike the right tone with (human) readers by conveying positive sentiments and favorable tones without being explicitly dishonest or noncompliant. Hence, we expect a similar strategy tailored to machine readers. While researchers and practitioners have long relied on the Harvard Psychosociological Dictionary to construct "sentiment" as perceived by (mostly human) readers by counting and contrasting "positive" and "negative" words, the publication of Loughran and McDonald ("LM" hereafter) in the Journal of Finance in 2011 presents an instrumental event for testing our hypothesis about machine readers. Not only did Loughran and McDonald (2011) present a new, specialized finance dictionary of positive and negative words, along with words informative about liability and uncertainty, but the word lists accompanying the paper have also served as a leading lexicon for algorithms that sort out sentiment in both industry and academia.
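For intuition, here is a bare-bones lexicon tone measure in the LM spirit; the word lists below are tiny illustrative samples, whereas the actual LM lists run to thousands of words:

```python
import re

# Tiny illustrative samples; the real LM word lists are far longer.
LM_POSITIVE = {"achieve", "gain", "improve", "profitable", "strong"}
LM_NEGATIVE = {"adverse", "decline", "impairment", "litigation", "loss"}

def lm_tone(text: str) -> float:
    """Net tone = (positive count - negative count) / total words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in LM_POSITIVE for w in words)
    neg = sum(w in LM_NEGATIVE for w in words)
    return (pos - neg) / len(words)

print(lm_tone("Strong, profitable growth offset a one-time impairment."))  # 0.125
```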
As a first step, we establish that firms that expect many machine downloads avoid LM-negative words, but only after 2011 (the year the LM dictionary was published). Such a structural change is absent with respect to words deemed negative by the Harvard Dictionary, which had been familiar to human readers for many years. As a result, the difference, LM – Harvard Sentiment, follows the same path as LM Sentiment, suggesting that the change in disclosure style is indeed driven by the publication of the LM dictionary....
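The design resembles a difference-in-differences: regress negative-word usage on Machine Downloads interacted with a post-2011 indicator, and compare the LM measure against the Harvard measure. A hedged sketch, with file and column names as placeholders:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed filing-level panel with placeholder column names.
df = pd.read_csv("filings_panel.csv")
df["post2011"] = (df["year"] >= 2011).astype(int)

# The interaction term captures the shift in word choice after the
# LM lists became the standard machine lexicon.
lm_fit = smf.ols("lm_negative_share ~ machine_downloads * post2011",
                 data=df).fit()
hv_fit = smf.ols("harvard_negative_share ~ machine_downloads * post2011",
                 data=df).fit()

# Expected pattern: a negative, significant interaction for LM negativity,
# and no comparable shift for Harvard negativity.
print(lm_fit.params["machine_downloads:post2011"])
print(hv_fit.params["machine_downloads:post2011"])
```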
....MORE