Does Stock-Market Data Really Go Back 200 Years?

Over the weekend I was putting together some info on expected future equity returns and pulled some back copies of The Financial Analysts Journal, of which Mr. Arnott was, for a time, editor. A very sharp guy. From the Wall Street Journal:

As of June 30, U.S. stocks have underperformed long-term Treasury bonds for the past five, 10, 15, 20 and 25 years.

Still, brokers and financial planners keep reminding us, there's almost never been a 30-year period since 1802 when stocks have underperformed bonds.

These true believers rely on the gospel of "Stocks for the Long Run," the book by finance professor Jeremy Siegel of the Wharton School at the University of Pennsylvania that was first published in 1994.

Using data assembled by other scholars, Prof. Siegel extended the history of U.S. stock returns all the way back to 1802. He came to two conclusions that became articles of faith to millions of investors: Ever since Thomas Jefferson was in the White House, stocks have generated a "remarkably constant" average return of nearly 7% a year after inflation. (Adding inflation at 3% yields the commonly cited 10% annual stock return.) And, declared Prof. Siegel, "the risks of holding stocks decrease over time."
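A quick check on the parenthetical arithmetic above: adding 7% and 3% gives 10%, while compounding the two rates gives slightly more. This is only an illustrative sketch; the function name `nominal_return` is made up for the example and does not come from the article.

```python
def nominal_return(real: float, inflation: float) -> float:
    """Compound a real return with an inflation rate (both as decimals)."""
    return (1 + real) * (1 + inflation) - 1


# Figures quoted in the article: ~7% real return, 3% inflation.
real, inflation = 0.07, 0.03

additive = real + inflation                    # the "commonly cited" shortcut
compounded = nominal_return(real, inflation)   # (1.07 * 1.03) - 1

print(f"additive: {additive:.2%}, compounded: {compounded:.2%}")
```

The two figures differ by about a fifth of a percentage point, so the 10% shorthand is a reasonable approximation at these rates.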

There is just one problem with tracing stock performance all the way back to 1802: It isn't really valid.

Prof. Siegel based his early numbers on data first gathered decades ago by two economists, Walter Buckingham Smith and Arthur Harrison Cole.

For the years 1802 through 1820, Profs. Smith and Cole collected prices on three dozen banking, insurance, transportation and other stocks -- but ended up including only seven, all banks, in their stock-market index. Through 1845, they tracked 19 insurance stocks, but rejected 95% of them, adding only one to their index. For 1834 onward, they added a maximum of 27 railroad stocks.

To be a good measure of stock returns, an index should be comprehensive (by including many stocks) and representative (by including the stocks commonly held by investors). The Smith and Cole indexes are neither, as the professors signaled in their 1935 book, "Fluctuations in American Business." They cherry-picked their indexes by throwing out any stock that didn't survive for the whole period, whose share prices were too hard to find or whose returns seemed "inflexible," "erratic," or "non-typical."
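Why throwing out stocks that didn't survive matters can be seen in a toy simulation. Everything in the sketch below is an assumption made up for illustration: the number of stocks, the 3% annual failure chance, and the return distribution are synthetic and are not estimates of the early U.S. market.

```python
import random


def simulate(n_stocks=1000, years=20, fail_prob=0.03, seed=0):
    """Return (mean final value of all stocks, mean final value of survivors).

    Each stock starts at 1.0. Every year it either fails (value goes to 0)
    with probability fail_prob, or earns a random return. All parameters
    are hypothetical, chosen only to illustrate survivorship bias.
    """
    rng = random.Random(seed)
    all_final, survivor_final = [], []
    for _ in range(n_stocks):
        value, alive = 1.0, True
        for _ in range(years):
            if rng.random() < fail_prob:
                value, alive = 0.0, False
                break
            # Clamp at -100% so a stock's value can never go negative.
            value *= 1 + max(-1.0, rng.gauss(0.05, 0.15))
        all_final.append(value)
        if alive:
            survivor_final.append(value)
    mean_all = sum(all_final) / len(all_final)
    mean_survivors = sum(survivor_final) / len(survivor_final)
    return mean_all, mean_survivors


mean_all, mean_survivors = simulate()
print(f"all stocks: {mean_all:.2f}, survivors only: {mean_survivors:.2f}")
```

Because the failed stocks end at zero, an index built only from survivors cannot show a lower average than one built from every stock, and in practice it shows a noticeably higher one.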

The database of early U.S. securities has so far identified more than 1,000 stocks that were listed on 10 different exchanges -- including Charleston, S.C., New Orleans, and Norfolk, Va. -- between 1790 and 1860. Thus the indexes relied on by Prof. Siegel exclude 97% of all the stocks that existed in the earliest years of the U.S. market, and include only the bluest of the blue-chip survivors. Never mind all of the canals, wooden turnpikes, rubber-hat companies and the other doomed stocks that investors lost millions on -- and whose returns may never be reconstructed.
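The 97% figure can be roughly reproduced from the counts quoted earlier. The sketch below assumes the seven banks, one insurer and (at most) 27 railroads can simply be summed, which overstates coverage at any one time since they come from different sub-periods; it also treats "more than 1,000" as exactly 1,000, so the result is a floor on the share excluded.

```python
# Counts quoted in the article for the Smith and Cole indexes.
included = {"banks": 7, "insurance": 1, "railroads (maximum)": 27}

# "More than 1,000" stocks identified, so 1,000 is a lower bound.
identified = 1000

total_included = sum(included.values())
share_excluded = 1 - total_included / identified

print(f"{total_included} of {identified}+ stocks included; "
      f"at least {share_excluded:.1%} excluded")
```

That works out to 35 stocks included and at least 96.5% excluded, consistent with the roughly 97% cited.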

There is a second problem with Prof. Siegel's data.

In an article published in 1992, he estimated the average annual dividend yield from 1802-1870 at 5.0%. Two years later in his book, it had grown to 6.4% -- raising the average annual return in the early years from 5.7% to 7.0% after inflation....

In a January comment at MarketBeat a blogger said:

I thought it was interesting that you pointed out Citi’s 10yr performance.

I’ve done some research showing that the 10 calendar years for the S&P 500 ending Dec 31, 2008 were the worst since 1831! So Citi isn’t alone…

My comment was:

What’s your data source, pre-1871?
I’ve got “COMMON-STOCK INDEXES 1871-1937” open on the desk as I type and Mr. Cowles is quite explicit as to the reasons the Commission didn’t go further back than 1871. (pg. 4)
A big one is the paucity of publicly traded industrials.
During a misspent youth I read every line of the book.
My favorite tidbit is the listing, among the pre-1871 industrials, of New York Guano.
Some things never change.
