Torturing Historical Market Data

Here are some other random thoughts on using historical market data when making investment decisions:

There’s no such thing as right or wrong data, just better or worse. Stock market data looks spotless when you just see the performance numbers, but looks can be deceiving. You can’t expect historical data going back 100-200 years to be all-encompassing when it comes to earnings numbers, stock splits, dividends paid or even the inclusion of all companies in an index (such as small caps or failed companies).
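To make the point about missing failed companies concrete, here is a minimal Python sketch. The five companies and their returns are made up purely for illustration; the only thing it shows is how an average shifts when the failures are dropped from the sample.

```python
from statistics import mean

# Hypothetical annual returns for five made-up companies.
# "survived" marks whether the company still exists at the end of the sample.
companies = [
    {"name": "A", "annual_return": 0.12, "survived": True},
    {"name": "B", "annual_return": 0.08, "survived": True},
    {"name": "C", "annual_return": -0.40, "survived": False},
    {"name": "D", "annual_return": 0.10, "survived": True},
    {"name": "E", "annual_return": -0.25, "survived": False},
]


def average_return(rows):
    """Simple equal-weighted average of the annual returns."""
    return mean(r["annual_return"] for r in rows)


# Average over every company that ever existed in the sample.
all_avg = average_return(companies)

# Average over only the companies that are still around, which is what
# a data set that silently drops failed companies would report.
survivor_avg = average_return([c for c in companies if c["survived"]])

print(f"all companies:  {all_avg:.1%}")
print(f"survivors only: {survivor_avg:.1%}")
```

With these invented numbers the survivors-only average looks far better than the average across every company, even though nothing about the underlying businesses changed.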

Most investors run back-tests under the assumption that the information would have been available to act on at the time. That’s simply not the case. Don’t you think investors would have made different decisions 50 or 60 years ago if they had the amount of data and computing power we have today? Saying you could have killed the market based on your back-test doesn’t hold up unless you’re using only data that existed at that time. The old saying that if you torture the data long enough, it will admit to anything seems to apply here.
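One way to picture "only data that existed at that time" is a point-in-time filter. The sketch below is an illustration under stated assumptions, not a real back-testing framework: the earnings records, the `available_on` field and the helper names are all hypothetical.

```python
from datetime import date

# Hypothetical earnings records. Each one has the period it describes and
# the date it was actually published (the earliest it could be acted on).
earnings = [
    {"period_end": date(1965, 12, 31), "available_on": date(1966, 3, 15), "eps": 1.10},
    {"period_end": date(1966, 12, 31), "available_on": date(1967, 3, 20), "eps": 1.25},
    {"period_end": date(1967, 12, 31), "available_on": date(1968, 3, 18), "eps": 0.90},
]


def naive_latest_eps(records, decision_date):
    """Look-ahead version: uses any period that has ended, published or not."""
    ended = [r for r in records if r["period_end"] <= decision_date]
    if not ended:
        return None
    return max(ended, key=lambda r: r["period_end"])["eps"]


def point_in_time_eps(records, decision_date):
    """Uses only records that had been published by decision_date."""
    visible = [r for r in records if r["available_on"] <= decision_date]
    if not visible:
        return None
    return max(visible, key=lambda r: r["period_end"])["eps"]


decision = date(1967, 1, 31)
print("naive:        ", naive_latest_eps(earnings, decision))
print("point-in-time:", point_in_time_eps(earnings, decision))
```

On the example decision date, the naive version picks up the 1966 figure even though it was not published until March 1967, while the point-in-time version falls back to the 1965 figure an investor could actually have seen.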

Data is useless without common sense. Some of my favorite quants that I read on a regular basis are Jim O’Shaughnessy, Wes Gray, Meb Faber and Patrick O’Shaughnessy. While I appreciate their quantitative research, what really stands out to me about each of their approaches is the fact that they acknowledge the behavioral side of the equation, as well. Numbers don’t mean anything if you can’t recognize their possible limitations or the fact that investors don’t always utilize them correctly.

Yeah but… Market data always comes with caveats. Show me a return series or valuation metric based on past data and I’ll find you someone who can play devil’s advocate: This data is skewed by the tech bubble. What if you take out the financial crisis? Yeah, but what about the 1970s? What if you lengthen the time horizon? What about Japan?

If we can’t agree on the past, how are we ever going to agree about the future? Sometimes I wonder why so many intelligent investors fail. The easiest explanation is that people can’t agree on anything. If investors can’t even agree on how to interpret historical data, there’s no way we’re all going to agree about what’s going to happen in the future. Context often determines investor attitudes and opinions. It’s why some succeed while others fail.

It’s never going to be easy. Even if we had 5,000 years of data, it still wouldn’t be a large enough sample size because investors would continue to make their own biased judgments of what that past data means to them. Every cycle is unique, so using historical market data works better as a way of defining risks, not as a way of knowing exactly what’s going to happen next.

Having a firm grasp on financial history is essential for investors, but the interpretation of that data is where things always get tricky.

Further Reading:
On the merits of being a financial historian




What's been said:

Discussions found on the web
  1. Ryan Turner commented on Nov 18

    I think the point was that by using overlapping observations you invalidate any tests of statistical significance, making it impossible to know if the relationships you discover are the result of pure chance or represent a meaningful edge.

    I personally prefer looking at p-values when evaluating an edge since just taking the mean can be misleading given the level of variance in market returns and the relatively small amount of data we have to work with.

    Of course, getting a meaningful p-value when your data is heteroscedastic, autocorrelated, and highly collinear is a problem in itself; and then you have to worry about whether a relationship – no matter how significant – will continue to persist into the future.

    At some point you have to make a leap of faith, but a table of historical means is relatively premature from a statistical standpoint.

    • Ben commented on Nov 18

      Right. I understand the difference between dependent and independent data, but in real life, normal people make continuous contributions over time. In that sense those overlapping time frames do make sense in the real world, even if they don’t in a statistical sense.

      And I agree with you that eventually you have to make a decision. The best you can do is try to improve your probability for success. There will always be a wide range of outcomes.

  2. 10 Thursday AM Reads | The Big Picture commented on Nov 20

    […] with a precipitous decline in equity prices (Marketwatch) • Torturing Historical Market Data (A Wealth of Common Sense) • It’s Investor Behavior, Not Investment Behavior That Matters (Irrelevant Investor) see also […]

  3. Patrick Haughey commented on Nov 20

    Excellent article. I think the reason is that few investors or managers of money (or politicians) have any idea about history. When was the last time they took a history class?

    • Ben commented on Nov 20

      There was an article on ValueWalk yesterday where portfolio manager Bob Rodriguez shared a story about listening to a Charlie Munger speech in the 1970s. He asked Munger how to get ahead during the Q&A. Munger replied, “Read history. Read history. Read history.”

  4. Bob Carlson » Interpreting the Data commented on Nov 21

    […] different people. And the data often doesn’t tell a compelling story everyone can agree on. Here’s a good post listing guidelines for using and interpreting […]

  5. What I am reading: April 1, 2015 » Alvarez Quant Trading commented on Apr 01

    […] Torturing Historical Market Data – The first sentence is so true. “There’s no such thing as right or wrong data, just better or worse. Stock market data looks spotless when you just see the performance numbers, but looks can be deceiving.“ […]
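
The point raised in the first comment, that overlapping observations undermine tests of statistical significance, can be sketched with simulated data. This is a rough illustration only: the one-period returns are random draws with a fixed seed, not market data, and the helper names and the 12-period window are assumptions made for the example.

```python
import random
from statistics import mean


def window_sums(xs, width, step):
    """Sum returns over windows of `width` periods, advancing by `step`."""
    return [sum(xs[i:i + width]) for i in range(0, len(xs) - width + 1, step)]


def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation of a series."""
    m = mean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


# Simulated, independent one-period returns (made-up parameters).
rng = random.Random(42)
returns = [rng.gauss(0.005, 0.04) for _ in range(6000)]

width = 12
overlapping = window_sums(returns, width, step=1)          # windows share 11 of 12 periods
non_overlapping = window_sums(returns, width, step=width)  # windows share nothing

print("overlapping windows:    ", len(overlapping),
      "lag-1 autocorr:", round(lag1_autocorr(overlapping), 2))
print("non-overlapping windows:", len(non_overlapping),
      "lag-1 autocorr:", round(lag1_autocorr(non_overlapping), 2))
```

Even though the underlying returns are independent, consecutive overlapping windows share most of their periods, so they are highly correlated with each other. The large count of overlapping windows overstates how much independent evidence there is; the truly independent sample is closer to the number of non-overlapping windows.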