Torturing Historical Market Data

Here are some other random thoughts on using historical market data when making investment decisions:

There’s no such thing as right or wrong data, just better or worse. Stock market data looks spotless when you just see the performance numbers, but looks can be deceiving. No one can expect historical data going back 100-200 years to be all-encompassing when it comes to earnings numbers, stock splits, dividends paid or even the inclusion of all companies in an index (such as small caps or failed companies).

Most investors run back-tests under the assumption that the information would have been available to act on at the time. That’s simply not the case. Don’t you think investors would have made different decisions 50 or 60 years ago if they had the amount of data and computing power we have today? Saying you could have killed the market based on your back-test doesn’t hold up unless you’re using only data that existed at that time. The old saying that if you torture the data long enough, it will admit to anything seems to apply here.
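
To make the point about using only data that existed at the time concrete, here is a minimal sketch in Python. The valuation ratios are made up for illustration, and the function names and the "cheap means below average" rule are assumptions, not a method from any of the research mentioned here. The first function judges each year against the average of the whole series, which quietly uses the future. The second uses only the history an investor could have seen on that date.

```python
from statistics import mean

def full_sample_signal(values):
    """Look-ahead version: compare each value with the mean of the WHOLE series."""
    avg = mean(values)
    return [v < avg for v in values]

def point_in_time_signal(values, min_history=3):
    """Point-in-time version: compare each value with the mean of PRIOR values only."""
    signals = []
    for i, v in enumerate(values):
        history = values[:i]  # strictly earlier observations
        if len(history) < min_history:
            signals.append(None)  # not enough history to form a view
        else:
            signals.append(v < mean(history))
    return signals

# Hypothetical valuation ratios, one per year (illustrative numbers only).
ratios = [10, 12, 11, 15, 18, 22, 25, 20, 16, 14]

print(full_sample_signal(ratios))
print(point_in_time_signal(ratios))
```

With these numbers, the look-ahead version labels the first four years as cheap, while the point-in-time version either has no opinion yet or disagrees. A back-test built on the first version would be acting on information nobody had.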

Data is useless without common sense. Some of my favorite quants I read on a regular basis are Jim O’Shaughnessy, Wes Gray, Meb Faber and Patrick O’Shaughnessy. While I appreciate their quantitative research, what really stands out to me about each of their approaches is that they acknowledge the behavioral side of the equation as well. Numbers don’t mean anything if you can’t recognize their possible limitations or the fact that investors don’t always utilize them correctly.

Yeah but… Market data always comes with caveats. Show me a return series or valuation metric based on past data and I’ll find you someone who can play devil’s advocate: This data is skewed by the tech bubble. What if you take out the financial crisis? Yeah, but what about the 1970s? What if you lengthen the time horizon? What about Japan?

If we can’t agree on the past, how are we ever going to agree about the future? Sometimes I wonder why there are so many intelligent investors out there who fail. The easiest explanation is that people can’t agree on anything. If investors can’t even agree on how to interpret historical data, there’s no way we’re all going to agree about what’s going to happen in the future. Context often determines investor attitudes and opinions. It’s why some succeed while others fail.

It’s never going to be easy. Even if we had 5,000 years of data, it still wouldn’t be a large enough sample size because investors would continue to make their own biased judgments of what that past data means to them. Every cycle is unique, so using historical market data works better as a way of defining risks, not as a way of knowing exactly what’s going to happen next.

Having a firm grasp on financial history is essential for investors, but the interpretation of that data is where things always get tricky.

Further Reading:
On the merits of being a financial historian




What's been said:

Discussions found on the web
  1. Ryan Turner commented on Nov 18

    I think the point was that by using overlapping observations you invalidate any tests of statistical significance, making it impossible to know if the relationships you discover are the result of pure chance or represent a meaningful edge.

    I personally prefer looking at p-values when evaluating an edge since just taking the mean can be misleading given the level of variance in market returns and the relatively small amount of data we have to work with.

    Of course, getting a meaningful p-value when your data is heteroscedastic, autocorrelated, and highly collinear is a problem in itself; and then you have to worry about whether a relationship, no matter how significant, will continue to persist into the future.

    At some point you have to make a leap of faith, but a table of historical means is relatively premature from a statistical standpoint.

    • Ben commented on Nov 18

      Right. I understand the difference between dependent and independent data, but in real life, normal people make continuous contributions over time. In that sense those overlapping time frames do make sense in the real world, even if they don’t in a statistical sense.

      And I agree with you that eventually you have to make a decision. The best you can do is try to improve your probability for success. There will always be a wide range of outcomes.
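
The overlapping-observations point in this thread can be illustrated with a small simulation. This is only a sketch: the return distribution, the seed, the 10-year window and the helper names are all assumptions chosen for illustration. It draws independent annual log returns, then compares rolling (overlapping) 10-year sums with non-overlapping ones.

```python
import random
from statistics import mean

def rolling_sums(xs, window):
    """Overlapping windows: a new window starts at every observation."""
    return [sum(xs[i:i + window]) for i in range(len(xs) - window + 1)]

def non_overlapping_sums(xs, window):
    """Independent windows: each observation is used exactly once."""
    return [sum(xs[i:i + window]) for i in range(0, len(xs) - window + 1, window)]

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a series."""
    m = mean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den

# Simulated, independent annual log returns (hypothetical parameters).
rng = random.Random(0)
annual = [rng.gauss(0.07, 0.18) for _ in range(1000)]

overlap = rolling_sums(annual, 10)
separate = non_overlapping_sums(annual, 10)

print(len(overlap), len(separate))   # many windows vs. few independent ones
print(lag1_autocorr(annual))         # raw returns: close to zero by construction
print(lag1_autocorr(overlap))        # overlapping windows: strongly autocorrelated
```

Adjacent overlapping 10-year windows share nine of their ten years, so in theory their lag-1 correlation is about 0.9 even though the underlying returns are independent. The overlapping series looks like 991 data points but carries roughly the information of 100, which is why significance tests that treat the windows as independent overstate the evidence.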

  2. 10 Thursday AM Reads | The Big Picture commented on Nov 20

    […] with a precipitous decline in equity prices (Marketwatch) • Torturing Historical Market Data (A Wealth of Common Sense) • It’s Investor Behavior, Not Investment Behavior That Matters (Irrelevant Investor) see also […]

  3. Patrick Haughey commented on Nov 20

    Excellent article. I think the reason is that few investors or managers of money (or politicians) have any idea about history. When was the last time they took a history class?

    • Ben commented on Nov 20

      There was an article on Valuewalk yesterday where portfolio manager Bob Rodriguez shared a story about listening to a Charlie Munger speech in the 1970s. He asked Munger how to get ahead during the Q&A. Munger replied, “Read history. Read history. Read history.”

  4. Bob Carlson » Interpreting the Data commented on Nov 21

    […] different people. And the data often doesn’t tell a compelling story everyone can agree on. Here’s a good post listing guidelines for using and interpreting […]

  5. What I am reading: April 1, 2015 » Alvarez Quant Trading commented on Apr 01

    […] Torturing Historical Market Data – The first sentence is so true. “There’s no such thing as right or wrong data, just better or worse. Stock market data looks spotless when you just see the performance numbers, but looks can be deceiving.“ […]