What is a Backtest?
Realism, Taxes, and Transaction Costs
The Time-Lag Problem
The Survivorship Problem
The Pricing Problem
The Data Mining Problem
The MEGO Problem
Revised 4/20/99
There are two ways of dealing with the time-lag problem. First, you could make an assumption about when the data became available to investors, and then assume that the transactions took place after that date. That's what James O'Shaughnessy did in What Works on Wall Street. O'Shaughnessy used a time lag of 11 months or more, though he never fully explains what he meant by that. Since he rebalanced his portfolios on December 31, presumably he disregarded all quarterly and annual reports for periods ending after the preceding January 31 (11 months earlier). He asserted that the long lag was necessary to ensure that all the data he used was available to the public on his rebalancing date.
O'Shaughnessy calls the long lag "conservative," but it may have introduced a distortion of its own. Between January 31 and December 31 the typical company issues at least two, and more likely three, quarterly reports. A company with a December 31 fiscal year-end will issue reports for the quarters ending March 31, June 30, and September 30. The vast majority of these reports come out less than two months after the end of the quarter; many are public within three or four weeks. Investors presumably use these reports in deciding what a company is worth, and they probably carry far more weight than a report that is two or three quarters old. The causal link between a company's 12/31/97 report and its stock performance 12 to 24 months later (i.e., from 12/31/98 to 12/31/99) seems far more tenuous than the impact of, say, the 9/30/98 report on that performance.
Q-Investor, a backtesting program from Q-Analytics, allows the backtester to choose the lag factor. The default is seven weeks; in other words, the program assumes that annual reports for the period ending 12/31 are not available until about 2/12. That's not a bad assumption. Unfortunately, for years before 1996 Q-Investor has only annual (fiscal-year) data -- no quarterly data. If you use a buy date of January 1 and a seven-week lag, Q-Investor will use information only for periods ending before the previous mid-November. For most companies (which end their fiscal year on 12/31), that means a simulated investment decision on 1/1/95 uses information from the annual report for the year ending 12/31/93. Like O'Shaughnessy, Q-Investor users must contend with the possible distortions of very long lag times.
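To make the lag arithmetic concrete, here is a minimal sketch in Python (the function name is invented; this is not Q-Investor's actual code) of how a lag factor turns a buy date into the latest usable fiscal-period end:

    from datetime import date, timedelta

    def latest_usable_period_end(buy_date, lag_days):
        # A report for a fiscal period ending on day D is assumed public
        # only lag_days after D, so the latest period end usable on
        # buy_date is buy_date minus lag_days.
        return buy_date - timedelta(days=lag_days)

    # Q-Investor's default seven-week lag with a 1/1/95 buy date:
    print(latest_usable_period_end(date(1995, 1, 1), 7 * 7))  # 1994-11-13

With only fiscal-year data for the older years, a company with a 12/31 fiscal year-end last qualifies with its 12/31/93 annual report, as described above.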
The second way to combat time lag is to use data as it was actually published on a particular date in the past, and assume the transactions occurred shortly after publication of that data. This is what Robert Sheard did in The Unemotional Investor: he simply went to the library and looked up old issues of the Value Line Investment Survey. Probably the best way of doing this kind of time-lagging is with historical Value Line datasets in electronic form. These sets include monthly data going back to the beginning of 1986. The exact date of the data varies from month to month, but normally it was first published sometime during the first week of the month or the last few days of the preceding month. Value Line generally compiles the data on Wednesday and publishes it on Friday; weeks with holidays are the exception, with compilation and publication dates all over the lot.
Using data as it was actually published, such as the Value Line data, also eliminates survivorship bias: all the companies that later disappeared are still in the historical Value Line datasets.
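A sketch of the as-published approach, again in Python, assuming a simple list of dataset publication dates (the dates below are placeholders, not actual Value Line publication records):

    from datetime import date

    # Hypothetical publication dates for successive monthly datasets.
    pub_dates = [date(1999, 1, 4), date(1999, 2, 1), date(1999, 3, 1)]

    def dataset_for(trade_date):
        # Use only the most recent dataset actually published on or
        # before the trade date; delisted companies remain in each
        # historical set, so survivorship bias never enters.
        usable = [d for d in pub_dates if d <= trade_date]
        return max(usable) if usable else None

    print(dataset_for(date(1999, 2, 15)))  # 1999-02-01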
It is not clear whether any particular time of day carries a systematic price bias -- whether a transaction at the open is more or less favorable than one at midday or at the close. So long as the backtest consistently uses the same kind of price for both purchases and sales, any such bias should wash out: anything you might gain by waiting to buy at the close is given back at the next rebalancing, when you sell at the close. As a practical matter, closing prices are more likely to be available in a usable form, so most backtests use closing prices.
Bid/ask spreads are another problem investors face. Quotes from online services during the trading day may be bid prices (the price at which someone has offered to buy the stock), ask prices (the price at which someone is willing to sell the stock), or last-transaction prices (a price at which the stock actually changed hands). The difference between the bid and the ask is the spread. It would be unreasonable to assume that you could always buy at the bid or sell at the ask; bid and ask prices are not meant to reflect actual transactions in the marketplace, only the current state of the auction.
Closing prices, by contrast, always reflect an actual trade -- the last of the day. Moreover, some brokers permit their clients to submit orders with instructions that they be executed at the close. Assuming that purchases and sales in a backtest take place at closing prices therefore seems reasonable.
For calculating returns, always use split-adjusted prices; otherwise, you are likely to understate your returns. On the other hand, if the screen uses price as a screening criterion (for instance, the Foolish Four sorts Dow stocks in ascending order of price), use the actual trading price when deciding which stocks to pick, and switch to split-adjusted prices when calculating the investment returns.
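A minimal sketch of this two-price rule, with hypothetical field names (raw_price for the price as actually traded, adj_close for the split-adjusted series):

    def pick_lowest_priced(stocks, n=4):
        # Screen on the actual trading price investors saw at the time,
        # as a Foolish Four-style ascending-price sort requires.
        return sorted(stocks, key=lambda s: s["raw_price"])[:n]

    def holding_return(stock, buy_date, sell_date):
        # ...but compute the return from split-adjusted closes, so a
        # 2:1 split is not mistaken for a 50% price drop.
        return stock["adj_close"][sell_date] / stock["adj_close"][buy_date] - 1.0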
Splits can also cause other "crystal ball" problems. For instance, one early and promising Q-Investor screen looked for stocks with high volume, measured in shares traded per week. Strangely, the screen did not work as well when the dollar value of shares changing hands was used instead. The author of the screen finally realized that Q-Investor split-adjusts volume as well as price: when a stock splits 2:1, its historical volume in Q-Investor doubles along with its share count. This made it seem as though stocks that would later split because of their great performance had enormous volume all along; in real life (i.e., without the split adjustment) their volume was not that remarkable. The screen was a crystal ball: high-volume stocks were "destined" to split in the future because they would go up. Be on the lookout for crystal balls in your screens.
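If your data source stores only split-adjusted volume, one way to defuse this particular crystal ball -- assuming you can obtain a cumulative split factor for each date; the names below are invented -- is to undo the adjustment before screening:

    def raw_volume(adj_volume, cum_split_factor):
        # A later 2:1 split doubles split-adjusted historical volume
        # (cum_split_factor == 2.0), so dividing recovers the number
        # of shares that actually changed hands at the time.
        return adj_volume / cum_split_factor

    print(raw_volume(2_000_000, 2.0))  # 1000000.0 -- unremarkable after all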
Data mining is an even greater hazard as backtesters gain new programming techniques and more computer horsepower. "Genetic algorithms" sometimes spit out eye-popping numbers when run against historical data. Q-Investor gives you the option to "Q-Optimize" a screen with genetic algorithms; it remains to be seen whether these algorithms actually help predict anything. As O'Shaughnessy put it in What Works on Wall Street, "Torture the data enough and it will confess to anything."
There are two primary ways to combat data mining. The most obvious is to view with suspicion any result that contradicts common sense or other well-designed studies. For instance, suppose a backtest finds that requiring a return on invested capital (ROIC) of less than 15% increases an investment model's total return. In the real world, low ROIC is not normally a desirable characteristic. Possibly the rest of the screen was particularly good at identifying candidates in capital-intensive industries, and the low-ROIC filter simply excluded companies outside those industries; the screen might do even better if it focused on those industries directly rather than on ROIC. Another possibility is that the original version of the strategy chose a few bad market performers that coincidentally had high ROIC. The low-ROIC requirement eliminates these "problem child" stocks, but who can say whether future problem stocks will also have high ROIC? Those possibilities (and others) should be investigated before getting too excited.
The second way to avoid data mining is to divide the available historical data into a "play" set and a "confirmation" set. Experiment on the play set, then see whether the confirmation set bears out your hypothesis. If you had all 160 months of Value Line data at your disposal (January 1986 through April 1999), you might use a play set of 30 randomly selected months. This method is almost sure to tell you whether you are really eliminating problem children or just trying too hard. If you do not have a fairly large set of data to start with, you cannot use this technique, and you should be very cautious in interpreting your results.
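A minimal sketch of such a split (the 160-month universe matches the Value Line range described above; the seed and set sizes are arbitrary):

    import random

    months = list(range(160))  # Jan 1986 .. Apr 1999, one index per month
    random.seed(1)             # fixed seed keeps the split reproducible
    play = set(random.sample(months, 30))
    confirmation = [m for m in months if m not in play]

    # Tune the screen on the play months only; a rule that also holds
    # up on the confirmation months is less likely to be a data-mining
    # artifact.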
Here are the questions most investors want a backtest to answer before putting money into an investment model: