Backtest overfitting in smart beta investments


In the past few years “smart beta” (also known as “alternative beta” or “strategic beta”) investments have grown rapidly in popularity. As of the current date (January 2017), assets in these investment categories have grown to over USD$500 billion, and are expected to reach USD$1 trillion by 2020. More than 844 exchange-traded funds employing a smart beta strategy are now in operation.

The basic idea behind smart beta is to observe that traditional capitalization-weighted investments (such as S&P 500 index funds) tend to be heavily weighted in favor of securities from large, stable firms. Thus the smart beta community believes that by adopting a different weighting system, such as one that magnifies smaller, higher-risk securities, one can achieve superior long-term growth. The smart beta movement had its roots in tests of the efficient market hypothesis in the academic finance community during the 1970s, 1980s and 1990s.

The general “smart beta” philosophy encompasses many specific strategies. If one invests equally, say, in an S&P 500 index fund and a mid-cap or small-cap index fund, one is using a “smart beta” strategy, by this definition. One of the earliest commercial smart beta funds assigned all S&P 500 index components equal weights. Today, the smart beta world encompasses a diverse range of strategies, ranging from novel systems of weights to systematic trading algorithms, derivatives and multi-asset investments.

Even within the realm of weighting-based smart beta strategies, there are numerous variations, as given in the following taxonomy:

  • Return-oriented: Dividend-screened, dividend-weighted, value, growth, fundamentals, multi-factor, size, momentum, buyback/shareholder yield, earnings weighted, quality, expected returns or revenue weighted.
  • Risk-oriented: Minimum volatility/variance, low/high beta or risk-weighted.
  • Other: Non-traditional commodity, equal-weighted, non-traditional fixed income or multi-asset.

Many are concerned, however, that the original concept of “smart beta” has been extended to such a large collection of sophisticated strategies that the simplicity and elegance of the original concept have been lost. What’s more, as the complexity of these strategies has increased, so has the likelihood that they will suffer from (or be invalidated by) backtest overfitting.

By backtest overfitting here, we mean the usage of historical market data to develop an investment strategy, where many parameter variations (quite possibly millions or billions of variations) have been searched by computer to find an optimal strategy. The present authors, among others, have developed statistical tests to help one avoid backtest overfitting; see, for example, our paper on the deflated Sharpe ratio, which corrects for two leading sources of performance inflation, namely selection bias under multiple testing and non-normally distributed returns.
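To make the selection-bias effect concrete, here is a minimal sketch in Python. It is not the deflated Sharpe ratio test itself, only an illustration of the problem that test addresses. All names and parameters (`sharpe_ratio`, 200 trials, 252 trading days, a 1% daily volatility) are assumptions chosen for illustration. Every simulated "strategy" has zero true edge, yet the best of many backtests can still look impressive.

```python
import random
import statistics

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a list of per-period excess returns."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)
    return (mean / stdev) * periods_per_year ** 0.5

def random_strategy_returns(n_days, rng):
    """Daily returns of a hypothetical strategy with zero true edge."""
    return [rng.gauss(0.0, 0.01) for _ in range(n_days)]

rng = random.Random(0)   # fixed seed so the run is repeatable
n_trials = 200           # assumed number of parameter variations tried
n_days = 252             # assumed one-year backtest

# Backtest every variation and record its in-sample Sharpe ratio.
sharpes = [sharpe_ratio(random_strategy_returns(n_days, rng))
           for _ in range(n_trials)]

first = sharpes[0]       # what a single, pre-committed backtest would show
best = max(sharpes)      # what is reported after searching all variations

print(f"single backtest Sharpe ratio: {first:.2f}")
print(f"best of {n_trials} backtests:      {best:.2f}")
```

Since none of these strategies has any real edge, the gap between the best backtest and a single pre-committed backtest comes entirely from the search. Raising `n_trials` tends to widen it.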

New analysis of smart beta strategies

A new paper on the topic of backtest overfitting in smart beta strategies has appeared in the Journal of Portfolio Management (see also preprint).

These authors (Antti Suhonen, Matthias Lennkh and Fabrice Perez) began with a database of approximately 3000 alternative beta strategies, from which they selected 215 unique strategies with sufficient data for their analysis. The average backtest period of their test set was 10.7 years, and the average live operation time was 4.6 years. These 215 strategies included commodity, equity, fixed income, foreign exchange and multi-asset schemes.

Among their tabulated statistics are the Sharpe ratio of each strategy during the backtest period, the Sharpe ratio during the live period and the “realized haircut”, namely the percentage reduction in Sharpe ratio between the backtest and the live periods. Over all tested strategies, they found a surprisingly large median haircut of 72%; for equity strategies it was 80%. It is also notable that only 18 of the 215 strategies had a live Sharpe ratio greater than the backtest Sharpe ratio; 65 of the 215 strategies had a negative Sharpe ratio over the live period.
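As a worked illustration of the haircut statistic, the following sketch computes the percentage reduction in Sharpe ratio from backtest to live period. The function name `realized_haircut`, the restriction to positive backtest Sharpe ratios, and the sample figures are all assumptions made for this example; they are not taken from the paper's data.

```python
import statistics

def realized_haircut(backtest_sharpe, live_sharpe):
    """Fractional reduction in Sharpe ratio from backtest to live period.

    Assumes a positive backtest Sharpe ratio; otherwise the ratio
    has no sensible interpretation as a "reduction".
    """
    if backtest_sharpe <= 0:
        raise ValueError("backtest Sharpe ratio must be positive")
    return (backtest_sharpe - live_sharpe) / backtest_sharpe

# Hypothetical (backtest, live) Sharpe ratio pairs, for illustration only.
strategies = [(1.25, 0.35), (0.80, 0.60), (1.50, -0.30)]

haircuts = [realized_haircut(bt, live) for bt, live in strategies]
median_haircut = statistics.median(haircuts)

for (bt, live), h in zip(strategies, haircuts):
    print(f"backtest {bt:.2f}, live {live:.2f}: haircut {h:.0%}")
print(f"median haircut: {median_haircut:.0%}")
```

Under this definition, a strategy whose live Sharpe ratio is negative has a haircut above 100%, and a strategy that does better live than in backtest has a negative haircut.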

Suhonen, Lennkh and Perez recognized that given the 2008-2009 worldwide market crash, their results might be dismissed as an artifact of that unfortunate episode. But when they restricted their data to just a 3-year pre/post time window, thus avoiding the 2008-2009 period, their results were little changed — the median haircut dropped from 72% to 62% (which is still a very large drop-off), but the same number of strategies as before (65) exhibited a negative Sharpe ratio over the live period.

In another interesting analysis, the Suhonen-Lennkh-Perez paper categorized the 215 strategies by complexity, and then rated their performance according to this categorization. Indeed, as some have previously feared, they found that the more complex strategies suffered larger haircuts: in particular, the most complex strategies suffered 30 percentage points more haircut than the simplest strategies.


The findings on complexity are arguably the most significant of the Suhonen-Lennkh-Perez paper. What they found, in summary, is that while the “smart beta” approach may have some merit for very simple strategies, such as merely balancing a large-cap exchange-traded fund with a mid- or small-cap exchange-traded fund, more sophisticated strategies of this class (which are typically the result of large-scale computer-based searches for an optimal parameter selection) fall prey to backtest overfitting.

These overall results are entirely consistent with the results of a paper by the present authors, which found that backtest overfitting occurs remarkably easily in any investment strategy that was designed using computer-based searches over a large parameter space to find an “optimal” design (as is certainly done in many smart beta strategies).

These results are also consistent with a separate paper by the present authors on stock fund weighting schemes. We demonstrated that it is relatively easy to design a stock portfolio, consisting only of weighted S&P 500 index stocks, that will achieve virtually any desired performance profile, based on backtests. Such portfolios, however, typically do much worse on new out-of-sample data, a symptom of serious backtest overfitting.
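The flavor of this result can be reproduced with a toy example, sketched below. This is not the method of the paper mentioned above. It uses 12 hypothetical stocks with purely random monthly returns, and it allows unconstrained weights (they may be negative and need not sum to one), which a real index-weighting scheme would not. With as many free weights as backtest months, one can solve for weights that hit a chosen target profile exactly in-sample; on fresh data the same weights miss the target.

```python
import random

def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def portfolio_returns(stock_returns, weights):
    """Per-period portfolio return for a fixed weight vector."""
    return [sum(w * r for w, r in zip(weights, period))
            for period in stock_returns]

rng = random.Random(1)
n = 12  # assumption: 12 hypothetical stocks and 12 backtest months

def simulate(periods):
    """Random monthly returns: one row per month, one column per stock."""
    return [[rng.gauss(0.005, 0.05) for _ in range(n)] for _ in range(periods)]

in_sample = simulate(n)       # data used to design the weights
out_of_sample = simulate(n)   # new data the design never saw
target = [0.01] * n           # desired profile: a steady 1% every month

weights = solve_linear_system(in_sample, target)
fitted = portfolio_returns(in_sample, weights)
live = portfolio_returns(out_of_sample, weights)

in_sample_error = max(abs(f - t) for f, t in zip(fitted, target))
out_of_sample_error = max(abs(v - t) for v, t in zip(live, target))

print(f"largest miss vs. target, in-sample:     {in_sample_error:.2e}")
print(f"largest miss vs. target, out-of-sample: {out_of_sample_error:.2e}")
```

The in-sample fit is exact up to rounding because the weights were solved for directly from that data. Nothing in the design constrains the weights on data they were not fitted to, which is why the out-of-sample miss is so much larger.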

Backtest overfitting is not a minor flaw. If an overfit strategy is implemented, it may well result in disappointing returns or even catastrophic loss of capital. Thus it behooves all who design such strategies to ensure that their design is not overfit.
