THE AVERAGE IS BETTER THAN AVERAGE

Abstract

With the proliferation of data and exponential growth in computing power, researchers are virtually guaranteed to find something that performs very well in-sample despite having no useful predictive ability: it merely fits idiosyncrasies, or noise, in the underlying dataset. In machine learning, this is called overfitting. Indeed, overfitting is now believed to be responsible for the failure of discoveries in empirical finance to deliver in practice. Because expected returns and volatilities are unknown quantities that must be estimated from historical data, differences in performance between models may be nothing more than estimation error. It is therefore advantageous to combine the models. In machine learning, such a combination is called an ensemble. Ensembles pool the predictions of many different models; if the models are imperfectly correlated, the combination can deliver superior predictive power.
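The mechanism behind this last claim is a standard variance-of-the-mean argument; the following sketch uses notation ($N$, $\sigma$, $\rho$) introduced here for illustration. Suppose $N$ unbiased forecasts each have error variance $\sigma^2$ and average pairwise error correlation $\rho$. The equally weighted ensemble forecast then has error variance

$$\mathrm{Var}(\bar{e}) = \sigma^2\left(\rho + \frac{1-\rho}{N}\right),$$

which is strictly below the single-model variance $\sigma^2$ whenever $\rho < 1$ and $N > 1$, and approaches the floor $\rho\,\sigma^2$ as $N$ grows. Averaging therefore helps exactly when the models are imperfectly correlated.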