The MSR Philosophy: The Scientific Method and Model Building

By Michael S. Rulle Jr.

There is only one history in financial markets, but there is an almost infinite number of time series one can analyze. Think of all the combinations of markets, units of time (for example, one second, one minute, one hour) and periods of time within which those units reside (for example, one day, one week, one month). We characterize this way of viewing time series as analysis of "the distribution of distributions," and it is a key component of our model building process.

In a world where prices follow a purely random, lognormal process, such a framework for analysis would be redundant. By mathematical definition, one could not outperform the market’s risk-adjusted return in the long run except by pure luck. The alpha of such models would be zero (or worse, after transaction costs). Model development would be as fruitful as attempting to make money flipping fair coins. Therefore, all developers of trading models explicitly or implicitly believe that markets are not purely random and that some of their behavior is predictable. This assumption should inspire some humility: the challenge of discovering patterns that repeat themselves is daunting.
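The coin-flip comparison can be made concrete with a small simulation. The sketch below is illustrative only: the function name, the one-unit payoff, the cost of 0.01 per trade and the seed are all assumptions chosen for the example, not figures from MSR.

```python
import random


def average_profit_per_trade(n_trades, cost_per_trade, seed):
    """Average profit per trade when every trade is a fair coin flip.

    Each trade wins or loses one unit with equal probability and then
    pays a fixed transaction cost. All parameters are illustrative.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    total = 0.0
    for _ in range(n_trades):
        outcome = 1.0 if rng.random() < 0.5 else -1.0
        total += outcome - cost_per_trade
    return total / n_trades


# Same seed, so both runs see the same sequence of wins and losses.
gross = average_profit_per_trade(100_000, cost_per_trade=0.0, seed=1)
net = average_profit_per_trade(100_000, cost_per_trade=0.01, seed=1)
print(f"gross edge per trade: {gross:+.4f}")
print(f"net edge per trade:   {net:+.4f}")
```

Before costs the average edge hovers around zero, and after costs it is lower by exactly the cost per trade, which is the sense in which the alpha is "zero (or worse)".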

No model building method can assure success. However, the lack of a proper scientific methodology will almost certainly lead to failure. There are many hurdles model builders need to overcome. In MSR’s experience, the “data mining” bias is one of the most difficult problems to solve. At its most basic level, the data mining bias is a form of self-deception in which one “discovers” correlations in historical simulations that are in fact spurious, the product of randomness. This is the primary reason most models fail “out of sample” in real trading. As obvious as this may seem as a general statement, in practice the elimination of the data mining bias is a very complex and detailed process.
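A minimal sketch of why a model selected on historical data can fail out of sample: generate many "strategies" whose daily returns are pure noise, keep the one with the best in-sample average, and then look at how that same strategy does on fresh data. The function name, the 1% daily volatility, the number of strategies and the seed are assumptions made for illustration.

```python
import random
import statistics


def pick_best_random_strategy(n_strategies, n_days, seed):
    """Select the best in-sample performer among purely random strategies.

    Every strategy has zero true edge: its daily returns are Gaussian
    noise with mean 0. Returns the winner's (in-sample mean return,
    out-of-sample mean return).
    """
    rng = random.Random(seed)
    best_in, best_out = None, None
    for _ in range(n_strategies):
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        out_of_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        mean_in = statistics.fmean(in_sample)
        if best_in is None or mean_in > best_in:
            best_in = mean_in
            best_out = statistics.fmean(out_of_sample)
    return best_in, best_out


best_in, best_out = pick_best_random_strategy(500, 250, seed=7)
print(f"winner, in-sample mean daily return:     {best_in:+.5f}")
print(f"winner, out-of-sample mean daily return: {best_out:+.5f}")
```

The in-sample figure looks attractive only because it is the maximum of 500 random draws. The out-of-sample figure is a single fresh draw from a process with no edge, so there is no reason to expect it to repeat the in-sample result.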

There is an unlimited number of ways to combine historical data into formulas and regressions that fit history perfectly but lack any predictive value. The challenge for model builders is to distinguish between that which may be predictive and that which is not. Professor David Leinweber of Caltech created one of the best examples of data mining bias in a paper known for its satirical “butter in Bangladesh” method of predicting stock market prices. Leinweber demonstrated how easy it is to find a meaningless correlation if one scours enough data and uses enough polynomials.

Leinweber regressed thousands of data series from 140 countries against the price of the S&P 500 over a 10-year period. He “discovered” that butter production in Bangladesh “explained” 75% of the variation in the stock market. When he combined butter in Bangladesh with US cheese production and the sheep population in both countries, he created an almost perfect fit (an R-squared of 0.99). This may seem obviously absurd, but Leinweber’s point is that if, instead of butter in Bangladesh, one had a model predicting stock prices using GDP and interest rates with an R-squared of 0.70, it might not seem so ridiculous. Yet a data miner can create meaningless, non-predictive models using “sensible” data just as easily as with “butter in Bangladesh”.
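The effect is easy to reproduce with made-up data. The sketch below is not Leinweber's procedure or data: it simply generates one random-walk "index" and many unrelated random-walk "indicators", fits a single-variable regression of the index on each indicator, and reports the best R-squared found. The function names, the 120 monthly observations, the 1,000 candidates and the seed are all illustrative assumptions.

```python
import random


def random_walk(n_steps, rng):
    """Cumulative sum of independent Gaussian steps."""
    level, path = 0.0, []
    for _ in range(n_steps):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def r_squared(x, y):
    """R-squared of a one-variable least-squares fit of y on x.

    For a single regressor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def best_spurious_r_squared(n_candidates, n_steps, seed):
    """Best fit to a random target among unrelated random candidates."""
    rng = random.Random(seed)
    target = random_walk(n_steps, rng)  # stand-in for the index
    return max(
        r_squared(random_walk(n_steps, rng), target)
        for _ in range(n_candidates)
    )


best = best_spurious_r_squared(n_candidates=1000, n_steps=120, seed=42)
print(f"best R-squared among unrelated series: {best:.2f}")
```

None of the candidate series has any connection to the target, yet searching through enough of them turns up a high R-squared. Adding more regressors, as in the butter, cheese and sheep example, can only push the in-sample fit higher.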

What does MSR do to try to avoid this pitfall? One cannot avoid using historical data to “mine” for statistically significant patterns, nor should one want to. We have only one history, as multifaceted as it is. It is also unlikely that one’s first attempt at a hypothesis will yield the results one desires. It is inevitable that one will use the same data multiple times in the search for a successful predictive hypothesis. In statistics this is called the multiple comparisons problem. If one applies hypothesis testing and other techniques to models without taking into account the number of different variables or parameters that were tested, one is almost certain to fall victim to the data mining bias. One has to account for the number of tests done on the data to arrive at meaningful statistical inferences. It is extremely difficult to build successful models without using methods which “discount” these effects. Using them improves the odds that the output of one’s models is not spurious.
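The text does not say which “discounting” methods MSR uses. As one textbook illustration of the idea, the sketch below computes how quickly the chance of at least one false positive grows with the number of tests, and applies the Bonferroni correction, which divides the significance level by the number of tests. The function names and the p-values are hypothetical, and the error-rate formula assumes the tests are independent.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests,
    each run at significance level alpha, when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that stay significant after a Bonferroni correction.

    Each p-value is compared with alpha divided by the number of tests.
    """
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Testing 100 unrelated ideas at the 5% level almost guarantees a "discovery".
fwer = familywise_error_rate(alpha=0.05, n_tests=100)
print(f"chance of at least one false positive: {fwer:.3f}")

# Hypothetical p-values from three tests run on the same data.
flags = bonferroni_significant([0.04, 0.0004, 0.2])
print(flags)
```

With three tests the corrected threshold is 0.05 / 3, so a p-value of 0.04 that would pass a single test no longer counts as significant, while 0.0004 still does. Bonferroni is deliberately conservative; it is shown here only because it is the simplest way to make the number of tests explicit.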

The above model building prescription is neither straightforward nor mechanical, and in practice it is very difficult. Judgment is required at every step. “Researcher bias” (i.e., the tendency of researchers to interpret data, or make judgments, in favor of their desired conclusion) is a risk for MSR, as it is for all financial model builders. However, we try to keep this risk at the forefront of our thinking and methodology in order to minimize its likelihood.

Read David Leinweber’s “Stupid Data Miner Tricks: Overfitting the S&P 500”

The MSR Directional Trading Program is speculative in nature and the risk of loss is substantial. The information contained herein is obtained from sources we believe to be reliable, but MSR Investments, LLC (“MSR”) does not guarantee its accuracy. Past performance is not necessarily indicative of future results. The information contained herein is not an offer to invest in any trading program – such an offer can only be made with the official disclosure document which includes the principal risk factors and costs of participating in the managed account program.
