The process of scientific discovery, even within finance, is essential. One approach to finding new strategies is inductive: generate observations and then provide explanations. The other is deductive: first form a theory and then test a hypothesis. Much of machine and statistical learning is inductive reasoning, where data are used to suggest general hypotheses. Deductive reasoning is used with expert systems, where rules are made and then tested against the data.

The surge in data analysis is based on a belief that inductive analysis will be able to identify new relationships that have not previously been hypothesized. The danger comes when the analysis is atheoretical and the hypothesis is written only once the results are in. This has been given a name, HARKing (hypothesizing after the results are known).

Data may also be mined for anomalies or risk premiums that may not exist. Data can be tortured until they generate some significance, a practice known as “p-hacking.” A relationship is found, but it is explained only after the fact. From data come stories, not the testing of ideas. Call it meaning without structure. Now, we don’t want to put all of these techniques on a trash heap to be ignored, but there is important room for experts and practitioners to guide and interpret what the data mining is producing.
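To make the p-hacking point concrete, here is a minimal sketch of what happens when many candidate strategies are tested against pure noise. Everything in it is an assumption for illustration: the function names, the 200 strategies, the 120 periods, and the 1.96 cutoff (roughly a 5% two-sided test) are hypothetical choices, not anyone's actual research process. Because every simulated return series has a true mean of zero, any "significant" result is a false discovery.

```python
import math
import random
import statistics


def t_stat(returns):
    """t-statistic for the null hypothesis that the mean return is zero."""
    n = len(returns)
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(n))


def count_false_discoveries(n_strategies=200, n_periods=120,
                            threshold=1.96, seed=0):
    """Count pure-noise strategies whose |t-stat| exceeds the threshold.

    All parameters are illustrative assumptions. Each strategy's returns are
    drawn from a standard normal distribution, so there is no real edge.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_strategies):
        returns = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        if abs(t_stat(returns)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_false_discoveries()
    print(f"{hits} of 200 pure-noise strategies pass the significance cutoff")
```

With a 5% cutoff, about one in twenty noise strategies should pass on average, so searching across enough candidates will nearly always turn up something that looks like an anomaly. A hypothesis stated before the search is one way to discount those survivors.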

If the data suggest a relationship but there is no straightforward story to tell, the relationship should be suspect. Good modeling and data analysis test hypotheses and stories and help us understand what could be found in the data. Can there be new surprises in the data? Of course, but those should be the exception, not the rule, with machine learning.