Circular Analysis

Circular analysis refers to any research methodology in which the type of analysis conducted is determined by the data that is obtained, rather than the other way around. Ideally, research should begin with some hypotheses to examine and an established method for how the data will be analysed once it is collected. If this practice is not followed, and instead the type of analysis or even the question to be asked is determined after examining the data, any resulting conclusions are suspect. The reason for this is that almost any set of data will exhibit some unusual or potentially interesting properties, since there are so many different ways in which a given set of numbers could be ‘unusual’ or ‘interesting’. Many of these properties, however, will occur due to chance alone, and will not reflect any genuine underlying phenomenon. The range of options available to researchers in interpreting data and choosing what sort of analysis to perform is known as researcher degrees of freedom. The more degrees of freedom there are, the more likely it is that a result is simply due to picking the ‘right’ method of analysis that yields a particular conclusion, rather than the conclusion being driven by the data itself. Only if a particular method was decided upon before the data was examined can it be concluded with confidence that the result is likely due to a real effect, rather than just the chance finding of ‘something’ that looked unusual in the data.
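To give a rough sense of how quickly researcher degrees of freedom erode confidence, the short Python sketch below computes the chance that at least one analysis looks ‘significant’ when no real effect exists. It rests on simplifying assumptions that are not part of the discussion above: each candidate analysis is treated as an independent test with the same false-positive rate alpha (0.05 here, a common convention), and the function name is purely illustrative. Real analyses of the same data are usually correlated, so the figures are indicative only.

```python
def prob_at_least_one_false_positive(num_analyses: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_analyses` independent tests comes out
    'significant' at level `alpha` when there is no real effect at all."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them avoid it with probability (1 - alpha) ** num_analyses.
    return 1.0 - (1.0 - alpha) ** num_analyses


# Assumed, illustrative numbers of analysis options a researcher might try.
for k in (1, 5, 20, 100):
    print(f"{k:>3} analyses: {prob_at_least_one_false_positive(k):.3f}")
```

Under these assumptions, a single pre-planned analysis carries a 5% risk of a spurious finding, five options raise it to about 23%, twenty to about 64%, and a hundred to over 99%.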

As an example of how circular analysis can take place in practice, consider a new drug that is being tested to see if it yields any health benefits. The correct way of carrying out such a test would be to specify what particular health benefits the drug is expected to bring about, and then compare those who received the drug to those who received a placebo, to determine whether the effect was indeed present in those who took the drug and absent in those who took the placebo. An incorrect circular analysis, by contrast, would not specify in advance any particular therapeutic benefit, but would instead collect a wide range of health outcomes from patients, and then search through them to find any that were better in the treatment group than in the control group. This is an invalid method of analysis because, by chance alone, there will nearly always be some outcomes that are better in one group than in the other if a large enough number of possibilities is considered. Only by specifying in advance what the effect should be can we be confident that the effect is real, and not merely a result of chance.
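The drug example can be explored with a small simulation. The sketch below is illustrative only and makes several assumptions that do not come from the text: the drug has no effect on anything, 20 health outcomes are recorded per trial, each outcome is an independent standard normal measurement, there are 30 patients per group, and a simple two-sided z-test with known unit variance stands in for whatever test a real trial would use. All function names are hypothetical. It compares how often a ‘benefit’ is declared when one outcome is fixed in advance against how often one is declared when the best-looking outcome is picked after the fact.

```python
import random
from statistics import NormalDist, mean


def p_value(treatment, control):
    """Two-sided z-test p-value for a difference in means,
    assuming unit variance in both groups (true for the simulated data)."""
    standard_error = (1 / len(treatment) + 1 / len(control)) ** 0.5
    z = (mean(treatment) - mean(control)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_trial(rng, num_outcomes=20, patients_per_group=30):
    """Return one p-value per outcome for a trial in which the drug does nothing."""
    p_values = []
    for _ in range(num_outcomes):
        # Both groups are drawn from the same distribution: no real effect.
        treatment = [rng.gauss(0, 1) for _ in range(patients_per_group)]
        control = [rng.gauss(0, 1) for _ in range(patients_per_group)]
        p_values.append(p_value(treatment, control))
    return p_values


def false_positive_rates(num_trials=500, alpha=0.05, seed=0):
    """Fraction of null trials declaring an effect under each analysis strategy."""
    rng = random.Random(seed)
    prespecified_hits = 0  # only the first outcome, chosen before seeing data
    circular_hits = 0      # any outcome that happens to look significant
    for _ in range(num_trials):
        p_values = simulate_trial(rng)
        if p_values[0] < alpha:
            prespecified_hits += 1
        if min(p_values) < alpha:
            circular_hits += 1
    return prespecified_hits / num_trials, circular_hits / num_trials


if __name__ == "__main__":
    prespecified, circular = false_positive_rates()
    print(f"pre-specified outcome: {prespecified:.2f}")
    print(f"best of 20 outcomes:   {circular:.2f}")
```

Under these assumptions, the pre-specified analysis should report a spurious benefit in roughly 5% of trials, while the search across all 20 outcomes should do so in well over half of them, even though the simulated drug does nothing.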

Further Reading

Circular analysis in systems neuroscience: an excellent article discussing the problem in a neuroscience context

The danger of overfitting regression models: a short introduction to the overfitting of statistical models, which is a closely related phenomenon

Misleading modelling – overfitting, cross-validation, and the bias-variance tradeoff: a more advanced discussion with a machine-learning focus