Correlation and Causation

Correlation is an important statistical concept that refers to the relationship between two variables. A variable is a quantity that changes over time or between samples, such as population, national income, annual rainfall, number of genetic mutations in a cell, stock prices, voter turnout, etc. When we measure two variables over some span of time or across a set of samples, we may be interested to know whether there is any tendency for those two variables to move together, so that when one goes up the other is also likely to go up, and likewise when one goes down the other is also likely to go down. The measurement of this ‘degree of moving together’ is called the correlation between those two variables.

Correlation is a number that is always between -1 and +1. If two variables have a positive correlation (between 0 and +1), it means that they tend to move in the same direction, so that when one goes up the other tends to go up, and when one goes down the other tends to go down. For example, income and education are positively correlated. If the two variables have a negative correlation (between 0 and -1), it means they tend to move in opposite directions, so that when one goes up the other tends to go down. For example, average number of cigarettes smoked per day and life expectancy have a negative correlation. A correlation of exactly zero is also called ‘no correlation’, and essentially means that there is no relationship between the two variables (more strictly speaking, it means there is no linear relationship between the variables, so two variables can be related in a non-linear way and still have a correlation of zero). For example, there is no correlation between gender and IQ. The closer a correlation is to +1 or -1, the stronger the correlation is said to be. Strongly correlated variables move very closely together, while weakly correlated variables follow each other only to a limited extent.
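
As a rough illustration of how such a number can be computed, here is a minimal sketch in Python. It assumes that the correlation meant above is the Pearson correlation coefficient, the most common measure of linear association; the function name pearson and all of the data are made up for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables vary together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / math.sqrt(var_x * var_y)

# Invented illustrative data, not real measurements.
x = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]      # rises whenever x rises
opposite_direction = [10, 8, 6, 4, 2]  # falls whenever x rises
u_shaped = [4, 1, 0, 1, 4]             # equals (x - 3)^2: related to x, but not linearly

r_pos = pearson(x, same_direction)       # perfect positive correlation
r_neg = pearson(x, opposite_direction)   # perfect negative correlation
r_zero = pearson(x, u_shaped)            # zero correlation despite a clear relationship

print(r_pos, r_neg, r_zero)
```

The third data set shows why a correlation of zero only rules out a linear relationship: u_shaped is completely determined by x, yet the two have no correlation.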

It is important to understand that correlation is not the same as causation: just because two variables are correlated with each other does not mean that one of them causes the other. Ice cream sales and violent crime, for example, are positively correlated, even though they bear no direct causal relationship (the likely reason they are correlated is that hot weather tends to increase ice cream sales and is also associated with more violent crime).
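
The ice cream example can be imitated with a small simulation. The sketch below is only an illustration under invented assumptions: the temperature range, the coefficients and the noise levels are all made up, and the helper names pearson and partial_correlation are hypothetical. Temperature drives both outcomes, and neither outcome influences the other, yet the two end up strongly correlated. The partial correlation, a standard way of asking how two variables are related once a third is held fixed, comes out close to zero.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable

# All numbers below are invented for illustration only.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
# Each outcome depends on temperature plus its own independent noise;
# neither outcome appears in the formula for the other.
ice_cream_sales = [10 * t + rng.gauss(0, 30) for t in temperature]
violent_crimes = [2 * t + rng.gauss(0, 10) for t in temperature]

r_sales_crime = pearson(ice_cream_sales, violent_crimes)
r_sales_temp = pearson(ice_cream_sales, temperature)
r_crime_temp = pearson(violent_crimes, temperature)
r_partial = partial_correlation(r_sales_crime, r_sales_temp, r_crime_temp)

print(f"sales vs crime: {r_sales_crime:.2f}")
print(f"sales vs crime, holding temperature fixed: {r_partial:.2f}")
```

In this toy setup the raw correlation is strongly positive purely because of the shared cause. Real data are rarely this clean, so a near-zero partial correlation should be read as a hint about a possible common cause, not as proof of one.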

Further Reading

Spurious correlations: a collection of many spurious (non-causal) correlations, great for illustrations

Correlation or causation: a resource of news articles and headlines which conflate correlation and causation, with commentary and discussion

When does correlation imply causation: very helpful discussion of this question on StackExchange

What everyone should know about statistical correlation: useful popular article by American Scientist magazine