Guide: Correlation

Correlation measures the strength and direction of the linear relationship between two continuous variables. It ranges from −1 (perfect negative) to +1 (perfect positive), with 0 indicating no linear relationship.

Example: height and weight are strongly positively correlated (r ≈ 0.7). Taller people tend to weigh more, although the relationship is far from exact.
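To make the definition concrete, here is a minimal sketch that computes the coefficient by hand: the sum of paired deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the height and weight values are made up for illustration; they are not real measurements.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations of X from its mean
    dy = [y - mean_y for y in ys]          # deviations of Y from its mean
    co_dev = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_dev / math.sqrt(ss_x * ss_y)

# Hypothetical sample, invented for this example
heights_cm = [150, 160, 165, 170, 180, 185]
weights_kg = [52, 58, 70, 64, 80, 77]

r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.3f}")
```

A perfectly linear increasing pair such as `[1, 2, 3]` and `[2, 4, 6]` gives +1, and reversing one of them gives −1, matching the range described above.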

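Pearson r measures linear correlation between two continuous variables. It is the default choice when the relationship is roughly linear and there are no extreme outliers. Approximate normality matters mainly for significance tests and confidence intervals on r, not for computing the coefficient itself.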

Spearman ρ is rank-based — it measures monotonic (not necessarily linear) correlation. Use Spearman when:

  • Data is ordinal (e.g. ratings, rankings)
  • The relationship is non-linear but monotonic
  • There are significant outliers
  • Normality cannot be assumed
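The difference between the two coefficients is easiest to see on data that is monotonic but not linear. The sketch below treats Spearman ρ as Pearson r applied to ranks, with tied values sharing their average rank. The helper names (`average_ranks`, `spearman_rho`) and the exponential data are assumptions chosen for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1        # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Monotonic but strongly non-linear relationship (illustrative data)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [math.exp(x) for x in xs]

r = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because y always increases with x, the ranks agree perfectly and ρ reaches 1, while r is noticeably lower because the points do not fall on a straight line.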

Two variables can be strongly correlated without one causing the other. Possible reasons for a spurious correlation include:

  • Confounding variable: A third variable causes both (e.g. ice cream sales and drowning rates are both caused by hot weather)
  • Coincidence: Two unrelated time series that happen to trend together
  • Reverse causation: Y causes X, not X causes Y

Establishing causation generally requires a randomised experiment, or at least an analysis that rules out the alternatives above. Correlation alone is not enough.
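The confounding case can be illustrated with a small simulation. In this sketch, temperature drives both ice cream sales and drownings, and the two outcomes never influence each other. All numbers (coefficients, noise levels, sample size, seed) are invented. To "hold temperature fixed" the code uses the standard first-order partial correlation formula, a technique this guide does not otherwise cover.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(ss_x * ss_y)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y after removing the linear effect of Z from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated data: temperature is the only shared cause (made-up coefficients)
temperature = [rng.uniform(10, 35) for _ in range(5000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

r_raw = pearson_r(ice_cream, drownings)
r_adjusted = partial_r(
    r_raw,
    pearson_r(ice_cream, temperature),
    pearson_r(drownings, temperature),
)
print(f"raw r = {r_raw:.3f}, adjusted for temperature = {r_adjusted:.3f}")
```

The raw correlation is strong even though neither variable affects the other, and it shrinks towards zero once temperature is accounted for. With real data the confounder is often unmeasured, which is why this kind of adjustment cannot replace an experiment.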

r² (the coefficient of determination) tells you the proportion of variance in Y that is explained by its linear relationship with X. An r² of 0.64 means that 64% of the variability in Y can be explained by X.

For example, if height and weight have r = 0.8, then r² = 0.64 — height explains 64% of the variation in weight. The remaining 36% is due to other factors.
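The "proportion of variance explained" reading can be checked numerically. The sketch below fits a least-squares line, computes 1 − SS_res / SS_tot, and compares it with the square of Pearson r. For a straight-line fit with one predictor the two should agree up to rounding. The data and the function name `r_squared_from_fit` are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(ss_x * ss_y)

def r_squared_from_fit(xs, ys):
    """Share of the variance in Y explained by the least-squares line on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Unexplained variation: squared distances from the fitted line
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared distances from the mean of Y
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical sample, invented for this example
heights_cm = [150, 160, 165, 170, 180, 185]
weights_kg = [52, 58, 70, 64, 80, 77]

r = pearson_r(heights_cm, weights_kg)
explained = r_squared_from_fit(heights_cm, weights_kg)
print(f"r^2 = {r ** 2:.4f}, explained share from the fit = {explained:.4f}")
```

Squaring also discards the sign, so r = 0.8 and r = −0.8 both give r² = 0.64.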