Simple linear regression — get the equation, R², slope, intercept, and a chart with the regression line.
Linear regression finds the best-fitting straight line through a set of data points. The line is described by the equation: y = slope × x + intercept.
Unlike correlation (which only measures the strength of the association), regression produces a predictive model: you can plug in a value of x to predict the expected value of y.
Slope: For every 1-unit increase in x, y changes by [slope] units. A slope of 2.5 means y increases by 2.5 for each unit increase in x.
Intercept: The predicted value of y when x = 0. It is only meaningful when x = 0 is a plausible value within the range of your data (for example, it is not meaningful if x is "age in years" and 0 lies outside the measured range).
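As a rough illustration of where the slope and intercept come from, here is a minimal Python sketch of the ordinary least-squares formulas. The function name fit_line and the five data points are made up for this example; they are not taken from the calculator itself.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
predicted = slope * 6 + intercept  # predicted y at x = 6

print(f"y = {slope:.2f} × x + {intercept:.2f}")
print(f"predicted y at x = 6: {predicted:.2f}")
```

For this toy data the slope works out to about 0.6 and the intercept to about 2.2, so each 1-unit increase in x raises the predicted y by roughly 0.6.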
R² (coefficient of determination) measures how well the regression line fits the data: the proportion of the total variability in y that is explained by x. It ranges from 0 (the line explains none of the variability) to 1 (a perfect fit).
R² = 0.80 means your regression line accounts for 80% of the variance in y. The remaining 20% is left unexplained: it reflects other factors not in your model, as well as random variation.
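One common way to compute R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition; the helper name r_squared and the data are hypothetical, and the slope and intercept passed in (0.6 and 2.2) are the least-squares values for this particular data set.

```python
def r_squared(xs, ys, slope, intercept):
    """Return R² = 1 - SS_res / SS_tot for a fitted straight line."""
    mean_y = sum(ys) / len(ys)
    # Variability left over after fitting the line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variability of y around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("all y values are identical; R² is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical data; 0.6 and 2.2 are the least-squares fit for these points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2 = r_squared(xs, ys, slope=0.6, intercept=2.2)
print(f"R² = {r2:.2f}")
```

Here R² comes out at about 0.60, meaning the line accounts for roughly 60% of the variance in y for this toy data.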
A high R² doesn't guarantee a useful model — always inspect the residual plot and check for patterns.
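Linear regression assumes a linear relationship, independent observations, constant variance of the residuals, and approximately normally distributed residuals. Check these with residual plots and normality tests. Violations may require data transformations or different models.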
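A residual plot is a plot of each residual (observed y minus predicted y) against the fitted value. The sketch below only computes those pairs so they can be inspected or passed to a plotting tool of your choice. The function name and data are hypothetical, and 0.6 and 2.2 are the least-squares slope and intercept for this data.

```python
def residuals(xs, ys, slope, intercept):
    """Return a list of (fitted value, residual) pairs, one per observation."""
    pairs = []
    for x, y in zip(xs, ys):
        fitted = slope * x + intercept
        pairs.append((fitted, y - fitted))
    return pairs


# Hypothetical data; 0.6 and 2.2 are the least-squares fit for these points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

pairs = residuals(xs, ys, slope=0.6, intercept=2.2)
for fitted, resid in pairs:
    print(f"fitted = {fitted:.2f}   residual = {resid:+.2f}")

# For a least-squares line with an intercept, the residuals sum to zero
# (up to floating-point rounding).
residual_sum = sum(resid for _, resid in pairs)
```

When reading the output, look for structure rather than individual values: a curve suggests non-linearity, and a spread that widens with the fitted value suggests non-constant variance.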
Correlation measures the strength and direction of the linear relationship between two variables. It is symmetric: r between x and y equals r between y and x.
Regression is asymmetric — you have a predictor (x) and an outcome (y). It quantifies the relationship and produces a predictive equation. Use regression when you want to predict y from x; use correlation when you just want to quantify the association.
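The difference between the two can be seen numerically. In this sketch (helper names and data are hypothetical), swapping x and y leaves Pearson's r unchanged, but gives a different regression slope. For simple linear regression, r squared also equals R².

```python
import math


def _deviation_sums(xs, ys):
    """Return (sxy, sxx, syy): sums of cross- and squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    sxy, sxx, syy = _deviation_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(predictor, outcome):
    """Least-squares slope for predicting outcome from predictor."""
    sxy, sxx, _ = _deviation_sums(predictor, outcome)
    return sxy / sxx


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy = pearson_r(xs, ys)
r_yx = pearson_r(ys, xs)
slope_y_on_x = regression_slope(xs, ys)  # predict y from x
slope_x_on_y = regression_slope(ys, xs)  # predict x from y

print(f"r(x, y) = {r_xy:.3f}, r(y, x) = {r_yx:.3f}")
print(f"slope of y on x = {slope_y_on_x:.2f}, slope of x on y = {slope_x_on_y:.2f}")
print(f"r squared = {r_xy ** 2:.2f}")
```

For this toy data r is about 0.775 in both directions, while the slope is about 0.6 when predicting y from x and about 1.0 when predicting x from y.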