📈 Linear Regression Calculator – OLS Fit, R², and Prediction
Linear regression is one of the most widely used statistical techniques in science, economics, engineering, and data analysis. It models the straight-line relationship between an independent variable X and a dependent variable Y, enabling you to quantify trends, test hypotheses, and predict future values. This calculator performs ordinary least squares (OLS) regression instantly — no spreadsheet or statistics software required.
What Is Linear Regression?
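Given a set of paired observations (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), linear regression finds the straight line ŷ = a + bx that minimises the sum of squared vertical distances (residuals) between the observed Y values and the fitted line. The minimising line is unique as long as the X values are not all identical. Under the Gauss–Markov assumptions (a linear relationship with errors that have zero mean, constant variance, and no correlation with one another), the least-squares slope and intercept are the best linear unbiased estimates: among all unbiased estimators that are linear in Y, they have the smallest variance.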
Core Formulas
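The calculator uses these closed-form formulas, where n is the number of data points, Σ denotes a sum over all n observations, and x̄ and ȳ are the sample means of X and Y: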
| Parameter | Formula |
|---|---|
| Slope (b) | b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) |
| Intercept (a) | a = ȳ − b·x̄ |
| Regression equation | ŷ = a + bx |
| Pearson r | r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)] |
| Coefficient of determination | R² = r² |
| Residual | eᵢ = yᵢ − ŷᵢ |
| Residual sum of squares (SSE) | SSE = Σeᵢ² |
| Residual standard error | s = √(SSE / (n − 2)) |
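As an illustration, the formulas in the table can be transcribed almost line for line into code. The sketch below is a minimal standard-library version, not the calculator's actual source; the function name `ols_fit` and the returned dictionary keys are hypothetical choices made for this example.

```python
import math

def ols_fit(xs, ys):
    """Fit y = a + b*x by ordinary least squares using the summation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if max(xs) == min(xs):
        raise ValueError("all x values are identical; the slope is undefined")

    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)

    dx = n * sxx - sx ** 2          # denominator of the slope formula
    dy = n * syy - sy ** 2          # zero when every y is the same
    b = (n * sxy - sx * sy) / dx    # slope
    a = sy / n - b * sx / n         # intercept = y_bar - b * x_bar

    # Pearson r is undefined when Y has no variation
    r = (n * sxy - sx * sy) / math.sqrt(dx * dy) if dy > 0 else float("nan")

    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Residual standard error needs n - 2 > 0 degrees of freedom
    s = math.sqrt(sse / (n - 2)) if n > 2 else float("nan")
    return {"slope": b, "intercept": a, "r": r, "r2": r * r, "sse": sse, "s": s}

fit = ols_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({k: round(v, 4) for k, v in fit.items()})
```

For this small made-up dataset the fitted line is approximately ŷ = 2.2 + 0.6x, with R² ≈ 0.6 and s ≈ 0.894. The raw-sum formulas are convenient for hand calculation, but they can lose precision when the X values are large and close together; a centred form (subtracting the means first) is numerically safer.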
How to Use This Calculator
- Enter your data. In Paired Lists mode, type X values in the first box and matching Y values in the second, separated by commas, spaces, tabs, or line breaks. Switch to CSV / Table mode to paste two-column spreadsheet data directly.
- Set options. Label your axes, choose a confidence level (90%, 95%, or 99%), set decimal precision, and enable or disable the through-origin constraint and residual table as needed.
- Add a prediction X (optional). Enter any X value to get the corresponding predicted Y, plus confidence and prediction intervals.
- Click Calculate. Results appear instantly: the regression equation, slope, intercept, Pearson r, R², residual standard error, an interactive scatter plot with the fitted line, and (if enabled) a full residual table.
Understanding the Key Outputs
Slope (b) and Intercept (a)
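The slope tells you how much the predicted Y changes for each one-unit increase in X: a positive slope means Y rises, a negative slope means it falls. For example, a slope of 2.5 means the predicted Y rises by 2.5 units when X increases by 1. The intercept is the predicted Y value when X = 0; it positions the line vertically on the chart. If X = 0 lies far outside the range of your data, treat the intercept as a fitting constant rather than a meaningful prediction.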
Pearson r and R²
The Pearson correlation coefficient r ranges from −1 to +1 and measures the strength and direction of the linear association; its sign always matches the sign of the slope. For a straight-line fit with an intercept, the coefficient of determination R² equals r squared and represents the proportion of variance in Y explained by X. An R² of 0.92 means 92% of the variability in Y is accounted for by the regression on X.
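The "proportion of variance explained" reading can be checked numerically: for a straight-line fit with an intercept, r² and 1 − SSE/SST (where SST is the total sum of squares of Y around its mean) give the same number. The following is a small sketch of that check; the function name `r_squared_two_ways` is hypothetical and the data are made up.

```python
def r_squared_two_ways(xs, ys):
    """Return (r squared, 1 - SSE/SST) for a simple linear fit with intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Centred sums of squares and cross-products
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)   # this is SST
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    r = sxy / (sxx * syy) ** 0.5
    b = sxy / sxx                 # slope
    a = y_bar - b * x_bar         # intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return r * r, 1 - sse / syy

r2_from_r, r2_from_sse = r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2_from_r, 4), round(r2_from_sse, 4))  # prints 0.6 0.6
```

Here 60% of the variation in Y is accounted for by the line, whichever way it is computed.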
Confidence Intervals vs. Prediction Intervals
When you enter a Prediction X, the calculator returns two intervals:
- Confidence interval (CI): The range in which the mean Y response for that X is expected to fall at the chosen confidence level. It is narrower than the prediction interval and describes the precision of the fitted line at that point.
- Prediction interval (PI): The range in which a single future observation of Y is expected to fall. It is always wider than the CI because it accounts for both estimation uncertainty and individual variation around the line.
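The page does not state which interval formulas the calculator uses, so the sketch below assumes the standard textbook ones for simple linear regression: both intervals are centred on ŷ₀ = a + b·x₀, with half-widths t·s·√(1/n + (x₀ − x̄)²/Sxx) for the CI and t·s·√(1 + 1/n + (x₀ − x̄)²/Sxx) for the PI, where Sxx = Σ(xᵢ − x̄)². The function name `intervals_at` is hypothetical, and the t critical value (n − 2 degrees of freedom) is passed in by the caller, for example from a t-table, to keep the example free of third-party dependencies.

```python
import math

def intervals_at(xs, ys, x0, t_crit):
    """Return (y0, CI, PI) at x0 using the textbook simple-regression formulas.

    t_crit is the two-sided t critical value for n - 2 degrees of freedom,
    supplied by the caller (assumption: looked up in a t-table).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("intervals need at least 3 points (n - 2 > 0)")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))            # residual standard error

    y0 = a + b * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_ci = t_crit * s * math.sqrt(leverage)       # uncertainty in the mean response
    half_pi = t_crit * s * math.sqrt(1 + leverage)   # adds scatter of a single new point
    return y0, (y0 - half_ci, y0 + half_ci), (y0 - half_pi, y0 + half_pi)

# 95% level with n = 5 points: df = 3, two-sided t critical value is about 3.182
y0, ci, pi = intervals_at([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=6, t_crit=3.182)
print(round(y0, 3), [round(v, 3) for v in ci], [round(v, 3) for v in pi])
```

The extra `1 +` under the square root is what makes the PI wider than the CI. Both intervals also widen as x₀ moves away from x̄, which is one reason extrapolating beyond the data is risky.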
Residual Analysis
The residual for observation i is eᵢ = yᵢ − ŷᵢ — the vertical distance between the observed and fitted value. Examining residuals helps you judge model quality:
- Residuals scattered randomly around zero suggest the linear model is appropriate.
- A U-shaped residual pattern indicates non-linearity, while a funnel-shaped pattern (spread growing or shrinking with X) indicates heteroscedasticity; either may require a transformation or a different model.
- Points with unusually large residuals (flagged as outlier? in the table) may warrant further investigation.
The residual standard error s = √(SSE / (n − 2)) is the typical prediction error around the fitted line, measured in the same units as Y.
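A residual table of this kind can be sketched as follows. The page does not say how the calculator decides which points to flag, so the rule used here (|eᵢ| greater than `threshold` × s, with a default of 2) is an assumption made for illustration, as are the function name `residual_table` and the row keys.

```python
import math

def residual_table(xs, ys, intercept, slope, threshold=2.0):
    """Build residual rows for a fitted line and flag large residuals.

    The flagging rule |e| > threshold * s is an illustrative assumption,
    not necessarily the rule the calculator applies.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate s")
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    sse = sum(e * e for e in residuals)
    s = math.sqrt(sse / (n - 2))            # residual standard error
    rows = [
        {"x": x, "y": y, "fitted": f, "residual": e, "outlier": abs(e) > threshold * s}
        for x, y, f, e in zip(xs, ys, fitted, residuals)
    ]
    return rows, s

# Intercept 2.2 and slope 0.6 are the least-squares fit for this made-up data
rows, s = residual_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], intercept=2.2, slope=0.6)
for row in rows:
    print(row["x"], row["y"], round(row["fitted"], 2), round(row["residual"], 2), row["outlier"])
```

For a least-squares fit with an intercept the residuals sum to zero (up to rounding), which is a quick sanity check on any residual table.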
Through-Origin Regression
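Enabling Force intercept = 0 constrains the line to pass through the origin (0, 0). Use this only when the problem context genuinely requires Y = 0 when X = 0, for instance distance travelled at constant speed versus elapsed time, or cost as a pure function of volume with no fixed overhead. In most other situations the unconstrained fit should be preferred: forcing the intercept to zero when the true intercept is not zero biases the slope and inflates the residuals. Note that R² from through-origin regression is not comparable to R² from the standard model, because it is usually computed against the uncentred sum of squares Σyᵢ² rather than the variation of Y around its mean.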
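With the intercept fixed at zero, least squares reduces to a single-parameter problem whose solution is b = Σxy / Σx². The sketch below assumes the conventions commonly used for this model, which the page does not confirm for the calculator: n − 1 residual degrees of freedom (only one parameter is estimated) and an uncentred R² = 1 − SSE / Σy². The function name `fit_through_origin` and the data are hypothetical.

```python
import math

def fit_through_origin(xs, ys):
    """Least-squares fit of y = b*x (no intercept).

    Assumptions for illustration: n - 1 degrees of freedom for s,
    and an uncentred R^2 measured against sum(y^2).
    """
    n = len(xs)
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero; the slope is undefined")
    b = sum(x * y for x, y in zip(xs, ys)) / sxx

    sse = sum((y - b * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 1)) if n > 1 else float("nan")

    syy_raw = sum(y * y for y in ys)        # uncentred total sum of squares
    r2_uncentred = 1 - sse / syy_raw if syy_raw > 0 else float("nan")
    return b, s, r2_uncentred

b, s0, r2u = fit_through_origin([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(round(b, 4), round(s0, 4), round(r2u, 4))
```

Because the uncentred R² is measured against Σy² rather than against variation around ȳ, it tends to look much higher than the ordinary R² for the same data, which is why the two should not be compared directly.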
Input Data Tips
- Numbers can be separated by commas, spaces, tabs, or line breaks.
- X and Y lists must contain the same count of values, matched positionally.
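- At least 2 data points are needed to fit a line. With exactly 2, the line passes through both points, the residual degrees of freedom (n − 2) are zero, and the residual standard error and intervals cannot be computed; use 3 or more points for confidence intervals and meaningful diagnostics.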
- The X values must not all be identical: a dataset whose points share a single X value forms a vertical line and has no defined slope.
- In CSV mode, each row must contain one X value and one Y value; any extra columns are ignored.
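The input rules above can be sketched as a small parser and validator. This is an illustrative approximation of the described behaviour for Paired Lists mode, not the calculator's own code; the function names `parse_numbers` and `validate_pairs` are hypothetical.

```python
import re

def parse_numbers(text):
    """Split a string on commas, spaces, tabs, or line breaks and convert to floats."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]   # float() raises ValueError on non-numeric tokens

def validate_pairs(xs, ys):
    """Raise ValueError if the paired lists cannot be used for a line fit."""
    if len(xs) != len(ys):
        raise ValueError(f"X has {len(xs)} values but Y has {len(ys)}")
    if len(xs) < 2:
        raise ValueError("at least 2 data points are needed to fit a line")
    if max(xs) == min(xs):
        raise ValueError("all X values are identical; the slope is undefined")

xs = parse_numbers("1, 2 3\t4\n5")
ys = parse_numbers("2,4,5,4,5")
validate_pairs(xs, ys)   # passes silently for valid input
print(xs, ys)
```

Splitting on a single character class keeps mixed separators (for example, a comma followed by a space) from producing empty tokens in the middle of the list, and the filter drops any empty token left by a leading or trailing comma.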
Common Applications
Linear regression is used across virtually every quantitative field:
- Education: Predicting exam scores from study hours.
- Economics & Finance: Modelling sales growth versus advertising spend, or asset returns versus market returns.
- Science & Engineering: Calibration curves, Hooke's Law verification, speed vs. distance relationships.
- Health: Estimating blood pressure trends with age, dosage–response studies.
- Machine Learning: Linear regression is a foundational supervised learning algorithm; understanding OLS is essential before advancing to regularised or non-linear models.