Linear Regression Calculator

About This Tool

📈 Linear Regression Calculator – OLS Fit, R², and Prediction

Linear regression is one of the most widely used statistical techniques in science, economics, engineering, and data analysis. It models the straight-line relationship between an independent variable X and a dependent variable Y, enabling you to quantify trends, test hypotheses, and predict future values. This calculator performs ordinary least squares (OLS) regression instantly — no spreadsheet or statistics software required.

What Is Linear Regression?

Given a set of paired observations (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), linear regression finds the straight line ŷ = a + bx that minimises the sum of squared vertical distances (residuals) between the observed Y values and the fitted line. The line is unique as long as the X values are not all identical. Under the classical assumptions (errors with zero mean, constant variance, and no correlation with one another), the least-squares slope and intercept are the best linear unbiased estimates.
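
As a sketch of where the fitted line comes from, the least-squares criterion can be written as a minimisation problem. The label S for the sum of squared residuals is introduced here for illustration only. Setting both partial derivatives to zero gives two linear equations (the normal equations) in a and b:

```latex
S(a, b) = \sum_{i=1}^{n} \left( y_i - a - b x_i \right)^2

\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} \left( y_i - a - b x_i \right) = 0
\quad\Longrightarrow\quad n a + b \sum x_i = \sum y_i

\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} x_i \left( y_i - a - b x_i \right) = 0
\quad\Longrightarrow\quad a \sum x_i + b \sum x_i^2 = \sum x_i y_i

b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2},
\qquad a = \bar{y} - b \bar{x}
```

The denominator of b is zero exactly when all X values are identical, which is why such data has no defined slope.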

Core Formulas

The calculator uses these exact analytical formulas:

Parameter | Formula
Slope (b) | b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
Intercept (a) | a = ȳ − b·x̄
Regression equation | ŷ = a + bx
Pearson r | r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]
Coefficient of determination | R² = r²
Residual | eᵢ = yᵢ − ŷᵢ
Residual sum of squares (SSE) | SSE = Σeᵢ²
Residual standard error | s = √(SSE / (n − 2))
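
The slope, intercept, and Pearson r formulas above translate almost line for line into code. The following is a minimal sketch, not the calculator's actual source; the function name `fit_line` and the sample data are made up for illustration.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) using the summation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    den_x = n * sxx - sx * sx          # n*Sum(x^2) - (Sum x)^2
    den_y = n * syy - sy * sy          # n*Sum(y^2) - (Sum y)^2
    if den_x == 0:
        raise ValueError("all X values are identical; slope is undefined")
    b = (n * sxy - sx * sy) / den_x    # slope
    a = sy / n - b * sx / n            # intercept = y_bar - b * x_bar
    # r is undefined when every Y value is the same
    r = (n * sxy - sx * sy) / math.sqrt(den_x * den_y) if den_y > 0 else float("nan")
    return b, a, r

# Illustrative data only
slope, intercept, r = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y_hat = {intercept:.4f} + {slope:.4f} x, r = {r:.4f}, R^2 = {r * r:.4f}")
```

For this sample the sums are Σx = 15, Σy = 20, Σxy = 66, and Σx² = 55, so the slope is 30 / 50 = 0.6 and the intercept is 4 − 0.6 × 3 = 2.2.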

How to Use This Calculator

  1. Enter your data. In Paired Lists mode, type X values in the first box and matching Y values in the second — comma or newline-separated. Switch to CSV / Table mode to paste two-column spreadsheet data directly.
  2. Set options. Label your axes, choose a confidence level (90%, 95%, or 99%), set decimal precision, and enable or disable the through-origin constraint and residual table as needed.
  3. Add a prediction X (optional). Enter any X value to get the corresponding predicted Y, plus confidence and prediction intervals.
  4. Click Calculate. Results appear instantly: the regression equation, slope, intercept, Pearson r, R², residual standard error, an interactive scatter plot with the fitted line, and a full residual table.

Understanding the Key Outputs

Slope (b) and Intercept (a)

The slope tells you how much Y increases (or decreases) for each one-unit increase in X. For example, a slope of 2.5 means Y rises by 2.5 units when X increases by 1. The intercept is the predicted Y value when X = 0; it positions the line vertically on the chart.

Pearson r and R²

The Pearson correlation coefficient r ranges from −1 to +1 and measures the strength and direction of the linear association. In simple linear regression with an intercept, the coefficient of determination R² equals r² and represents the proportion of variance in Y explained by X. An R² of 0.92 means 92% of the variability in Y is accounted for by the regression on X.
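
The "proportion of variance explained" reading can be checked numerically: for a line fitted with an intercept, 1 − SSE/SST gives the same number as r². The sketch below assumes that setup; the function name `r_and_r_squared` and the data are illustrative, and SST denotes the total sum of squares Σ(yᵢ − ȳ)², a symbol not used elsewhere on this page.

```python
import math

def r_and_r_squared(xs, ys):
    """Return (r, R^2), with R^2 computed as 1 - SSE/SST."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)   # this is SST
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    return r, 1 - sse / syy

r, r2 = r_and_r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r^2 = {r * r:.6f}, 1 - SSE/SST = {r2:.6f}")
```

Here SST = 6 and SSE = 2.4, so both expressions come out at 0.6 up to rounding.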

Confidence Intervals vs. Prediction Intervals

When you enter a Prediction X, the calculator returns two intervals:

  • Confidence interval (CI): The range in which the mean Y response for that X is expected to fall. It is narrower and describes the precision of the fitted line at that point.
  • Prediction interval (PI): The range in which a single future observation of Y is expected to fall. It is always wider than the CI because it accounts for both estimation uncertainty and individual variation around the line.
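
The page does not spell out the interval formulas, so the sketch below assumes the standard textbook ones: half-width t·s·√(1/n + (x₀ − x̄)²/Sxx) for the mean response, with an extra "1 +" under the square root for a single new observation. The function name and data are illustrative, and the t critical value is passed in by the caller (3.182 is the usual table value for 95% with 3 degrees of freedom) because Python's standard library has no t-quantile function.

```python
import math

def mean_and_prediction_intervals(xs, ys, x0, t_crit):
    """Return (y0, ci_half_width, pi_half_width) at x0.

    t_crit is the two-sided Student t critical value for n - 2 degrees
    of freedom, supplied by the caller (for example from a t table).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("intervals need at least 3 points (n - 2 > 0)")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))                  # residual standard error
    y0 = a + b * x0                               # point prediction
    spread = 1 / n + (x0 - x_bar) ** 2 / sxx      # uncertainty of the fitted line at x0
    ci_half = t_crit * s * math.sqrt(spread)      # mean response
    pi_half = t_crit * s * math.sqrt(1 + spread)  # single new observation
    return y0, ci_half, pi_half

y0, ci, pi = mean_and_prediction_intervals(
    [1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=6, t_crit=3.182
)
print(f"prediction {y0:.3f}, CI half-width {ci:.3f}, PI half-width {pi:.3f}")
```

Because the prediction interval adds 1 under the square root, it is wider than the confidence interval at every X, and both widen as x₀ moves away from x̄.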

Residual Analysis

The residual for observation i is eᵢ = yᵢ − ŷᵢ — the vertical distance between the observed and fitted value. Examining residuals helps you judge model quality:

  • Residuals scattered randomly around zero suggest the linear model is appropriate.
  • A U-shaped residual pattern suggests non-linearity, and a funnel-shaped pattern suggests heteroscedasticity (non-constant variance); either may call for a transformation or a different model.
  • Points with unusually large residuals (flagged as outlier? in the table) may warrant further investigation.

The residual standard error s = √(SSE / (n − 2)) is the typical prediction error around the fitted line, measured in the same units as Y.
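
A residual table of the kind described above can be sketched as follows. The page does not say which rule the calculator uses to flag outliers, so the |eᵢ| > 2s threshold here is purely an assumption, as are the function name `residual_table` and the sample data.

```python
import math

def residual_table(xs, ys, flag_multiple=2.0):
    """Return (rows, sse, s); each row is (x, y, fitted, residual, flagged)."""
    n = len(xs)
    if n < 3:
        raise ValueError("residual standard error needs n - 2 > 0")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)
    s = math.sqrt(sse / (n - 2))
    rows = [
        # Hypothetical flag rule: residual larger than flag_multiple * s
        (x, y, a + b * x, e, abs(e) > flag_multiple * s)
        for x, y, e in zip(xs, ys, residuals)
    ]
    return rows, sse, s

rows, sse, s = residual_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for x, y, fitted, e, flagged in rows:
    print(f"x={x} y={y} fitted={fitted:.2f} residual={e:+.2f} outlier?={flagged}")
print(f"SSE={sse:.3f} s={s:.3f}")
```

With an intercept in the model, the residuals sum to zero up to rounding, which is a quick sanity check on any residual table.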

Through-Origin Regression

Enabling Force intercept = 0 constrains the line to pass through the origin (0, 0). Use this only when the problem context genuinely requires Y = 0 when X = 0, for instance distance travelled versus elapsed time at constant speed, or cost as a pure function of volume with no fixed overhead. In most other situations the unconstrained fit describes the data better and should be preferred. Note that R² from through-origin regression is not comparable to R² from the standard model, because it is commonly computed against the uncentred total Σy² rather than Σ(y − ȳ)².
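
With the intercept fixed at zero, minimising Σ(yᵢ − b·xᵢ)² gives the slope b = Σxy / Σx². A minimal sketch of that constrained fit follows; the function name `fit_through_origin` is illustrative, and whether the calculator computes it this way internally is not stated on the page.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for the model y_hat = b * x (no intercept)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need matching, non-empty X and Y lists")
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all X values are zero; slope is undefined")
    return sum(x * y for x, y in zip(xs, ys)) / sxx

# Same illustrative data as an unconstrained fit would use
print(fit_through_origin([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

For this sample the constrained slope is 66 / 55 = 1.2, twice the unconstrained slope of 0.6 for the same data, which shows how strongly the constraint can change the line when the true intercept is not zero.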

Input Data Tips

  • Numbers can be separated by commas, spaces, tabs, or line breaks.
  • X and Y lists must contain the same count of values, matched positionally.
  • At least 2 data points are needed to fit a line; 3 or more are required for the residual standard error, confidence intervals, and meaningful diagnostics, because these divide by n − 2.
  • The X values must not all be identical: a vertical dataset has no defined slope.
  • In CSV mode, each row must start with one X value followed by one Y value; any extra columns are ignored.
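
The rules above for paired lists can be sketched as a small parser plus validation. This is an illustration of the checks, not the calculator's code; the function names `parse_numbers` and `parse_paired_lists` are made up.

```python
import re

def parse_numbers(text):
    """Split on commas, spaces, tabs, or line breaks and convert to floats."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]   # float() raises ValueError on bad tokens

def parse_paired_lists(x_text, y_text):
    xs, ys = parse_numbers(x_text), parse_numbers(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"X has {len(xs)} values but Y has {len(ys)}")
    if len(xs) < 2:
        raise ValueError("at least 2 data points are needed")
    if len(set(xs)) == 1:
        raise ValueError("all X values are identical; slope is undefined")
    return xs, ys

xs, ys = parse_paired_lists("1, 2 3\n4", "2.1\t3.9,6.2 8")
print(xs, ys)
```

Values are matched positionally, so the third X pairs with the third Y regardless of which separators were used in either box.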

Common Applications

Linear regression is used across virtually every quantitative field:

  • Education: Predicting exam scores from study hours.
  • Economics & Finance: Modelling sales growth versus advertising spend, or asset returns versus market returns.
  • Science & Engineering: Calibration curves, Hooke's Law verification, speed vs. distance relationships.
  • Health: Estimating blood pressure trends with age, dosage–response studies.
  • Machine Learning: Linear regression is a foundational supervised learning algorithm; understanding OLS is essential before advancing to regularised or non-linear models.

Frequently Asked Questions

Is the Linear Regression Calculator free?

Yes, the Linear Regression Calculator is completely free.

Can I use the Linear Regression Calculator offline?

Yes, you can install the web app as a PWA (progressive web app).

Is it safe to use Linear Regression Calculator?

Yes. Any data related to the Linear Regression Calculator is stored only in your browser (if storage is required). You can clear your browser cache to remove all stored data. We do not store any data on a server.

What is linear regression and how does this calculator work?

Linear regression finds the best-fit straight line through a set of paired (X, Y) data points using ordinary least squares (OLS). Enter your X values and Y values (comma- or newline-separated), click Calculate, and the tool instantly computes the slope, intercept, Pearson r, R², residual standard error, and an optional predicted Y for any X you specify.

What does R² (coefficient of determination) mean?

R² measures the proportion of variance in Y that is explained by the linear relationship with X. An R² of 1.00 means the line perfectly predicts every Y value; 0.00 means the line explains none of the variation. For example, R² = 0.85 means 85% of the variability in Y is accounted for by the regression on X.

What is the difference between a confidence interval and a prediction interval?

A confidence interval (CI) for the mean response estimates the range in which the true average Y falls for a given X. A prediction interval (PI) is wider because it also accounts for the scatter of individual observations around the regression line. Use the CI when you want to estimate the mean; use the PI when you want to predict a single future observation.

When should I use 'Force intercept through origin'?

Enable through-origin regression only when the physical or economic relationship strictly requires Y = 0 when X = 0 — for example, distance traveled at zero speed, or cost with zero units produced. In most real-world applications, a non-zero intercept is natural and should be retained.

How do I enter data in CSV / table mode?

In CSV mode, paste or type two-column data with one pair per row. The two values in each row can be separated by a comma, tab, or space — for example: '1, 2.1'. Header rows are not supported; start directly with numeric data. The paired-lists mode accepts comma- or newline-separated numbers in two separate text boxes.
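
A two-column parser along these lines might look like the sketch below. It is an assumption-laden illustration rather than the tool's implementation: the function name `parse_csv_pairs` is hypothetical, and skipping blank lines is a choice made here, not something the page specifies.

```python
import re

def parse_csv_pairs(text):
    """Parse rows of 'x, y' (comma, tab, or space separated) into two lists."""
    xs, ys = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue                     # assumption: blank lines are skipped
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) < 2:
            raise ValueError(f"row {line_no}: expected an X and a Y value")
        try:
            x, y = float(fields[0]), float(fields[1])   # extra columns ignored
        except ValueError:
            raise ValueError(
                f"row {line_no}: non-numeric value (header rows are not supported)"
            ) from None
        xs.append(x)
        ys.append(y)
    return xs, ys

print(parse_csv_pairs("1, 2.1\n2\t3.9\n3 6.2\n"))
```

A header such as "x,y" fails the numeric conversion on the first row, which matches the note that input should start directly with numeric data.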

How accurate are the regression results?

All calculations use standard double-precision floating-point arithmetic with deterministic OLS formulas, so results should agree with spreadsheet tools and statistical software to within rounding error for the same data. Confidence and prediction intervals use the exact Student t-distribution. Results are subject to the usual limits of double precision: roughly 15 to 17 significant digits, with some loss possible when the X values are very large relative to their spread, because nΣx² − (Σx)² then subtracts two nearly equal numbers.
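
As a rough illustration of how double precision behaves with the summation formula, the sketch below compares it with a mean-centred form of the same slope on data that has a large offset. Both function names are made up for this example, and the data is artificial; the true slope is 1.

```python
def slope_summation(xs, ys):
    """Slope via (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum x)^2)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    num = n * sum(x * y for x, y in zip(xs, ys)) - sx * sy
    den = n * sum(x * x for x in xs) - sx * sx
    # Cancellation can drive the denominator to zero; report NaN in that case
    return num / den if den != 0 else float("nan")

def slope_centered(xs, ys):
    """Algebraically the same slope, computed from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# X values near 1e8 with a spread of only 3 units
xs = [1e8 + i for i in range(4)]
ys = [0.0, 1.0, 2.0, 3.0]
print("summation form:", slope_summation(xs, ys))
print("centered form: ", slope_centered(xs, ys))
```

The squares of these X values exceed 2⁵³, so they cannot all be stored exactly and the summation form can lose most of its significant digits, while the centred form works with small deviations and recovers the slope. For typical inputs of moderate magnitude the two forms agree to many digits.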