Linear regression is a fundamental method in statistics and machine learning used for predictive modeling. Its primary goal is to model the relationship between a dependent variable (often called the target or response variable) and one or more independent variables (or features).
Linear regression has its roots in the early 19th century. The method of least squares, now the standard approach for estimating the parameters of a linear regression model, was first published by Adrien-Marie Legendre in 1805; Carl Friedrich Gauss published it in 1809 and claimed to have used it earlier. Later, Adolphe Quetelet and Francis Galton further developed these concepts, with Galton introducing the term "regression" after observing that the heights of children of unusually tall or short parents tended to "regress" toward the population average rather than matching their parents' heights.
The simplest form of linear regression is simple linear regression, which involves two variables, one predictor (X) and one response (Y), modeled by the equation:
Y = β₀ + β₁X + ε
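To make the equation concrete, here is a minimal sketch in Python (using NumPy, on synthetic data invented purely for illustration) that estimates β₀ and β₁ with the standard closed-form least-squares formulas:

```python
import numpy as np

# Synthetic data for illustration: true intercept 2.0, true slope 3.0.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=50)

# Closed-form least-squares estimates:
#   beta1 = cov(x, y) / var(x)
#   beta0 = mean(y) - beta1 * mean(x)
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
print(f"beta0 ≈ {beta0:.3f}, beta1 ≈ {beta1:.3f}")
```

With this data the printed estimates should land close to the true values (2.0 and 3.0), differing only because of the noise term ε.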
When there is more than one predictor variable, the model becomes multiple linear regression:
Y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ + ε
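In the multiple-predictor case it is convenient to collect the predictors into a design matrix whose first column is all ones (the intercept term) and solve the least-squares problem directly. Below is a sketch under those assumptions, again on synthetic data made up for illustration:

```python
import numpy as np

# Synthetic data: 100 observations, 3 predictors, known coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_beta = np.array([1.5, -2.0, 0.5])
y = 4.0 + X @ true_beta + rng.normal(0, 0.3, size=100)

# Prepend a column of ones so beta0 (the intercept) is estimated too,
# then solve the least-squares problem for all coefficients at once.
X_design = np.column_stack([np.ones(len(X)), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("estimated [beta0, beta1, beta2, beta3]:", beta_hat.round(3))
```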
Linear regression is widely used across many fields, including economics, biology, engineering, and the social sciences.
The most common method for estimating the parameters (β₀, β₁, ..., βₙ) is the method of least squares, where the goal is to minimize the sum of squared residuals:

minimize Σ(yᵢ − ŷᵢ)²
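The minimizer of this objective satisfies the normal equations, (XᵀX)β = Xᵀy. The sketch below (synthetic data, for illustration) writes out the sum-of-squared-residuals objective explicitly and checks that the normal-equations solution gives a smaller value than an arbitrary guess:

```python
import numpy as np

# Synthetic data: intercept column plus one predictor.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(30), rng.uniform(0, 5, size=30)])
y = X @ np.array([1.0, 2.5]) + rng.normal(0, 0.5, size=30)

def ssr(beta):
    """Sum of squared residuals: Σ(y_i - ŷ_i)² for fitted values ŷ = Xβ."""
    residuals = y - X @ beta
    return np.sum(residuals ** 2)

# Solve the normal equations (X^T X) beta = X^T y; using solve() is
# preferable to forming an explicit matrix inverse for numerical stability.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(f"SSR at the least-squares solution: {ssr(beta_hat):.4f}")
print(f"SSR at an arbitrary guess:         {ssr(np.array([0.0, 0.0])):.4f}")
```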