
08 Regression


Regression is used to predict the dependent variable from past data on the dependent and independent variables.

The estimated regression line is given by

\[ \begin{aligned} \hat y &= b_0 + b_1 x \\ b_1 &= \frac{ n \ \sum (xy) - \sum x \sum y }{ n \ \sum x^2 - \Big( \sum x \Big)^2 } \\ b_0 &= \bar y - b_1 \bar x \\ \bar x &= \frac{\sum x} n \\ \bar y &= \frac{\sum y} n \end{aligned} \]
| Term | Meaning |
|---|---|
| \(y\) | dependent variable |
| \(x\) | independent variable |
| \(b_0\) | y-intercept |
| \(b_1\) | slope |
| \(\hat y\) | estimated value of \(y\) |
| \(\bar x\) | mean of \(x\) |
| \(\bar y\) | mean of \(y\) |
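The formulas for \(b_1\) and \(b_0\) can be evaluated directly from the sums they contain. The following is a minimal sketch in plain Python; the function name `fit_line` and the sample data are made up for illustration and are not part of the original notes.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the estimated line y_hat = b0 + b1 * x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope: (n * sum(xy) - sum(x) * sum(y)) / (n * sum(x^2) - (sum(x))^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: y_bar - b1 * x_bar
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1


# Illustrative data chosen to lie exactly on y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
b0, b1 = fit_line(xs, ys)
print(b0, b1)  # intercept and slope of the fitted line
```

Because the sample points lie exactly on a line, the fit should recover that line's intercept and slope. With noisy data the same function returns the least-squares estimates.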


The correlation coefficient \(r\) gives the degree of linear relationship between the two variables \(x\) and \(y\), with \(-1 \le r \le +1\).

\[ r = \frac{ n \sum(xy) - \sum x \sum y }{ \sqrt{ n \sum (x^2) - \big(\sum x \big)^2 } \sqrt{ n \sum (y^2) - \big(\sum y \big)^2 } } \]
| Type | Correlation | Condition |
|---|---|---|
| Strength | Weak | \(\vert r \vert \le 0.5\) |
| | Moderate | \(0.5 < \vert r \vert < 0.8\) |
| | Strong | \(\vert r \vert \ge 0.8\) |
| Direction | Directly | \(r > 0\) |
| | Inversely | \(r < 0\) |
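As a rough illustration, \(r\) can be computed from the same sums and then classified with the thresholds in the table. This is a sketch only; the helper names `pearson_r` and `describe` and the sample data are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def describe(r):
    """Classify r by strength and direction using the table's thresholds."""
    magnitude = abs(r)
    if magnitude <= 0.5:
        strength = "weak"
    elif magnitude < 0.8:
        strength = "moderate"
    else:
        strength = "strong"

    if r > 0:
        direction = "direct"
    elif r < 0:
        direction = "inverse"
    else:
        direction = "none"  # r == 0: no linear relationship
    return strength, direction


# Illustrative data (made up)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4), describe(r))
```

For this data \(r \approx 0.77\), which the table classifies as a moderate, direct correlation. Note that `pearson_r` divides by zero if either variable is constant, since the corresponding square root is then zero.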

Coefficient of Determination

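\(R^2\) shows how well the data fit the regression model. Unlike \(r\), it is also used for non-linear regression. For the simple linear regression above, \(R^2 = r^2\).

For a least-squares linear fit with an intercept, it lies in the range \([0, 1]\). The higher the value, the better the fit.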

\[ \begin{aligned} R^2 &= 1 - \frac{ \text{SS}_\text{res} }{ \text{SS}_\text{tot} } \\ \text{SS}_\text{res} &= \sum\limits_{i=1}^n (y_i - \hat y_i)^2 \\ \text{SS}_\text{tot} &= \sum\limits_{i=1}^n (y_i - \bar y)^2 \\ \bar y &= \frac{1}{n} \sum\limits_{i=1}^n y_i \end{aligned} \]


| Symbol | Meaning |
|---|---|
| \(\text{SS}_\text{res}\) | Residual sum of squares |
| \(\text{SS}_\text{tot}\) | Total sum of squares (proportional to the variance of the data) |
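The definition of \(R^2\) translates into a few lines of code. The sketch below assumes the predictions \(\hat y_i\) are already available; the function name `r_squared` and the numbers are illustrative. The predictions are taken from the line \(\hat y = 2.2 + 0.6x\) at \(x = 1, \dots, 5\).

```python
def r_squared(ys, y_hats):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = len(ys)
    y_bar = sum(ys) / n
    # Residual sum of squares: observed minus predicted
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
    # Total sum of squares: observed minus mean
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Illustrative observations and predictions (made up)
ys = [2, 4, 5, 4, 5]
y_hats = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(ys, y_hats)
print(round(r2, 4))
```

Here \(\text{SS}_\text{res} = 2.4\) and \(\text{SS}_\text{tot} = 6\), so \(R^2 = 0.6\). Predictions that are worse than always guessing \(\bar y\) make \(\text{SS}_\text{res} > \text{SS}_\text{tot}\) and give a negative value, which is why the \([0, 1]\) range only holds for a proper least-squares fit.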
Last Updated: 2024-01-24 ; Contributors: AhmedThahir
