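Regression
Regression is used to predict the dependent variable from past observations of the dependent and independent variables.
The estimated regression line is given by
$$
\hat y = b_0 + b_1 x,
\qquad
b_1 = \dfrac{ \sum (x_i - \bar x)(y_i - \bar y) }{ \sum (x_i - \bar x)^2 },
\qquad
b_0 = \bar y - b_1 \bar x
$$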
Term | Meaning |
---|---|
\(y\) | dependent variable |
\(x\) | independent variable |
\(b_0\) | y-intercept |
\(b_1\) | slope |
\(\hat y\) | estimated value |
\(\bar x\) | mean of \(x\) |
\(\bar y\) | mean of \(y\) |
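As a minimal sketch of how the terms in the table fit together, the snippet below estimates \(b_0\) and \(b_1\) for a small made-up data set. It assumes the usual least-squares estimates, \(b_1 = \sum (x_i - \bar x)(y_i - \bar y) / \sum (x_i - \bar x)^2\) and \(b_0 = \bar y - b_1 \bar x\); the function name `fit_line` and the sample data are illustrative only.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the line y_hat = b0 + b1 * x.

    Assumes ordinary least squares and at least two distinct x values.
    """
    n = len(xs)
    x_bar = sum(xs) / n  # mean of x
    y_bar = sum(ys) / n  # mean of y
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # y-intercept
    return b0, b1


# Made-up data lying exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

b0, b1 = fit_line(xs, ys)
y_hat = b0 + b1 * 5.0  # estimated value at x = 5
print(b0, b1, y_hat)
```

Because the sample points lie exactly on a line, the fit recovers an intercept of 1 and a slope of 2, up to floating-point rounding.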
Correlation
The correlation coefficient \(r\) gives the strength and direction of the linear relationship between two variables.
Properties
- Dimensionless
- Symmetric: \(r(x, y)=r(y, x)\)
- \(r \in [-1, +1]\)
Pearson’s Correlation
Also called the product-moment correlation.
$$
\begin{aligned}
r &= \dfrac{1}{n-1} \sum_{i=1}^n z_{xi} z_{yi} \\
&= \dfrac{ \sum (x_i - \bar x)(y_i - \bar y) }{ \sqrt{\sum (x_i - \bar x)^2 \sum (y_i - \bar y)^2} } \\
&= \dfrac{ n \sum xy - \sum x \sum y }{ \sqrt{ n \sum x^2 - \big(\sum x \big)^2 } \sqrt{ n \sum y^2 - \big(\sum y \big)^2 } }
\end{aligned}
$$
Here \(z_{xi} = (x_i - \bar x)/s_x\) and \(z_{yi} = (y_i - \bar y)/s_y\) are the standardized values, with \(s_x\) and \(s_y\) the sample standard deviations.
It measures whether the two variables tend to be above or below their respective means at the same time.
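The deviation form of Pearson's \(r\) (sum of products of deviations over the square root of the product of the sums of squared deviations) can be sketched in plain Python as follows. The function name `pearson_r` and the sample data are illustrative assumptions, not part of the notes.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means.

    Assumes equal-length inputs where neither variable is constant.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 3))  # 6 / sqrt(60), about 0.775
```

Swapping the arguments gives the same value, which is the symmetry property \(r(x, y) = r(y, x)\).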
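Modified Correlation
Setting the center as the origin \(\implies \bar x=\bar y=0\), so Pearson's formula reduces to
$$
r = \dfrac{ \sum x_i y_i }{ \sqrt{\sum x_i^2 \sum y_i^2} }
$$
Each observation:
- Contributes positively if both variables are positive
- Contributes positively if both variables are negative
- Contributes negatively if the variables have opposite signs
Useful for comparing time series, returns, etc.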
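A minimal sketch of the modified correlation, assuming it is Pearson's formula with the means fixed at zero, i.e. \(r = \sum x_i y_i / \sqrt{\sum x_i^2 \sum y_i^2}\). The function name `uncentered_r` and the return series are made up for illustration.

```python
import math


def uncentered_r(xs, ys):
    """Correlation about the origin (means taken to be zero).

    Assumes equal-length inputs, neither of which is all zeros.
    """
    s_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum(x * x for x in xs)
    s_yy = sum(y * y for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical daily returns of two assets
returns_a = [0.01, -0.02, 0.03]
returns_b = [0.02, -0.01, 0.01]

r0 = uncentered_r(returns_a, returns_b)
print(r0)
```

Every pair here has matching signs, so each term adds positively and the result is positive. Flipping the sign of one series flips the sign of the result.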
Type | Correlation | Condition |
---|---|---|
Strength | Weak | \(\vert r \vert \le 0.5\) |
Strength | Moderate | \(0.5 < \vert r \vert < 0.8\) |
Strength | Strong | \(\vert r \vert \ge 0.8\) |
Direction | Direct | \(r > 0\) |
Direction | Inverse | \(r < 0\) |
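Coefficient of Determination
The \(R^2\) value shows how well the data fit the regression model. Unlike \(r\), it can also be used for non-linear regression.
It normally lies in \([0, 1]\), and higher is better. It can fall below 0 when the model fits worse than simply predicting \(\bar y\).
$$
R^2 = 1 - \dfrac{ \text{SS}_\text{res} }{ \text{SS}_\text{tot} },
\qquad
\text{SS}_\text{res} = \sum (y_i - \hat y_i)^2,
\qquad
\text{SS}_\text{tot} = \sum (y_i - \bar y)^2
$$
where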
Symbol | Meaning |
---|---|
\(\text{SS}_\text{res}\) | Residual sum of squares |
\(\text{SS}_\text{tot}\) | Total sum of squares; proportional to the variance of the data |
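The two sums of squares can be combined as in the sketch below, which assumes the standard definition \(R^2 = 1 - \text{SS}_\text{res} / \text{SS}_\text{tot}\) with \(\text{SS}_\text{res} = \sum (y_i - \hat y_i)^2\) and \(\text{SS}_\text{tot} = \sum (y_i - \bar y)^2\). The function name `r_squared` and the numbers are illustrative only.

```python
def r_squared(ys, y_hats):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the observed values ys are not all identical.
    """
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # residual sum of squares
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                      # total sum of squares
    return 1 - ss_res / ss_tot


# Made-up observations and model predictions
ys = [1.0, 2.0, 3.0, 4.0]
y_hats = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(ys, y_hats)
print(round(r2, 2))  # SS_res = 0.10, SS_tot = 5.0, so R^2 = 0.98
```

Predicting \(\bar y\) for every point gives \(\text{SS}_\text{res} = \text{SS}_\text{tot}\) and hence \(R^2 = 0\), while a perfect fit gives \(R^2 = 1\).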