03 Basic Regression Analysis
Regression¶
Examine relationship between different variables
Dependence of one variable on another variable
Estimate the Population Regression Function (PRF) using the Sample Regression Function (SRF)
Assumptions¶
- Dependent variable \(\to\) Random variable whose distribution changes with the value of the independent variable
- Independent variable \(\to\) Non-Random
Purpose¶
Derive a function that traces through the conditional means of \(y\) corresponding to different values of \(x\)
Expected value = mean = average; all three have the same meaning here
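As a minimal sketch of "tracing through the conditional means", the snippet below groups a small population by \(x\) and averages \(y\) within each group. The data values and the name `conditional_means` are assumptions made up for illustration; they do not come from the notes.

```python
from collections import defaultdict

# Hypothetical population of (x, y) pairs, e.g. income x and consumption y.
population = [
    (80, 55), (80, 60), (80, 65), (80, 70), (80, 75),
    (100, 65), (100, 70), (100, 74), (100, 80), (100, 85), (100, 88),
    (120, 79), (120, 84), (120, 90), (120, 94), (120, 98),
]

def conditional_means(pairs):
    """Return {x: mean of y over all observations sharing that x}."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in sorted(groups.items())}

cef = conditional_means(population)
for x, mean_y in cef.items():
    print(f"E(y | x={x}) = {mean_y:.2f}")
```

The function that joins these conditional means across the values of \(x\) is what the regression tries to recover.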
Population¶
not necessarily humans
It refers to any set of data (universal); different from sample (will be covered later).
PRF¶
Population Regression Function
Also called the Conditional Expectation Function (CEF)
It is theoretical; we almost never have access to this
In linear regression, it is assumed to be linear wrt parameters, but may/may not be linear wrt variables
Linearity¶
| | Linear wrt variables | Non-Linear wrt variables |
---|---|---|
Linear wrt parameters | \(\beta_0 + \beta_1 x_1\) | \(\beta_0 + \beta_1 {x_1}^2\) |
Non-Linear wrt parameters | \(\beta_0 + {\beta_1}^2 x_1\) | \(\beta_0 + {\beta_1}^2 {x_1}^2\) |
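A model that is quadratic in \(x\) but linear in its parameters can still be fitted with ordinary least squares, by treating \(z = x^2\) as the regressor. The sketch below illustrates this under assumed, noise-free data; the helper `ols_simple` is a hypothetical name, not something defined in the notes.

```python
def ols_simple(z, y):
    """Least-squares intercept and slope for y = b0 + b1 * z."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    b1 = s_zy / s_zz
    b0 = y_bar - b1 * z_bar
    return b0, b1

# Assumed data generated exactly from y = 2 + 0.5 * x^2 (no noise).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 0.5 * xi ** 2 for xi in x]

# Non-linear in the variable x, but linear in the parameters once z = x^2.
z = [xi ** 2 for xi in x]
b0, b1 = ols_simple(z, y)
print(b0, b1)
```

The same trick does not work for a term such as \({\beta_1}^2 x_1\), because there the non-linearity sits in the parameter itself.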
Transformation¶
One more thing in slide
Some models cannot be transformed into a form that is linear in the parameters; they are intrinsically non-linear
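By contrast, some non-linear models can be transformed. One common case (chosen here as an assumption, since the notes do not name one) is the power function \(y = a x^b\): taking logs gives \(\ln y = \ln a + b \ln x\), which is linear in the parameters \(\ln a\) and \(b\). A minimal sketch with made-up, noise-free data and a hypothetical helper `ols_simple`:

```python
import math

def ols_simple(z, y):
    """Least-squares intercept and slope for y = b0 + b1 * z."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    b1 = s_zy / s_zz
    return y_bar - b1 * z_bar, b1

# Assumed true parameters of y = a * x^b, used only to generate the data.
a_true, b_true = 3.0, 1.5
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [a_true * xi ** b_true for xi in x]

# Log-transform both sides: ln y = ln a + b * ln x.
log_x = [math.log(xi) for xi in x]
log_y = [math.log(yi) for yi in y]

intercept, slope = ols_simple(log_x, log_y)
a_hat = math.exp(intercept)  # undo the log to recover a
b_hat = slope
print(a_hat, b_hat)
```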
Stochastic Specification of PRF¶
Components¶
- Systematic/Deterministic/Common/Explained component
- Non-Systematic/Random/Disturbance/Idiosyncratic component
- effect of all omitted variables
- random effects
- effect of measurement error
Equivalency with PRF¶
The stochastic specification \(y_i = E(y|x_i) + u_i\) is equivalent to the PRF as long as \(E(u_i|x_i) = 0\); this does not mean that \(u_i = 0 , \forall i\)
Why? Taking the conditional expectation of both sides gives \(E(y_i|x_i) = E(y|x_i) + E(u_i|x_i)\), which holds only if \(E(u_i|x_i) = 0\). Only then does the line pass through the expectations of \(y\) for different values of \(x\). (Draw graph and see)
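The distinction between \(E(u_i|x_i) = 0\) and \(u_i = 0\) can be checked numerically. In the sketch below, the population values are assumptions for illustration. Each disturbance is defined as the deviation of \(y_i\) from its conditional mean, so the disturbances average to zero within every \(x\) group by construction, even though individual disturbances are non-zero.

```python
from collections import defaultdict

# Hypothetical population of (x, y) pairs; the values are illustrative only.
population = [(1, 2), (1, 4), (1, 6), (2, 5), (2, 7), (2, 9)]

groups = defaultdict(list)
for x, y in population:
    groups[x].append(y)

# Systematic component: the conditional mean E(y | x) for each x.
cond_mean = {x: sum(ys) / len(ys) for x, ys in groups.items()}

# Non-systematic component: u_i = y_i - E(y | x_i).
disturbances = [(x, y - cond_mean[x]) for x, y in population]

# Average disturbance within each x group.
mean_u_given_x = {
    x: sum(u for xi, u in disturbances if xi == x) / len(ys)
    for x, ys in groups.items()
}

print(disturbances)
print(mean_u_given_x)
```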
Why do we need Stochastic Specification?¶
- Vagueness of theory
- Social sciences have no definite theory for any event
- Randomness in human behavior
- Incorporates effect of missing data
- Wealth data is not as easy to get as income data
- More appropriate for inexact relationships
- Captures effect of omitted variables
- Some variables are not as important
- Captures effect of poor proxy variables
- Principle of Parsimony: we usually try to limit ourselves to simple models
- Incorrect functional form
- Unknown theory
- Linear/Non-Linear function
- Incorporates measurement errors
Proxy Variable¶
A variable that is closely associated with the variable we want to use.
We use proxy variables when the main variable is not available
e.g.:
- Age as a proxy for experience
- CPI as a proxy for inflation
Types of Relationships¶
| | Statistical/Stochastic | Deterministic |
---|---|---|
Independent var | Non-Random | Non-Random |
Dependent var | Random | Non-Random |
eg | Predicting Crop Yield | Ohm’s Law |
Terms¶
\(y\) | \(x\) |
---|---|
Dependent, Explained, Predictand, Regressand, Response, Endogenous | Independent, Explanatory, Predictor, Regressor, Stimulus, Exogenous |
Capital Flight¶
Capital moves from one country to another
Regression \(\ne\) Causation¶
Regression alone does not establish the direction of causality
We need to use domain knowledge, and impose the restriction that \(x\) causes \(y\)
Regression vs Correlation¶
| | Regression | Correlation |
---|---|---|
Understand | Form of the dependence of \(y\) on \(x\) | Degree of linear association between 2 variables |
Assumption | One dependent variable (random); one or more independent variables (non-random) | Both \(x\) and \(y\) are random |
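One way to see the difference is that correlation treats the two variables symmetrically, while regression does not: the slope from regressing \(y\) on \(x\) generally differs from the slope from regressing \(x\) on \(y\). The sketch below uses made-up data and hypothetical helper names (`correlation`, `slope`) to illustrate this.

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlation(a, b):
    """Pearson correlation coefficient between a and b."""
    a_bar, b_bar = mean(a), mean(b)
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_bb = sum((bi - b_bar) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def slope(dep, indep):
    """Least-squares slope from regressing dep on indep."""
    d_bar, i_bar = mean(dep), mean(indep)
    s_id = sum((ii - i_bar) * (di - d_bar) for ii, di in zip(indep, dep))
    s_ii = sum((ii - i_bar) ** 2 for ii in indep)
    return s_id / s_ii

# Assumed sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = correlation(x, y)
r_yx = correlation(y, x)        # same value: correlation is symmetric
slope_y_on_x = slope(y, x)      # y treated as the dependent variable
slope_x_on_y = slope(x, y)      # roles swapped: a different number

print(r_xy, r_yx)
print(slope_y_on_x, slope_x_on_y)
```

The two slopes are linked by the identity that their product equals the squared correlation coefficient, so they coincide only in special cases.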
Exogenous vs Endogenous¶
Whether a variable is exogenous or endogenous depends on what you assume to be the system
| | Exogenous | Endogenous |
---|---|---|
In our control? | ❌ | ✅ |
- Exo = out
- Endo = in
Basic Concepts of Regression¶
- Derive Conditional values of \(y\) wrt \(x\)
- Calculate Conditional probabilities of \(y\) for different values of \(x\)
- Calculate conditional mean
- Calculate the weighted average using the probability of occurrence
- This mean is different from the arithmetic mean (simple average)
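The steps above can be sketched as follows, assuming a small joint probability table \(P(x, y)\) whose numbers are invented for illustration. The conditional probabilities \(P(y|x) = P(x, y) / P(x)\) serve as the weights, so the conditional mean differs from the simple average of the \(y\) values whenever those weights are unequal.

```python
# Hypothetical joint probabilities P(x, y); the numbers are assumptions.
joint = {
    (1, 10): 0.10, (1, 20): 0.30,
    (2, 10): 0.12, (2, 30): 0.48,
}

def conditional_mean(joint_probs, x_value):
    """E(y | x = x_value) as a probability-weighted average of y."""
    # Marginal probability P(x = x_value).
    p_x = sum(p for (x, _), p in joint_probs.items() if x == x_value)
    # Weight each y by its conditional probability P(y | x) = P(x, y) / P(x).
    return sum(y * p / p_x for (x, y), p in joint_probs.items() if x == x_value)

def simple_mean(joint_probs, x_value):
    """Unweighted arithmetic mean of the y values observed with x_value."""
    ys = [y for (x, y) in joint_probs if x == x_value]
    return sum(ys) / len(ys)

for x_value in (1, 2):
    print(x_value, conditional_mean(joint, x_value), simple_mean(joint, x_value))
```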
The expected value of unconditional random variable \(y\) is ???
Variability of \(y\)¶
The variation of \(y\) for different values of \(x\)
Higher variability is preferred