Causal Model¶
A causal/scientific model \(M\) for a set of RVs \(\{ x_1, \dots, x_n \}\) is a model of the joint distribution \(p(x_1, \dots, x_n)\) as well as the causal structure governing \(\{ x_1, \dots, x_n \}\) which describes the causal relationships among the variables and how the data-generating process occurs $$ M: y \leftarrow x $$
Causal Graph Model¶
Causal diagram/graph is a graph that can be used to represent a causal structure and hence, describe our quantitative knowledge about a causal mechanism
Causal model that uses a causal graph to represent the causal structure is called as Causal Graph Model/Causal Bayesian/Belief network.
- Causal diagrams can make assumptions scientific and "falsifiable"
- Causal diagrams can help identify which variables to randomize, include in model, or ignore
RCM was developed in statistics, causal graphical model is derived from CS/AI.
Parts of Causal Model¶
- Causal Structure \(G\): Direction of causality (What causes what)
- Statistical Model \(H\): Joint distribution function
Example¶
Consider a model \(y = 2x\). This is a statistical model.
If \(y \leftarrow 2x\), then it means that \(x\) causes \(y\). This is a causal model.
Let’s analyze the \(x,y\) pairs for the following sequential changes.
Model | 1. do\((x=2)\) | 2. do\((x=3)\) | 3. do\((y=2)\) |
---|---|---|---|
Statistical | 2, 4 | 3, 6 | 1, 2 |
Causal | 2, 4 | 3, 6 | 3, 2 |
This is because, \(x\) causes \(y\); not the other way around.
do
-calculus¶
Let - \(G\) be a DAG - \(G_{\bar T}\) represent \(G\) post-intervention (ie with all links into treatment \(T\) removed) - \(G_{\underline T}\) represent \(G\) with all links out of treatment \(T\) removed
Rules 1. \(P(y \vert \text{do}(t), z, w) = P(y \vert \text{do}(t), z) \iff Y \perp\!\!\!\perp W \vert (Z, T)\) in \(G_{\bar T}\) 2. \(P(y \vert \text{do}(t), z) = P(y \vert t, z) \iff Y \perp\!\!\!\perp T \vert Z\) in \(G_\underline{T}\) 3. \(P(y \vert \text{do}(t), z) = P(y \vert z) \iff Y \perp\!\!\!\perp T \vert Z\) in \(G_{\bar T}\) and \(Z\) is not a descendent of \(T\)
Causal Diagram/Graph¶
Directed graph that represents the causal structure of a model. It is represented using a bayesian network: DAG (Directed Acyclic Graph) in which each node has associated conditional probability of the node given in its parents.
The joint distribution is obtained by taking the product of the conditional probabilities.
$$ \begin{aligned} p(x_1, \dots, x_n) &= \Pi_{i=1}^n p( x_i \vert \text{pa}(x_i) ) \ x_i & {\tiny \coprod} \text{nd}(x_i) \vert \text{pa}(x_i) \end{aligned} $$ where
- pa = parent
- nd = not descendant
Useful when you are only interested in the causal structure, and not so much in the statistical model
flowchart TD
B([B]) & C([z]) --> A([A])
Why DAG?¶
- Directed, because we want causal effects directions
- Acyclic, because a variable cannot cause itself.
my question is: What about recursive loops, such as our body’s feedback loops
Parts¶
Nodes - vars | - Rounded - Unconditioned - Square - Conditioned |
Directed Edges - Causal directions | 1. The presence of an arrow from \(x_i\) to \(x_j\) indicates either that \(x_i\) has a direct causal effect on \(x_j\) – an effect not mediated through any other vars on the graph, or that we are unwilling to assume such a effect does not exist 2. The absence of an arrow from \(x_i\) to \(x_j\) indicates the absence of a direct effect; the absence of an arrow hence represents a more substantive assumption |
flowchart TB
a([a])
w([w])
x([x])
y([y])
z([z])
x --> y & z
z --> a
w --> y
Term | Condition | Above Example |
---|---|---|
Parent | Node from which arrow(s) originate | \(x\) is parent of \(y\) and \(z\) |
Child | Node to which arrow(s) end | \(y\) and \(z\) are children of \(x\) |
Descendants | ||
Ancestors | ||
Exogenous | vars w/o any parent | \(w, x\) |
Endogenous | vars w/ \(\ge 1\) parent | \(y, z, a\) |
Causal Path | Uni-directional path | \(x z a\) \(z a\) \(x y \(<br />\)w y\) |
Non-Causal Path | Bi-directed path | \(x y w\) |
Collider | Node having multiple parents, where path ‘collides’ | \(y\) in the path \(x y w\) |
Blocked Path | Path with a - conditioned non-collider or - unconditioned collider, w/o conditioned descendants | |
Back-Door Paths | Non-causal paths b/w \(x\) and \(y\), which if left open, induce correlation b/w \(x\) and \(y\) without causation | |
d-separated vars | all paths b/w vars are blocked | |
d-connected vars | \(\exists\) path between the vars which isn’t blocked | |
conditionally-independent vars | If 2 vars are d-separated after conditioning on a set of vars | |
conditionally-dependent/associated vars | If 2 vars are d-connected after conditioning on a set of vars w/o faithfulness condition, this may not be true |
Properties¶
Causal diagrams compatible with a joint distribution \(p(x_1, \dots, x_n)\) must satisfy
Property | Meaning |
---|---|
Causal Markov Condition | Every var is independent of any other vars (except its own effects) conditional on its direct causes |
Completeness | All common causes (even if measured) of any pair of vars on the graph are on the graph This property is as important as the requirement that all relevant factors are accounted for. Our ability to extract causal information from data is predicated on this untestable assumption |
Faithfulness | The joint distribution \(p(x_1, \dots, x_n)\) has all the conditional independence relations implied by the causal diagram, and only those conditional independence relations |
- \(\text{nd}:\) non-descendent
- \(\text{pa}:\) parents
Validating Graphical Model¶
Consider 2 variables \(v_1\) and \(v_2\) - If \(v_1\) is connected to \(v_2\), there should be a correlation - If \(v_1\) is not connected to \(v_2\), there should be no correlation between \(v_1\) and \(v_2\) given \(\text{pa}(v_1)\), ie for each group of \(\text{pa}(v_1)\)
Causal Operations¶
Conditioning: Broad concept of considering or restricting analysis based on certain variables - Adjusting - Selecting
Adjusting | Selecting | |
---|---|---|
Definition | Including variables as covariates in a statistical model | Choosing a subset of data where treatment and control groups are balanced |
Scope | Specific method | Specific method |
Implementation | Typically through regression, propensity score methods, or weighting | Often through matching techniques |
Sample size impact | Generally retains full sample | May reduce sample size |
Primary goal | Control for confounding | Create comparable groups |
Typical use | Accounting for confounders in full dataset | Creating balanced comparison groups |
Flexibility with continuous variables | Generally high | Often requires categorization |
Risk of introducing bias | Possible if adjusting for inappropriate variables | Possible if selection criteria are inappropriate |
Population inference | Often for entire population | May be limited to specific subpopulation |
Causal Relations¶
General Framework¶
flowchart TB
oc([Observed<br/>Confounders])
uc([Unobserved<br/>Confounders])
oce([Observed<br/>Common<br/>Effects])
uce([Unobserved<br/>Common<br/>Effects])
x([x])
y([y])
x --> y
oc ---> x & y
uc -..-> x & y
x & y --> oce
x & y -.-> uce
classDef dotted stroke-dasharray:5
class uc,uce dotted
linkStyle 0 stroke:green
Types¶
Type | \(x, y\) are statistically-independent Correlation \(=\) Causation \(E[y \vert x] = E[y \vert \text{do}(x)]\) | \(E[y \vert \text{do}(x)]\) | Comment | Path ‘__’ by collider \(c\) | Example \(z\) | Example \(x\) | Example \(y\) |
---|---|---|---|---|---|---|---|
Mutual Dependence/ Confounding/ Common Cause | ❌ | Info on \(x\) helps predict \(y\), even if \(x\) has no causal effect on \(y\) | Smoker | Carrying a lighter | Cancer | ||
Conditioned Mutual Dependence | ✅ | \(\sum_i E[y \vert \text{do}(x), c_i] \cdot P(c_i)\) \(=\sum_i E[y \vert x, c_i] \cdot P(c_i)\) | blocked | Smoker=FALSE | Carrying a lighter | Cancer | |
Mutual Causation/ Common Effect | ✅ | blocked | Revenue of company | Size of company | Survival of company | ||
Conditioned Mutual Causation | ❌ | opened | Revenue of company | Size of company | Survival of company=TRUE (we usually only have data for companies that survive) (Survivorship bias) | ||
Mediation | ❌ | ||||||
Conditioned Mediation | ✅ | blocked | 1. Cancer 2. Tax | 1. Tar deposits in lung 2. Economic consequences | 1. Smoking 2. Economic conditions |
flowchart TB
subgraph Mediation
subgraph ucm[Unconditioned]
direction TB
x1([x]) --> c1([z]) --> y1([y])
%% x1 --> y1
end
subgraph cm[Conditioned]
direction TB
x2([x]) --> c2[z] --- y2([y])
%% x2 --> y2
end
end
subgraph mc2[Mutual Causation 2]
subgraph ucmc2[Unconditioned]
direction TB
x8([x]) & y8([y]) --> w8([w]) --> c8([z])
%% x8 --> y8
end
subgraph cmc2[Conditioned]
direction TB
x7([x]) & y7([y]) --> w7([w]) --> c7[z]
%% x7 --> y7
end
end
subgraph mc[Mutual Causation 1]
subgraph ucmc[Unconditioned]
direction TB
x4([x]) & y4([y]) --> c4([z])
%% x4 --> y4
end
subgraph cmc[Conditioned]
direction TB
x3([x]) --> c3
y3([y]) --> c3[z]
%% x3 --> y3
end
end
subgraph md[Mutual Dependence]
subgraph ucmd[Unconditioned]
direction TB
c6([z]) --> x6([x]) & y6([y])
%% x6 --> y6
end
subgraph cmd[Conditioned]
direction TB
c5[z] --- x5([x]) & y5([y])
%% x5 --> y5
end
end
%%linkStyle 4,24,23 stroke:none;
Variables¶
Use dagitty for help - https://www.dagitty.net/ - https://ahmedthahir.github.io/dagitty/
Type | Should adjust? | Adjustment Stage | Adjustment Reason | Selection allowed? (Selection gives same result for population?) | Selection Reason |
---|---|---|---|---|---|
Independent | ❌ | Not related with \(y\) | ❌ | ||
Treatment | ✅ | 2 | Obviously | ✅ | |
Non-treatment causes of \(y\) | ✅ | 2 | Avoid omitted variable bias | ❌ | |
Cause/assignment of \(T\) | ✅ | 1 | Correct for non-random assignment | ✅ | |
Cause/assignment of non-treatment causes | ✅ | 1 | Correct for non-random assignment | ✅ | |
Consequence of \(T\) | ❌ | Not related with \(y\) | ❌ | ||
Confounder | ✅ | 2 | \(z\) affects \(y\) Close backdoor | ❌ | |
Cause/assignment of confounder | ✅ | 1 | Correct for non-random assigment | ||
Effect modifier | ✅ | 2 | Effect of \(x\) on \(y\) changes with \(z\) | ||
Instrumental vars | ⚠️ | 1 | Only if assignment to treatment not available | ||
Mediator | ❌ | Conditioning causes over-controlling: Will take away some effect of \(x\) on \(y\) Any result is only valid for the conditioned value of mediator | ✅ | ||
Collider | ❌ | Conditioning causes endogeneity/selection effect - Can create fake causal effects - Can hide real causal effects Any result is only valid for the conditioned value of collider | ❌ | ||
Colliding mediator | ❌ | Opens backdoor | ❌ | ||
Consequence of \(y\) | ❌ | Logically-flawed; only predictive power, no causal power | ❌ |
Confounding¶
Self Selection Bias¶
Special case of confounding, when \(z\) affects the selection of \(x\) and also has a causal effect on \(y\), then \(z\) is confounder.
flowchart TB
c([Ability/Talent])
a([Education])
b([Earnings])
c -->|Affects<br/>selection| a
c -->|Directly<br/>Affects| b
a -->|Associated but<br/>not necessarily causes| b
Here, education may not necessarily causally affect earnings.
Unmeasured Confounding¶
We need to find new ways to identify causal effect
\(z\) fully observed | \(\not \exists\) unmeasured confounding | Selection on |
---|---|---|
✅ | ✅ | observables |
❌ | ❌ | unobservables |
However, in some cases, there can exist a set of observed vars conditioned such that it satisfies the back-door criterion, by blocking all non-causal paths b/w \(x\) and \(y\)
flowchart TB
subgraph Conditioned
direction TB
wc([w]) -.- zc & yc
zc[z] --- xc([x]) --> yc([y])
end
subgraph Unconditioned
direction TB
wu([w]) -.-> zu & yu
zu([z]) --> xu([x]) --> yu([y])
end
linkStyle 3,7 stroke:green
linkStyle 0,1,2 stroke:none
classDef dotted stroke-dasharray:5
class wu,wc dotted
\(w\) is a confounder to \(x\) and \(y\), but we do not need to observe it, as causal effect of \(x\) on \(y\) is identifiable by conditioning on \(z\)
Mediator¶
Effect Modifiers¶
Confounders \(s\) that change the causal effect of a treatment \(x\), since their causal effect on the outcome \(y\) interacts with treatment’s causal effect on \(y\)
Controlling them help control conditionally randomized experiments.
Consider the following causal diagram.
flowchart TD
subgraph After Simplification
others([s<sup>y</sup>]) & x2([X]) --> y2([Y])
end
subgraph Before Simplification
a([A]) & b([B]) & c([C]) & w([W]) & x1([X]) --> y([Y])
end
where \(s^y\) is anything other than \(x\) that can causally affect \(y\); ie, the set of potential effect modifiers
An example could be gender, temperature, etc.
Mathematically, \(s\) is an effect modifier if $$ \begin{aligned} P(y) &\ne P(y \vert s) \ E[y \vert \text{do}(x)] &\ne E(y \vert \text{do}(x), s) \end{aligned} $$
Instrumental Variable¶
Independent var that affects \(y\) indirectly via \(T\) only
flowchart TB
subgraph Scenario B
direction TB
T2 --> y2
v((v)) --> I2((I)) & T2
u2((u)) --> T2((T)) & y2((y))
end
subgraph Scenario A
direction TB
I((I)) --> T --> y
u((u)) --> T((T)) & y((y))
end
classDef green fill:green
class I2,I green
A variable \(I\) can serve as an instrumental variable for identifying the causal effect of \(T\) on \(y\) \(\iff\) 1. Relevance: \(I\) is associated with treatment \(T\) 1. \(r(I, T) \ne 0\) 2. Testable with stats 2. Exclusion: \(I\) indirectly affects \(y\) only through \(x\) 1. \(I\) affects \(y\): \(r(y, I) \ne 0\) 2. \(I\) does not affect \(y\) directly: \(r(y, I \vert T) = 0\) 1. Every open path connecting \(I\) and \(y\) has an arrow pointing into \(x\) 2. \(I\) is d-separated and independent from any common causes of \(x\) and \(y\), since \(x\) is a collider on their paths 3. Testable with stats and theory 4. Hardest condition to satisfy and prove 3. Exogeneity: \(I\) is not associated with omitted variables \(u\) 1. \(r(I, u) = 0\) 2. Testable with theory
Rule of thumb: more surprising an IV is, the more likely it is a good one
Education¶
Parent’s education is IV for child wage through child education
flowchart TB
pe(["Parents<br/>Education<br/>(IV)"]) --> ce(["Child<br/>Education<br/>(x)"]) --> ci(["Child<br/>Income<br/>(y)"])
pe -.- ag(["Child<br/>Age, Gender<br/>(s)"]) --> ce & ci
city(["City<br/>(FE)"]) --> pe & ce & a & ci
a(["Ability<br/>(s)"]) -..-> ce & ci
classDef dotted stroke-dasharray:5
class a dotted
classDef highlight fill:darkred,color:white
class pe highlight
linkStyle 1 stroke:green
College-educated parents have +ve impact on children’s college attainment, either through better home education or because they are more capable of affording college education.
If we assume that more educated parents
- don’t produce children with higher unobserved abilities/preferences that affect education and wages
- don’t directly help children obtain higher wage jobs
The statement holds true as the only way parent’s education affects individual’s earnings is through effect on child’s education.
Demand Elasticity¶
flowchart LR
op[Oil Price]
cs[Climate @ Production]
cd[Climate @ Consumption]
subgraph Supply
direction LR
ps[Supply Price]
qs[Supply Quantity]
end
subgraph Demand
direction LR
pd[Demand Price]
qd[Demand Quantity]
end
pset["Selling Price (x)"]
qeq["Sales Quantity<br/>ie, Equilibrium Quantity<br/>(y)"]
op & cs --> Supply --> pset --> qeq
cd --> Demand --> qeq
- IV through supply: Oil Price and Climate at production site - IV through demand: Climate at consumption site Promotion¶
Randomized promotion/encourage of usage of a policy is an instrument, as it matches all 3 criteria
Will only give the CACE
Fixed Effects¶
Variable that captures effect of unobserved confounders
City is a Fixed Effect¶
City captures ability, school quality, productivity and other unobserved factors
IDK¶
Intervention¶
\(\text{do}(x_i = a)\) implies removing all arrows entering \(x_i\) while setting \(x_i = a\), hence removing the effect of all parents of \(x_i\)
Hence \(x_i\) becomes endogenous after intervention
flowchart TB
subgraph Post-Intervention
direction TB
a2((a)) & b2((b)) --- c2((c)) --> d2((d))
end
subgraph Pre-Intervention
direction TB
a1((a)) & b1((b)) --> c1((c)) --> d1((d))
end
linkStyle 0,1 stroke:none
Pre-Intervention | \(p(A, B, C, D) = p(D \vert C) \cdot p(C \vert A, B) \cdot P(A) \cdot p(B)\) |
Post-Intervention | \(p( \ A, B, C, D \vert \text{do}(C=c) \ ) = p(D \vert C=c) \cdot P(A) \cdot p(B)\) |
Adjusting/Conditioning¶
To ensure that \(E(y|x=1) = E(y^1)\), we need to ensure that the population and treated-sample are the similar in their features.
2 ways of eliminating the unwanted effect of confounders/effect modifier \(z\) - Adjusting - Remove the effect of \(z\) on other variables - Conditioning - Hold variable \(z\) constant
Adjusting¶
- Eliminates self-selection effect from confounders
- Way to adjust for backdoors
- Reduces model dependence
This allows for removal of unnecessary data
Techniques¶
Technique | Steps |
---|---|
KNN | Find untreated observations that are very similar to treated observations based on confounders |
PSM Propensity Score Matching | Predict treatment probability as function of confounders Not great for matching |
IPW Inverse probability weighting | Observations with high propensity for treatment but don't get it (and vice-versa) have higher weights \(\dfrac{\text{Treatment}}{\text{Propensity}} + \dfrac{1-\text{Treatment}}{1-\text{Propensity}}\) |
Conditioning¶
Rule of thumb - Condition common causes - Do not condition common effects
If we condition on a set of vars \(z\) that block all open non-causal paths between treatment \(x\) and outcome \(y\), then
- the causal effect of \(x\) on \(y\) is identified (estimated from observed data)
- \(z\) is said to satisfy the ‘back-door criterion’
- Conditioning on z makes \(x\) exogenous to \(y\)
Covariate Balance¶
In a sample where is \(z\) is a conditioned confounder $$ P(z \vert x = x_i) = P(z \vert x = x_j) \quad \forall i, j $$
If \(x\) is binary $$ \begin{aligned} & x \in { 0, 1 } \ \implies & P(z \vert x=1) = P(z \vert x=0) \end{aligned} $$
Matching¶
Converting non-RCT observed sample into a sample that satisfies Covariate Balance, ie behaves similar to a RCT sample through control of observed confounding.
Limitations of Causal Graph¶
Effect Modifiers¶
There is no way for causal graph to represent the causal effect of input \(x\) on output \(y\) due to some effect modifier. For example
flowchart TD
f(("Fertilizer<br/> (x)")) & t(("Temperature<br/> (s<sup>y</sup>)")) --> y(("Yield<br/> (y)"))
Temperature does not affect how much usage of fertilizer, but we know that the temperature affects the effectiveness of the fertilizer, but we have no way of representing that here. However, the Causal Effect Formula will take care of that.
Cycles¶
Does not support cycles
Workaround
flowchart TB
subgraph After
direction TB
1[a_t-1] --> 2[b_t] --> 3[a_t]
end
subgraph Before
direction TB
a --> b --> a
end