Causal Learning
Theory → Model → Evidence → back to Theory
Types
Causal Effect Learning | Causal Mechanism Learning | Causal Inference Learning |
---|---|---|
Does \(x\) have a causal effect on \(y\)? If yes, how large is the effect? | If a causal effect exists, what is the mechanism behind it? | Understanding which rational decisions can be taken, building on causal mechanism learning and prior causal inference |
What? How much? | Why? How? | What can we do? |
Discovering patterns; making predictions | Understanding | Decision-making |
“Effects of causes” | “Causes of effects” | |
Manipulation of \(x\)
Being able to manipulate \(x\) to see its effect on \(y\) is essential to understanding causality. If there is no way to manipulate \(x\), then it is difficult to understand causality.
Moreover, according to the instructor, causal inference is pointless if we cannot change \(x\) (even theoretically), because then we cannot use the result to make better decisions. So many questions we analyze when doing research are basically useless.
For example, analyzing “what is the causal effect of height on your income?” is kind of pointless, because we cannot change our height. At least “what is the causal effect of democracy on economic growth?” is an acceptable analysis, because theoretically we can change the level of democracy.
I have another example: analyzing the “causal effect of unemployment on economic growth” is not very useful, because even though we can hypothetically influence unemployment indirectly, we cannot control it directly.
Type of Manipulation
The mechanism with which you “do” \(x\) can lead to different results. Hence, it is important to have a clear mechanism for “do”-ing \(x\) before starting your analysis.
For example, in the hypothetical democracy example: are you going to impose democracy by force, or will the citizens peacefully demand it?
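Experimental Causal Analysis
Because the experimenter (and not some other factor) sets \(x\), once the experiment is over the association between \(x\) and \(y\) in the data equals the causal effect, up to sampling noise.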
Steps
- manually set \(x=1\)
- observe the value of \(y\)
- repeat
- take the average value of \(y\)
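The steps above can be sketched as a small simulation. The function `simulate_y` is a hypothetical system assumed purely for illustration: a true effect of \(x\) on \(y\) of 2.0 plus person-to-person noise. In a real experiment \(y\) would be observed, not computed. Running the same steps with \(x=0\) gives a baseline to compare against.

```python
import random


def simulate_y(x: int, rng: random.Random) -> float:
    """Hypothetical system: y = individual baseline + 2.0 * x (true effect assumed to be 2.0)."""
    baseline = rng.gauss(10.0, 3.0)  # person-to-person variation
    return baseline + 2.0 * x


def run_experiment(x_value: int, repetitions: int, seed: int = 0) -> float:
    """Manually set x, observe y, repeat, and return the average value of y."""
    rng = random.Random(seed)
    observations = []
    for _ in range(repetitions):
        y = simulate_y(x_value, rng)  # do(x = x_value), then observe y
        observations.append(y)
    return sum(observations) / len(observations)


avg_treated = run_experiment(1, 10_000, seed=1)
avg_control = run_experiment(0, 10_000, seed=2)
print(f"average y under do(x=1): {avg_treated:.2f}")
print(f"average y under do(x=0): {avg_control:.2f}")
print(f"estimated effect: {avg_treated - avg_control:.2f}")  # close to the assumed 2.0
```

Averaging over many repetitions is what washes out the individual baselines. This is also why a single observation says little about the effect.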
Disadvantages
- not always feasible (especially in economics), because often the experiment simply cannot be performed
- everyone is different, so the average result of the experiment might not give an accurate inference for a particular individual
Example
RCT (Randomized Controlled Trial)
- test group is \(do(x=1)\): taking the drug
- control group is \(do(x=0)\): not taking the drug
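The following is a minimal sketch of why random assignment matters. It uses a hypothetical model in which severe cases recover less often and are also far more likely to choose the drug on their own. All numbers are invented for illustration, including the recovery probabilities and the true drug effect of +0.2. Comparing self-selected groups is confounded by severity. Assigning \(do(x=1)\) or \(do(x=0)\) by coin flip removes that link.

```python
import random

TRUE_EFFECT = 0.2  # hypothetical: the drug raises recovery probability by 0.2


def recovers(severe: bool, takes_drug: bool, rng: random.Random) -> bool:
    """Hypothetical recovery model: severity lowers recovery, the drug raises it."""
    p = (0.3 if severe else 0.7) + (TRUE_EFFECT if takes_drug else 0.0)
    return rng.random() < p


def difference_in_means(records):
    """Recovery rate of the drug group minus recovery rate of the no-drug group."""
    treated = [r for took, r in records if took]
    control = [r for took, r in records if not took]
    return sum(treated) / len(treated) - sum(control) / len(control)


rng = random.Random(42)
patients = [rng.random() < 0.5 for _ in range(100_000)]  # True = severe case

# Observational data: severe patients choose the drug far more often (confounding).
observational = []
for severe in patients:
    takes = rng.random() < (0.8 if severe else 0.2)
    observational.append((takes, recovers(severe, takes, rng)))

# RCT: a coin flip decides do(x=1) or do(x=0), independently of severity.
rct = []
for severe in patients:
    takes = rng.random() < 0.5
    rct.append((takes, recovers(severe, takes, rng)))

obs_estimate = difference_in_means(observational)
rct_estimate = difference_in_means(rct)
print(f"observational estimate: {obs_estimate:.3f}")  # biased by severity
print(f"RCT estimate:           {rct_estimate:.3f}")  # close to the assumed 0.2
```

With these made-up numbers, the naive observational comparison even gets the sign wrong: the drug looks harmful. The RCT estimate recovers the assumed effect.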
Causal Inference in AI
- how should a robot acquire causal information through interaction with its environment?
- how should a robot receive causal information from humans?
According to the lecturer, a lot of modern-day AI is not “intelligence”: an algorithm that can recognize images after being trained on data is not exactly “intelligent”.
The true hallmark of intelligence is the ability to make causal inferences from statistical patterns.
Causal Inference Models
There are two main types of models:
- Rubin model (potential outcomes framework)
- Judea Pearl model (structural causal models and causal graphs); in the instructor's opinion, this one is better
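Identifiability
A quantity \(\theta(M)\) of a causal model \(M\) is identifiable if it can be uniquely determined from observations of the variables \(v\). That is, any two models \(M_1\) and \(M_2\) that produce the same distribution \(P(v)\) must also give \(\theta(M_1) = \theta(M_2)\).
I didn't really understand this.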
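One way to see what identifiability means is to build two hypothetical models that produce exactly the same observational distribution but disagree on the causal effect:

- In model A, \(x\) directly causes \(y\) (\(y = x\)).
- In model B, an unobserved common cause \(u\) sets both variables (\(x = u\), \(y = u\)).

Here \(\theta(M)\) is taken, as an assumption for the example, to be \(P(y=1 \mid do(x=1))\). Both models give the same \(P(x, y)\), yet \(\theta\) differs between them. So \(\theta\) cannot be uniquely determined from observations of \(v = (x, y)\) alone.

```python
import random
from collections import Counter


def model_a(rng, do_x=None):
    """Hypothetical model A: x causes y directly (y = x)."""
    x = rng.randint(0, 1) if do_x is None else do_x
    y = x
    return x, y


def model_b(rng, do_x=None):
    """Hypothetical model B: hidden u causes both x and y; x has no effect on y."""
    u = rng.randint(0, 1)
    x = u if do_x is None else do_x
    y = u
    return x, y


def observe(model, n, seed):
    """Empirical joint distribution P(x, y) from passive observation."""
    rng = random.Random(seed)
    counts = Counter(model(rng) for _ in range(n))
    return {k: round(v / n, 2) for k, v in sorted(counts.items())}


def theta(model, n, seed):
    """theta(M) = P(y = 1 | do(x = 1)), estimated by intervening on x."""
    rng = random.Random(seed)
    return sum(model(rng, do_x=1)[1] for _ in range(n)) / n


n = 100_000
print(observe(model_a, n, seed=0))  # only (0, 0) and (1, 1), each about 0.5
print(observe(model_b, n, seed=0))  # identical joint distribution
print(theta(model_a, n, seed=1))    # 1.0: forcing x = 1 always gives y = 1
print(theta(model_b, n, seed=1))    # about 0.5: forcing x does not change y
```

Passive data cannot tell the two models apart. Distinguishing them needs either an intervention or prior knowledge about the mechanism, such as ruling out a hidden common cause.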
IDK
Causal inference requires prior knowledge about the data-generating causal mechanism.
Such knowledge can only come from previously observed information and previously conducted studies.
Hence, causal inference builds on past causal inference.
Sources of Association
Reasons why \(x\) and \(y\) can be associated:
- \(x\) causes \(y\) directly
- \(x\) causes \(y\) indirectly
- \(x\) and \(y\) have common cause(s)
- Analysis is conditioned on their common descendant(s)
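A small simulation can illustrate all four sources with hypothetical linear models. The coefficients, noise levels, and the selection threshold are arbitrary assumptions. In the last case \(x\) and \(y\) are generated independently. They become associated only once we keep the records where their common descendant \(z\) is large.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# 1. x causes y directly
x1 = rng.normal(size=n)
y1 = 0.8 * x1 + rng.normal(size=n)

# 2. x causes y indirectly, through a mediator m
x2 = rng.normal(size=n)
m2 = 0.8 * x2 + rng.normal(size=n)
y2 = 0.8 * m2 + rng.normal(size=n)

# 3. a common cause c drives both x and y; x has no effect on y
c = rng.normal(size=n)
x3 = c + rng.normal(size=n)
y3 = c + rng.normal(size=n)

# 4. x and y are independent, but we condition on their common descendant z
x4 = rng.normal(size=n)
y4 = rng.normal(size=n)
z4 = x4 + y4 + rng.normal(size=n)
selected = z4 > 1.0  # hypothetical filter: only records with a high z are kept


def corr(a, b):
    """Pearson correlation coefficient between two arrays."""
    return np.corrcoef(a, b)[0, 1]


print(f"direct:          {corr(x1, y1):.2f}")
print(f"indirect:        {corr(x2, y2):.2f}")
print(f"common cause:    {corr(x3, y3):.2f}")
print(f"collider, all:   {corr(x4, y4):.2f}")  # about 0
print(f"collider, z > 1: {corr(x4[selected], y4[selected]):.2f}")  # negative
```

Only the first two correlations reflect an effect of \(x\) on \(y\). The correlation alone cannot tell us which of the four cases we are in.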
Importance of Causal Learning
Russell's Chicken
This short story shows why relying purely on past data is dangerous.
The chicken assumes that whenever the farmer comes, it is to feed it. However, one day the farmer comes to kill it.
Hence, not understanding why something happens can be very dangerous.
2008 US Financial Crisis
Default predictions were based on historical data, in which housing prices were always rising.
However, this time, house prices were going down.
Simpson's Paradox
This example of the paradox looks at the effectiveness of a drug.
In this study, the composition of the groups makes a difference, i.e.
- in the “drug” group, there are more women than men
- in the “no drug” group, there are more men than women
This imbalance means that comparing only the overall recovery rates of the two groups gives a misleading picture of the drug's effect.
Moreover, for this particular disease, women have a lower recovery rate than men. That should be taken into account as well.
Let's take another example with 5 cats and 5 humans, where 1 cat and 4 humans are given the drug (so 4 cats and 1 human are not). The values in the table show the recovery rate of each group.
Group | Drug | No Drug |
---|---|---|
Cat | \(1/1 = 100\%\) | \(3/4 = 75\%\) |
Human | \(1/4 = 25\%\) | \(0/1 = 0\%\) |
Overall | \(2/5 = 40\%\) | \(3/5 = 60\%\) |
Looking at each group separately, cats are better off with the drug, and so are humans.
However, the overall figures suggest that the population as a whole is better off without the drug.
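The reversal can be checked directly from the counts in the table above. Within each species, the drug group has the higher recovery rate. Pooling the species flips the comparison because most cats ended up in the no-drug group, and cats recover more often than humans. The dictionary layout below is just one convenient encoding of the table.

```python
from fractions import Fraction

# (recovered, total) for each (species, treatment) cell of the table
counts = {
    ("cat", "drug"): (1, 1),
    ("cat", "no drug"): (3, 4),
    ("human", "drug"): (1, 4),
    ("human", "no drug"): (0, 1),
}


def rate(recovered, total):
    """Exact recovery rate as a fraction."""
    return Fraction(recovered, total)


def overall(treatment):
    """Pool both species for one treatment and return the combined recovery rate."""
    recovered = sum(r for (_species, t), (r, _n) in counts.items() if t == treatment)
    total = sum(n for (_species, t), (_r, n) in counts.items() if t == treatment)
    return rate(recovered, total)


for species in ("cat", "human"):
    drug = rate(*counts[(species, "drug")])
    no_drug = rate(*counts[(species, "no drug")])
    print(f"{species}: drug {float(drug):.0%} vs no drug {float(no_drug):.0%}")

print(f"overall: drug {float(overall('drug')):.0%} vs no drug {float(overall('no drug')):.0%}")
# cat: drug 100% vs no drug 75%
# human: drug 25% vs no drug 0%
# overall: drug 40% vs no drug 60%
```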
US Political Support
Similar to Simpson's paradox:
Level | The richer you are, the more likely you are to be a ___ | Reason |
---|---|---|
Individual | Republican | Richer individuals want lower taxes, which Republicans favor |
State | Democrat | Richer societies are usually morally “modern”, while poorer ones are usually conservative and religious; Democrats have more “modern” policies |
Sampling Bias
Sample-selection bias
A type of sampling bias that arises when we make inferences about a larger population from a sample drawn from a distinct subpopulation.
Sample-selection bias can be thought of as a missing-data problem, where the data are NMAR (not missing at random).
Survivorship/Survival Bias
A special type of sample-selection bias.
Mutual Fund Performance¶
Suppose we are interested in how the size of assets under management affects a fund's performance. If we simply look at the relationship between fund size and returns among existing funds, however, there will be what is referred to as a survival bias: we do not observe funds that have closed due to bad performance.
So if fund size negatively affects performance, we may end up under-estimating the magnitude of the effect.
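Here is a minimal simulation of this argument under invented assumptions. Fund size (in arbitrary units) lowers returns with a true slope of -0.5. Funds whose return falls below a cutoff close and disappear from the data. Fitting a straight line only to the surviving funds gives a slope closer to zero than the true one, so the magnitude of the effect is underestimated.

```python
import numpy as np

rng = np.random.default_rng(7)
n_funds = 100_000
TRUE_SLOPE = -0.5  # hypothetical: larger funds perform worse

size = rng.uniform(0.0, 10.0, n_funds)
returns = 5.0 + TRUE_SLOPE * size + rng.normal(0.0, 2.0, n_funds)

# Hypothetical closing rule: funds with returns at or below 2.0 close and are never observed.
survived = returns > 2.0


def fitted_slope(x, y):
    """Least-squares slope of y on x."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope


print(f"slope using all funds:      {fitted_slope(size, returns):.2f}")  # close to -0.5
print(f"slope using survivors only: {fitted_slope(size[survived], returns[survived]):.2f}")  # closer to 0
```

Large funds survive only when they happen to have unusually good returns. Their observed average return is therefore pushed up, which flattens the fitted line.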
Planes in War
The planes that returned from war had many bullet holes in certain spots.
Someone suggested strengthening only those spots. Initially, that makes sense: these are the areas that got shot, so they need strengthening. But that is wrong.
Another person pointed out that these are the planes that returned despite getting shot in these spots. That means we have to focus on the other places, because the planes that got shot there never returned.
Clearly, data can be misleading without an understanding of the underlying cause.
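The argument can be sketched with a toy simulation. Everything in it is invented for illustration: the list of plane sections, the assumption that hits land uniformly across sections, and the assumption that an engine hit always brings the plane down. Counting holes only on returning planes makes the engine look like the safest place, precisely because engine hits never appear in that data.

```python
import random
from collections import Counter

SECTIONS = ["engine", "fuselage", "wings", "tail"]  # hypothetical plane sections
FATAL = {"engine"}  # hypothetical: an engine hit always downs the plane


def fly_mission(rng, hits_per_plane=3):
    """Return the sections hit on one plane and whether it made it back."""
    hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
    returned = not any(section in FATAL for section in hits)
    return hits, returned


rng = random.Random(1)
all_hits = Counter()
hits_on_returned = Counter()
for _ in range(10_000):
    hits, returned = fly_mission(rng)
    all_hits.update(hits)
    if returned:
        hits_on_returned.update(hits)  # the only data we would actually see

print("hits on all planes:      ", dict(all_hits))          # roughly equal per section
print("hits on returning planes:", dict(hits_on_returned))  # no engine hits at all
```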
Wages¶
Credit card default¶
We cannot use the relationship between income, balance, and default status among credit card holders to predict the default rate of a random credit card applicant, because the people in the available data have already been filtered as potentially good credit card users.
Hence, we can only use it to predict default for a random person who already has a credit card.
This is a case of censoring.
Success Stories¶
Advice from someone successful
Aggregate Reversal
Any statistical relationship between two variables may be reversed by including additional factors in the analysis.
If you just look at aggregate statistical data, it might be misleading.
Once we divide the population into subpopulations based on categories such as sex, the picture becomes clearer, because we are then trying to understand the underlying mechanism. This phenomenon is called aggregate reversal.
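Here is a minimal numerical sketch of aggregate reversal with invented data. Within each of two hypothetical groups, \(y\) decreases with \(x\). However, the group with larger \(x\) values also has a much higher baseline \(y\). Regressing \(y\) on \(x\) alone gives a positive slope. Adding the group indicator as an additional factor reverses the sign.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

group = rng.integers(0, 2, n)                          # hypothetical subpopulation label (0 or 1)
x = rng.normal(0.0, 1.0, n) + 4.0 * group              # group 1 has larger x values
y = 10.0 * group - 1.0 * x + rng.normal(0.0, 1.0, n)   # within a group, y falls as x rises

# Aggregate analysis: y on x only (intercept + x)
X_pooled = np.column_stack([np.ones(n), x])
pooled_coef, *_ = np.linalg.lstsq(X_pooled, y, rcond=None)

# Including the additional factor: y on x and the group indicator
X_grouped = np.column_stack([np.ones(n), x, group])
grouped_coef, *_ = np.linalg.lstsq(X_grouped, y, rcond=None)

print(f"slope of x, pooled:            {pooled_coef[1]:.2f}")   # positive
print(f"slope of x, controlling group: {grouped_coef[1]:.2f}")  # close to -1
```

The pooled slope mixes two things: the within-group relationship and the difference between groups. Controlling for the group separates them.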