Bayesian Network¶
Technique for describing complex joint distributions using simple, local distributions (conditional probabilities)
Also known as a probabilistic graphical model
Bayes net = Graph + Local conditional probabilities
Local interactions chain together to give global, indirect interactions
Note: Review Bayes' Theorem in Probability and Statistics before continuing
Semantics¶
A DAG that represents the dependencies between variables
- Nodes: represent variables
- Links: an edge from \(X\) to \(Y\) means \(X\) has a direct influence on \(Y\), i.e., \(X\) is a parent of \(Y\)
- Conditional Probability Table (CPT): each node has a conditional probability distribution that quantifies the effect of its parents on that node
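A minimal sketch of this representation as plain Python dictionaries, using a hypothetical two-node net Rain → WetGrass (names and probabilities are illustrative):

```python
# Hypothetical two-node net: Rain -> WetGrass.

# Links: map each node to its parents (this defines the DAG).
parents = {
    "Rain": [],
    "WetGrass": ["Rain"],
}

# CPTs: P(Rain) and P(WetGrass | Rain), with illustrative numbers.
cpt_rain = {True: 0.2, False: 0.8}
cpt_wetgrass = {
    (True,): {True: 0.9, False: 0.1},   # given Rain = True
    (False,): {True: 0.1, False: 0.9},  # given Rain = False
}
```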
Why needed?¶
Inefficient to use full joint distribution table as probabilistic model
- Joint is way too big to represent explicitly
- Hard to learn/estimate anything empirically about more than a few variables at a time
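As an illustration of the size gap: with 20 binary variables the full joint table has \(2^{20} \approx 10^6\) entries, while a Bayes net in which every node has at most 3 parents needs at most \(20 \cdot 2^3 = 160\) conditional probabilities.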
Aspects¶
| Aspect | Description |
|---|---|
| Definition | \(P(X=x)\) |
| Representation | Given a BN graph, what kinds of distributions can it encode? |
| Modelling | Which BN is most appropriate for a given domain? |
| Inference | Given a fixed BN, what is \(P(X \vert e)\)? |
| Learning | Given data, what is the best BN encoding? |
| Diagnosis | Infer \(P(\text{problem type} \mid \text{symptoms})\) |
| Prediction | Infer probability distributions for values that are expensive/impossible to measure |
| Anomaly detection | Detect unlikely observations |
| Active learning | Choose the most informative diagnostic test to perform, given observations |
Joint Probability Distribution¶
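The joint distribution encoded by a Bayes net is the product of the local conditional probabilities:

$$ P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \text{parents}(X_i)) $$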
Independence in BN¶
\(X {\perp \!\!\! \perp} Y \mid Z\) is read as \(X\) is conditionally independent of \(Y\) given \(Z\)
$$ X {\perp \!\!\! \perp} Y \mid Z \iff P(x, y \mid z) = P(x \mid z) \cdot P(y \mid z) $$ Are two nodes independent given the evidence?
- If yes: prove using algebra (tedious)
- If no: prove with a counterexample
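As an alternative to the algebra, a small sketch that checks \(P(x, y \mid z) = P(x \mid z) \cdot P(y \mid z)\) numerically over a full joint table (the joint and variable names here are hypothetical, constructed so that the check passes):

```python
from itertools import product

# Hypothetical full joint P(X, Y, Z) over binary variables, built so that
# X and Y are conditionally independent given Z by construction.
p_z = {True: 0.3, False: 0.7}
p_x_given_z = {True: 0.9, False: 0.2}   # P(X=True | Z=z)
p_y_given_z = {True: 0.4, False: 0.6}   # P(Y=True | Z=z)

joint = {}
for x, y, z in product([True, False], repeat=3):
    px = p_x_given_z[z] if x else 1 - p_x_given_z[z]
    py = p_y_given_z[z] if y else 1 - p_y_given_z[z]
    joint[(x, y, z)] = p_z[z] * px * py

def marginal(fixed):
    """Sum the joint over all entries consistent with the fixed assignment."""
    return sum(p for (x, y, z), p in joint.items()
               if all(dict(x=x, y=y, z=z)[k] == v for k, v in fixed.items()))

# Check P(x, y | z) == P(x | z) * P(y | z) for every assignment.
for x, y, z in product([True, False], repeat=3):
    pz = marginal({"z": z})
    lhs = marginal({"x": x, "y": y, "z": z}) / pz
    rhs = (marginal({"x": x, "z": z}) / pz) * (marginal({"y": y, "z": z}) / pz)
    assert abs(lhs - rhs) < 1e-9
```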
D-Separation¶
A graphical criterion for deciding whether two nodes are conditionally independent given the evidence, using only the network structure
Query: $$ x_i {\perp \!\!\! \perp} x_j \mid \{ x_{k_1}, \dots, x_{k_n} \} $$ Check all (undirected) paths between \(x_i\) and \(x_j\)
| | Independence guaranteed |
|---|---|
| One or more paths active | ❌ |
| All paths inactive | ✅ |
Causal chains¶
graph LR
x((x))-->y((y))
y((y))-->z((z))
\(x {\perp \!\!\! \perp} z \mid y\) guaranteed
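A short derivation, using the chain's factorization \(P(x, y, z) = P(x)\,P(y \mid x)\,P(z \mid y)\):

$$ P(z \mid x, y) = \frac{P(x)\,P(y \mid x)\,P(z \mid y)}{P(x)\,P(y \mid x)} = P(z \mid y) $$

so once \(y\) is observed, \(x\) provides no extra information about \(z\).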
Common Cause¶
graph TD;
y((y))-->x((x));
y((y))-->z((z));
\(x {\perp \!\!\! \perp} z \mid y\) guaranteed
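Derivation from the common-cause factorization \(P(x, y, z) = P(y)\,P(x \mid y)\,P(z \mid y)\):

$$ P(x, z \mid y) = \frac{P(y)\,P(x \mid y)\,P(z \mid y)}{P(y)} = P(x \mid y)\,P(z \mid y) $$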
Common effect¶
graph TD;
x((x))-->z((z));
y((y))-->z((z));
- \(x {\perp \!\!\! \perp} y\) guaranteed
- \(x {\perp \!\!\! \perp} y \mid z\) not guaranteed (observing the common effect can induce a dependence)
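A standard counterexample: let \(x\) and \(y\) be independent fair coin flips and \(z = x \oplus y\). Marginally \(x\) and \(y\) are independent, but once \(z\) is observed, \(x\) completely determines \(y\), so conditioning on the common effect creates a dependence.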
Active¶
A path is active if every triple in the path is active
Note: all possible configurations of active and inactive triples are shown below
graph TB
subgraph Inactive
direction LR
subgraph i1
direction LR;
Ai1(("x")) --> Bi1["z"] -->Ci1(("y"));
style Bi1 fill : #808080
end
subgraph i2
direction TB
Bi2["z"]-->Ai2(("x")) & Ci2(("y"));
style Bi2 fill : #808080
end
subgraph i3
direction TB
Ai3(("x")) & Bi3(("y")) --> Ci3(("z"))
end
subgraph i4
direction TB
Ai4(("x")) & Bi4(("y")) --> Ci4(("z")) --> Di4((" "))
end
end
subgraph Active
direction LR
subgraph 4
direction TB
x4(("x"))-->C4(("z"));
y4(("y"))-->C4(("z"));
C4(("z"))-.->D4[" "]
style D4 fill : #808080
end
subgraph 3
direction TB
A3(("x"))-->C3["z"];
B3(("y"))-->C3["z"];
style C3 fill : #808080
end
subgraph 2
direction TB
B2(("z"))-->A2(("x")) & C2(("y"));
end
subgraph 1
direction LR
A1(("x"))-->B1(("z"));
B1(("z"))-->C1(("y"));
end
end
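A runnable sketch of the whole check: enumerate the undirected paths, classify each triple as chain/fork/collider, and report d-separation only if no path is active. The DAG encoding and the helper names (`d_separated`, `triple_active`, etc.) are illustrative, not from the source.

```python
# Sketch of a d-separation check over a DAG given as (parent, child) pairs.

def descendants(node, edges):
    """All nodes reachable from `node` by following directed edges."""
    children = {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), set()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def undirected_paths(src, dst, edges):
    """All simple paths between src and dst, ignoring edge direction."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    def extend(path):
        if path[-1] == dst:
            yield path
            return
        for nxt in neighbours.get(path[-1], set()):
            if nxt not in path:
                yield from extend(path + [nxt])
    yield from extend([src])

def triple_active(a, b, c, observed, edges):
    """Is the triple a - b - c active given the observed set?"""
    edge_set = set(edges)
    collider = (a, b) in edge_set and (c, b) in edge_set   # a -> b <- c
    if collider:
        return b in observed or bool(descendants(b, edges) & observed)
    return b not in observed                                # chain or fork

def d_separated(x, y, observed, edges):
    """True iff every undirected path between x and y is inactive."""
    for path in undirected_paths(x, y, edges):
        triples = zip(path, path[1:], path[2:])
        if all(triple_active(a, b, c, observed, edges) for a, b, c in triples):
            return False   # found an active path
    return True

# Example: causal chain x -> y -> z; observing y blocks the only path.
edges = [("x", "y"), ("y", "z")]
print(d_separated("x", "z", {"y"}, edges))   # True
print(d_separated("x", "z", set(), edges))   # False
```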
Topology Limits Distributions¶
Given some graph topology \(G\), only certain joint distributions can be encoded
The graph structure guarantees certain (conditional) independencies
A Bayes net's joint distribution may have further (conditional) independencies that are not detectable without inspecting the specific distribution
Adding arcs increases the set of distributions the net can encode, but has several costs
Full conditioning can encode any distribution
Building a Bayes Net¶
- Choose set of relevant vars
- Represent each var by a node
- Choose ordering for vars \(x_1, \dots, x_m\) such that if \(x_i\) influences \(x_j\), then \(i < j\)
- Add links: Link structure must be acyclic
- Add conditional probability table for each node
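A small sketch of these steps on a hypothetical three-node net Burglary → Alarm → Call, including a check that the chosen ordering respects the links (names and numbers are illustrative):

```python
# Ordering: variables listed so that every parent precedes its children.
order = ["Burglary", "Alarm", "Call"]

# Links: each node's parents (acyclic as long as parents come earlier in `order`).
parents = {
    "Burglary": [],
    "Alarm": ["Burglary"],
    "Call": ["Alarm"],
}

# Check the ordering: if x_i influences x_j then i < j.
position = {name: i for i, name in enumerate(order)}
for child, pars in parents.items():
    assert all(position[p] < position[child] for p in pars), "ordering violates links"

# CPTs: P(node = True | parent assignment); numbers are illustrative.
cpts = {
    "Burglary": {(): 0.01},
    "Alarm": {(True,): 0.95, (False,): 0.02},
    "Call": {(True,): 0.9, (False,): 0.05},
}
```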
Interpreting¶
- Given its parents, each node is conditionally independent of all its non-descendants in the network
- Two unconnected variables may still be correlated
- Whether two variables are conditionally independent can be deduced using d-separation
Inference¶
- Doing exact inference is computationally hard
    - Tractable in some cases: trees
- We can instead perform approximate inference
    - Likelihood-weighted sampling
- Marginal probability
    - Rewrite in terms of the joint distribution
    - Fix query vars
    - Sum over unknown vars
- Conditional probability
    - Fix query vars
    - Fix evidence vars
    - Sum over unknown vars
    - Add normalization constant \(\alpha\) so that the result sums to 1
    - Rewrite the joint probability using Bayes net factors
    - Choose a variable order; take summations inside
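A sketch of conditional-probability inference by enumeration on a hypothetical Burglary → Alarm → Call chain (numbers illustrative): fix the evidence, sum out the unknown variable, then normalise with \(\alpha\).

```python
# CPT entries give P(node = True | parent value); numbers are illustrative.
p_b = {True: 0.01, False: 0.99}
p_a_given_b = {True: 0.95, False: 0.02}    # P(Alarm=True | Burglary=b)
p_c_given_a = {True: 0.9, False: 0.05}     # P(Call=True | Alarm=a)

def joint(b, a, c):
    """Joint probability as the product of Bayes net factors."""
    pa = p_a_given_b[b] if a else 1 - p_a_given_b[b]
    pc = p_c_given_a[a] if c else 1 - p_c_given_a[a]
    return p_b[b] * pa * pc

def query_burglary_given_call(call_observed=True):
    """P(Burglary | Call): fix evidence, sum out Alarm, normalise."""
    unnormalised = {}
    for b in (True, False):
        # Sum over the unknown (hidden) variable Alarm.
        unnormalised[b] = sum(joint(b, a, call_observed) for a in (True, False))
    alpha = 1 / sum(unnormalised.values())   # normalisation constant
    return {b: alpha * p for b, p in unnormalised.items()}

print(query_burglary_given_call(True))   # P(Burglary | Call = True)
```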
Learning¶
- Structure learning
- Parameter learning
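For parameter learning with a known structure and fully observed data, a minimal counting sketch (the data and names are hypothetical): each CPT entry is estimated as a relative frequency.

```python
from collections import Counter

# Maximum-likelihood CPT estimation by counting, for a node with one parent.
# `data` is a hypothetical list of (parent_value, child_value) observations.
data = [(True, True), (True, True), (True, False), (False, False), (False, True)]

pair_counts = Counter(data)
parent_counts = Counter(parent for parent, _ in data)

# P(child = True | parent = p) ~ count(p, True) / count(p)
cpt = {p: pair_counts[(p, True)] / parent_counts[p] for p in parent_counts}
print(cpt)   # {True: 0.666..., False: 0.5}
```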
Variable Elimination¶
Eliminate all hidden variables in turn until only a factor (a function over a set of variables) containing just the query variable remains
To eliminate a variable:
- Join all factors containing that variable
- Sum out the influence of the variable, producing a new factor
- Exploits product form of joint distribution
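A compact sketch of variable elimination with explicit join and sum-out operations on factors; the factor representation, helper names, and the hypothetical Burglary → Alarm → Call numbers are illustrative.

```python
from itertools import product

# A factor is (vars, table): `vars` is a tuple of names, `table` maps an
# assignment tuple (aligned with `vars`) to a number.

def make_factor(names, fn):
    """Tabulate fn over all boolean assignments to `names`."""
    return names, {vals: fn(dict(zip(names, vals)))
                   for vals in product((True, False), repeat=len(names))}

def join(f, g):
    """Pointwise product of two factors over the union of their variables."""
    fv, ft = f
    gv, gt = g
    names = tuple(dict.fromkeys(fv + gv))          # union, order-preserving
    def value(assign):
        return (ft[tuple(assign[v] for v in fv)] *
                gt[tuple(assign[v] for v in gv)])
    return make_factor(names, value)

def sum_out(var, f):
    """Sum a variable out of a factor, producing a smaller factor."""
    fv, ft = f
    names = tuple(v for v in fv if v != var)
    table = {}
    for vals, p in ft.items():
        assign = dict(zip(fv, vals))
        key = tuple(assign[v] for v in names)
        table[key] = table.get(key, 0.0) + p
    return names, table

# Hypothetical chain B -> A -> C with illustrative numbers.
f_b = make_factor(("B",), lambda a: 0.01 if a["B"] else 0.99)
f_a = make_factor(("A", "B"),
                  lambda a: (0.95 if a["B"] else 0.02) if a["A"]
                  else (0.05 if a["B"] else 0.98))
f_c = make_factor(("C", "A"),
                  lambda a: (0.9 if a["A"] else 0.05) if a["C"]
                  else (0.1 if a["A"] else 0.95))

# Eliminate B: join the factors mentioning B, then sum B out.
f_ab = sum_out("B", join(f_b, f_a))        # factor over A
# Eliminate A: join with the remaining factor mentioning A, then sum A out.
f_cq = sum_out("A", join(f_ab, f_c))       # factor over the query C
names, table = f_cq
print(table)   # {(True,): P(C=True), (False,): P(C=False)}
```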