Gaussian EM K-Means Clustering¶
Probabilistic clustering that also relies on K-Means: the output of K-Means is fed into this model as its initial parameter estimates
EM for Gaussian Mixture¶
Alternates between two steps:
- E-Step: Given an estimate \(\theta_e\) of the parameters, compute \(P_\theta(y \vert x_i)\) and use it to ‘hallucinate’ expected cluster assignments \(z_i\)
- M-Step: Find a new \(\theta_{e+1}\) that maximizes the marginal log-likelihood by optimizing \(P_\theta(x_i, z_i)\) given the \(z_i\) from step 1
\[ \begin{aligned} \theta_{e+1} &= \arg \max_\theta \sum_{i=1}^n \mathbb E_{z_i \sim P_{\theta_e} (y \vert x_i)} \log P_\theta(x_i, z_i) \\ &= \arg \max_\theta \sum_{i=1}^n \sum_{c=1}^C P_{\theta_e}(y=c \vert x_i) \log P_\theta(x_i, y=c) \end{aligned} \]
\[ \begin{aligned} P_\theta(y=c \vert x) &= \dfrac{P_\theta (y=c, x)}{P_\theta(x)} \\ &= \dfrac{P_\theta (y=c) P_\theta (x \vert y=c)}{ \sum_{c'=1}^C P_\theta (y=c') P_\theta (x \vert y=c') } \end{aligned} \]
- Go to step 1
This process increases the marginal likelihood at each step and eventually converges
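A minimal sketch of this alternation in Python. The `e_step` and `m_step` callables here are hypothetical placeholders, made concrete for Gaussian mixtures in the Steps section below:

```python
# Generic EM skeleton; `e_step` and `m_step` are hypothetical callables,
# not a library API.
def expectation_maximization(X, theta, e_step, m_step, max_iters=100, tol=1e-6):
    prev_ll = float("-inf")
    for _ in range(max_iters):
        z, ll = e_step(X, theta)        # posterior P_theta(y | x_i) and current log-likelihood
        theta = m_step(X, z)            # maximize the expected complete-data log-likelihood
        if abs(ll - prev_ll) < tol:     # marginal likelihood increases monotonically
            break
        prev_ll = ll
    return theta
```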
Advantages¶
- Easy to implement
- Guaranteed to converge (to a stationary point of the likelihood)
- Works for many ML models
Limitations¶
- Can get stuck in local optima
- The posterior \(P_{\theta_e} (y \vert x_i)\) may not be tractable to compute for every model
Steps¶
Expectation Maximization
- Initialize the Gaussian parameters \(\mu_k, \Sigma_k, \pi_k\) from the K-Means output
    - \(\mu_k \leftarrow \text{centroid of cluster } k\)
    - \(\Sigma_k \leftarrow \text{cov} \Big( \text{cluster}(k) \Big)\)
    - \(\pi_k \leftarrow \frac{\text{No of points in cluster } k}{\text{Total no of points}}\)
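A minimal NumPy sketch of this initialization, assuming `labels` holds the hard cluster assignments produced by a prior K-Means run (function and variable names are illustrative):

```python
import numpy as np

def init_from_kmeans(X, labels, K):
    """Initialize GMM parameters from K-Means output.
    X: (N, D) data; labels: (N,) hard cluster assignments from K-Means."""
    N, D = X.shape
    mu = np.zeros((K, D))
    Sigma = np.zeros((K, D, D))
    pi = np.zeros(K)
    for k in range(K):
        cluster_k = X[labels == k]                  # assumes each cluster is non-empty
        mu[k] = cluster_k.mean(axis=0)              # centroid of cluster k
        Sigma[k] = np.cov(cluster_k, rowvar=False)  # covariance of cluster k
        pi[k] = len(cluster_k) / N                  # fraction of points in cluster k
    return mu, Sigma, pi
```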
- E-Step: Assign each point \(x_n\) an assignment score (responsibility) \(\gamma(z_{nk})\) for each cluster \(k\)
\[ \gamma(z_{nk}) = \frac{ \pi_k N(x_n|\mu_k, \Sigma_k) }{ \sum_{i=1}^K \pi_i N(x_n|\mu_i, \Sigma_i) } \]
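A sketch of this E-Step with NumPy and SciPy, assuming the parameters come from the initialization sketch above (illustrative names, not a library API):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, mu, Sigma, pi):
    """Compute responsibilities gamma[n, k] = P(z_n = k | x_n)."""
    N, K = X.shape[0], len(pi)
    gamma = np.zeros((N, K))
    for k in range(K):
        # Numerator: pi_k * N(x_n | mu_k, Sigma_k)
        gamma[:, k] = pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
    gamma /= gamma.sum(axis=1, keepdims=True)  # normalize over clusters (denominator)
    return gamma
```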
- M-Step: Given the scores, adjust \(\mu_k, \pi_k, \Sigma_k\) for each cluster \(k\)
\[ \begin{aligned} \text{Let } N_k &= \sum_{n=1}^N \gamma(z_{nk}) \\ N &= \text{Sample Size} \\ \mu_k^\text{new} &= \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk}) x_n \\ \Sigma_k^\text{new} &= \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk}) (x_n - \mu_k^\text{new}) (x_n - \mu_k^\text{new})^T \\ \pi_k^\text{new} &= \frac{N_k}{N} \end{aligned} \]
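A corresponding M-Step sketch implementing the update equations above (again, illustrative names):

```python
import numpy as np

def m_step(X, gamma):
    """Update mu_k, Sigma_k, pi_k from the responsibilities gamma (N, K)."""
    N, D = X.shape
    K = gamma.shape[1]
    Nk = gamma.sum(axis=0)                                      # effective number of points per cluster
    mu = (gamma.T @ X) / Nk[:, None]                            # weighted means
    Sigma = np.zeros((K, D, D))
    for k in range(K):
        diff = X - mu[k]                                        # (N, D) deviations from the new mean
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]  # weighted covariance
    pi = Nk / N                                                 # mixing weights
    return mu, Sigma, pi
```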
- Evaluate log likelihood
\[ \ln p(X \vert \mu, \Sigma, \pi) = \sum_{n=1}^N \ln \left( \sum_{k=1}^K \pi_k N(x_n \vert \mu_k, \Sigma_k) \right) \]
- Stop when the log-likelihood or the parameters converge; otherwise repeat from the E-Step
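Putting it together, a sketch of the log-likelihood evaluation and the stopping check, reusing the `init_from_kmeans`, `e_step`, and `m_step` sketches above (all names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, mu, Sigma, pi):
    """ln p(X | mu, Sigma, pi) = sum_n ln( sum_k pi_k N(x_n | mu_k, Sigma_k) )."""
    K = len(pi)
    mixture = sum(pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
                  for k in range(K))                  # (N,) mixture density at each point
    return np.log(mixture).sum()

def fit_gmm(X, labels, K, max_iters=100, tol=1e-6):
    """Full EM loop, seeded with the K-Means labels."""
    mu, Sigma, pi = init_from_kmeans(X, labels, K)
    prev_ll = -np.inf
    for _ in range(max_iters):
        gamma = e_step(X, mu, Sigma, pi)
        mu, Sigma, pi = m_step(X, gamma)
        ll = log_likelihood(X, mu, Sigma, pi)
        if abs(ll - prev_ll) < tol:                   # stop when the log-likelihood converges
            break
        prev_ll = ll
    return mu, Sigma, pi, gamma
```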
K Means vs Gaussian Mixture¶
|  | K-Means | Gaussian Mixture |
|---|---|---|
| Model type | Classifier model | Probability model |
| Probabilistic? | ❌ | ✅ |
| Classifier type | Hard | Soft |
| Parameters to fit to data | \(\mu_k\) | \(\mu_k, \Sigma_k, \pi_k\) |
| ❌ Disadvantage | If a point may belong to multiple clusters, we have to assign it to just one. If the sample size is too small, the initial grouping largely determines the clusters | Complex |
| ✅ Advantage | Simple. Fast and efficient, with \(O(tkn)\), where \(t =\) no. of iterations, \(k =\) no. of clusters, \(n =\) no. of sample points | If a point may belong to multiple clusters, we have a probability to back it up |
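For reference, scikit-learn exposes both models; a short sketch of the hard vs soft assignments on made-up toy data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Two well-separated 2-D blobs, purely for illustration
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])

# K-Means: hard assignments, each point belongs to exactly one cluster
hard_labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# Gaussian mixture: soft assignments, a probability per (point, cluster) pair
gmm = GaussianMixture(n_components=2).fit(X)
soft_probs = gmm.predict_proba(X)   # shape (200, 2); each row sums to 1
```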