Expectation Maximization
The EM algorithm is an iterative method for finding maximum likelihood estimates of parameters in statistical models that depend on unobserved latent variables. For example, the parameters of a Gaussian mixture model can be estimated by treating the mixture component that each data point belongs to as the latent variable.
The EM algorithm alternates between two steps:
- Expectation (E-step): Compute the expected log-likelihood of the data, taking the expectation over the latent variables under their posterior distribution given the current parameters.
- Maximization (M-step): Optimize the parameters to maximize this expected log-likelihood, keeping the latent-variable distribution fixed. This optimization often has a closed form, but may involve solving a linear system of equations.
Here,
- \(Z\) are latent variables (discrete)
- \(X\) are data (discrete or continuous)
- \(\theta^{(t)}\) are model parameters (continuous) at iteration \(t\)
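With this notation, one EM iteration can be written compactly as the standard two updates (the \(Q\) function below is the usual auxiliary function of EM):

\[
Q\!\left(\theta \mid \theta^{(t)}\right) = \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\!\left[\log p(X, Z \mid \theta)\right],
\qquad
\theta^{(t+1)} = \arg\max_{\theta}\; Q\!\left(\theta \mid \theta^{(t)}\right).
\]

The E-step evaluates the expectation, and the M-step performs the maximization.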
Uses:
- The EM algorithm is used when the maximum likelihood equations cannot be solved directly.
- This typically happens when latent variables are involved.
- EM is also useful in place of algorithms that require derivatives of the likelihood function.
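As a concrete illustration of the two steps, here is a minimal sketch of EM for a one-dimensional Gaussian mixture, the example mentioned above. The function name `em_gmm` and all parameter choices are illustrative, not from the original text; it assumes `k` Gaussian components with unknown weights, means, and variances.

```python
import numpy as np

def em_gmm(x, k=2, iters=50, seed=0):
    """Minimal EM sketch for a 1-D Gaussian mixture (illustrative helper).

    x: 1-D data array; k: number of mixture components.
    Returns mixture weights, means, and variances.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialize theta^(0): equal weights, random means, shared variance.
    w = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, np.var(x))
    for _ in range(iters):
        # E-step: posterior responsibilities p(Z = j | x_i, theta^(t)).
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = w * dens                      # shape (n, k)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: closed-form maximizers of the expected log-likelihood.
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Usage: two well-separated clusters.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 1, 500)])
w, mu, var = em_gmm(x, k=2)
```

Note that for a Gaussian mixture the M-step has a closed form (the weighted averages above), so no derivative-based optimizer is needed, which illustrates the last point in the list.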