Bayesian Inference
Finding the posterior distribution of parameters \(z\) (aka latent variables) given the observed data \(x\) is called inference.
\[ P(z | x) = \frac {P(x, z)} {P(x)} \]
But computing \(P(x) = \int P(x,z) dz\) is usually intractable.
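For a low-dimensional toy model, the evidence integral can be brute-forced on a grid, which makes the formulas above concrete. The sketch below uses a coin-flip (Beta-Bernoulli) example not taken from these notes: \(z\) is the coin's heads-probability with a uniform prior, and \(x\) is the observed flips.

```python
import math

# Toy model: z is a coin's heads-probability with a uniform prior on [0, 1];
# x is 7 heads out of 10 flips. P(x, z) = prior(z) * likelihood(x | z) = 1 * Binomial.
def joint(z, heads=7, flips=10):
    return math.comb(flips, heads) * z**heads * (1 - z)**(flips - heads)

# Evidence P(x) = integral of P(x, z) dz, approximated by a Riemann sum on a grid.
grid = [i / 1000 for i in range(1, 1000)]
dz = 1 / 1000
evidence = sum(joint(z) for z in grid) * dz

# Posterior P(z | x) = P(x, z) / P(x), evaluated on the same grid.
posterior = [joint(z) / evidence for z in grid]

# Posterior mean; the analytic answer for a Beta(8, 4) posterior is 8/12 ≈ 0.667.
post_mean = sum(z * p for z, p in zip(grid, posterior)) * dz
print(round(post_mean, 3))  # → 0.667
```

This works only because \(z\) is one-dimensional; with many latent variables the grid grows exponentially, which is exactly the intractability the section describes.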
See also:
- Lecture 10: An Introduction To Bayesian Inference (II): Inference Of Parameters And Models by David MacKay (https://www.youtube.com/watch?v=mDVE0M-xQlc&t=2606s)
- Talk by Luke Hewitt, MIT: Bayesian Inference in Generative Models
1. Variational Inference
Since exact Bayesian inference is usually intractable, we need approximate techniques. Variational Inference is one such technique.
The idea is to choose a tractable family of distributions \(q(z; \lambda)\) (the variational family) and then find the member \(q\) of that family that is closest to \(P(z|x)\) by minimizing the KL divergence. Finally, that \(q\) is used in place of the true posterior \(P(z|x)\).
Minimizing the KL divergence \(\mathbf{KL}(q(z; \lambda) \| p(z | x))\) is equivalent to maximizing the ELBO (Evidence Lower Bound):
\[ \log P(x) \geq \mathbb{E}_q [\log P(x,z) - \log q(z; \lambda)] \]
This ELBO objective is maximized using algorithms like gradient ascent to find the optimal parameters \(\lambda^*\).
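The loop above can be sketched end to end. The example below is an illustrative toy, not from these notes: it fits a Gaussian variational family \(q(z; \mu, \sigma)\) to an unnormalized log-joint whose true posterior is \(N(2, 0.5^2)\), using reparameterized (pathwise) gradients of the ELBO and plain gradient ascent.

```python
import math
import random

random.seed(0)

# Unnormalized log-joint log P(x, z); a toy choice whose true posterior
# is N(2, 0.5^2), so we can check the fitted q against the known answer.
def grad_log_joint(z):
    return -(z - 2.0) / 0.5 ** 2

# Variational family q(z; mu, sigma) = N(mu, sigma^2), reparameterized
# as z = mu + sigma * eps with eps ~ N(0, 1).
mu, log_sigma = 0.0, 0.0
lr, n_samples = 0.05, 50
for step in range(2000):
    sigma = math.exp(log_sigma)
    g_mu = g_ls = 0.0
    for _ in range(n_samples):
        eps = random.gauss(0, 1)
        z = mu + sigma * eps
        g = grad_log_joint(z)      # pathwise (reparameterization) gradient
        g_mu += g
        g_ls += g * sigma * eps
    # The Gaussian entropy term of the ELBO adds +1 to the log-sigma gradient.
    mu += lr * g_mu / n_samples
    log_sigma += lr * (g_ls / n_samples + 1.0)

print(round(mu, 2), round(math.exp(log_sigma), 2))  # ≈ 2.0 and 0.5
```

Because the target here is itself Gaussian, the optimal \(q\) matches it exactly; for real models \(q\) would only be the closest member of the chosen family.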
2. Markov Chain Monte Carlo (MCMC)
This is another way to do approximate Bayesian inference. It lets us draw samples from \(P(z|x)\) without computing \(P(x)\) directly. Once we have the samples, we can approximate expectations:
\[ \mathbb{E}_p[f(z)] \approx \frac 1 N \sum_{i=1}^N f(z^i) \]
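As a minimal illustration of this estimator (with an assumed toy distribution, not one from the notes): if we can sample \(z \sim N(1, 2^2)\), the sample average of \(f(z) = z^2\) approximates \(\mathbb{E}[z^2] = \mu^2 + \sigma^2 = 5\).

```python
import random

random.seed(0)

# Draw N samples from a distribution we can sample directly.
samples = [random.gauss(1, 2) for _ in range(100_000)]

# Monte Carlo estimate of E[z^2] = mu^2 + sigma^2 = 1 + 4 = 5.
estimate = sum(z * z for z in samples) / len(samples)
print(round(estimate, 1))  # close to 5.0
```

MCMC supplies exactly such samples for posteriors we cannot sample from directly.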
The key idea is to construct a Markov chain whose stationary distribution is the target distribution \(P(z | x)\). Then run the chain long enough that it is well mixed, and use the generated samples to compute quantities of interest.
Algorithms:
- Metropolis–Hastings (MH): Propose a new sample, accept/reject based on acceptance ratio to ensure detailed balance.
- Gibbs Sampling: Special case of MH where we sample each variable from its conditional distribution given others.
- Hamiltonian Monte Carlo (HMC): Uses gradient information and simulated physics to explore the space more efficiently.
- No-U-Turn Sampler (NUTS): Adaptive version of HMC that avoids manual tuning of path length.
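Random-walk Metropolis–Hastings can be sketched in a few lines. The target below is an assumed toy (a standard normal, written as an unnormalized log density) to show the key point: the acceptance ratio only involves \(P(x, z)\), so the unknown normalizer \(P(x)\) cancels.

```python
import math
import random

random.seed(0)

# Unnormalized target: we only ever need P(x, z) up to a constant.
# Toy choice: standard normal, log p(z) = -z^2 / 2 + const.
def log_target(z):
    return -z * z / 2

def metropolis_hastings(n_steps, step_size=1.0):
    """Random-walk MH with a symmetric Gaussian proposal centered at z."""
    z = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = z + random.gauss(0, step_size)
        # Accept with probability min(1, p(proposal) / p(z)); the
        # normalizer P(x) cancels in this ratio, so it never appears.
        if math.log(random.random()) < log_target(proposal) - log_target(z):
            z = proposal
        samples.append(z)
    return samples

samples = metropolis_hastings(50_000)
burned = samples[5_000:]  # discard burn-in drawn before the chain mixed
mean = sum(burned) / len(burned)
var = sum((z - mean) ** 2 for z in burned) / len(burned)
print(round(mean, 1), round(var, 1))  # ≈ 0.0 and 1.0 for the N(0, 1) target
```

The symmetric proposal makes the Hastings correction trivial; an asymmetric proposal would need the full acceptance ratio with proposal densities included.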