Bayesian Inference
Finding the posterior distribution of parameters \(z\) (aka latent variables) given the observed data \(x\) is called inference.
\[ P(z | x) = \frac {P(x, z)} {P(x)} \]
But computing \(P(x) = \int P(x,z) dz\) is usually intractable.
See also:
- Lecture 10: An Introduction To Bayesian Inference (II): Inference Of Parameters And Models by David MacKay (https://www.youtube.com/watch?v=mDVE0M-xQlc&t=2606s)
1. Variational Inference
Since exact Bayesian inference is usually intractable, we need approximate techniques. Variational Inference is one such technique.
The idea is to choose a tractable family of distributions \(q(z; \lambda)\) (the variational family) and then find the member \(q\) of that family that is closest to \(P(z|x)\) by minimizing the KL divergence. Finally, that \(q\) is used in place of the true posterior \(P(z|x)\).
Minimizing the KL divergence \(\mathbf{KL}(q(z; \lambda) \,\|\, p(z | x))\) is equivalent to maximizing the ELBO (Evidence Lower Bound):
\[ \log P(x) \geq \mathbb{E}_q [\log P(x,z) - \log q(z; \lambda)] \]
This ELBO objective is maximized using algorithms like gradient ascent (equivalently, the negative ELBO is minimized with gradient descent) to find the optimal parameters \(\lambda^*\).
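The procedure above can be sketched with a toy example. The model here is an assumption chosen for illustration (not from the source): prior \(z \sim N(0,1)\), likelihood \(x|z \sim N(z,1)\), observation \(x = 2\), so the exact posterior \(N(1.0,\, 0.5)\) is known and we can check the result. The ELBO gradient is estimated with the reparameterization trick:

```python
import numpy as np

# Toy conjugate-Gaussian model (an assumption for illustration):
# prior z ~ N(0, 1), likelihood x | z ~ N(z, 1), observation x = 2.
# The exact posterior is N(1.0, 0.5), so we can check the answer.
rng = np.random.default_rng(0)
x = 2.0

# Variational family q(z; lambda) = N(mu, sigma^2), lambda = (mu, log_sigma)
mu, log_sigma = 0.0, 0.0

lr, n_steps, batch = 0.05, 2000, 64
for _ in range(n_steps):
    sigma = np.exp(log_sigma)
    eps = rng.normal(size=batch)
    z = mu + sigma * eps                  # reparameterization trick
    # For this model, d/dz log P(x, z) = -z + (x - z) = x - 2z
    dlogp_dz = x - 2.0 * z
    # Monte Carlo ELBO gradients; the entropy of q contributes +1
    # to the log_sigma gradient.
    grad_mu = dlogp_dz.mean()
    grad_log_sigma = (dlogp_dz * sigma * eps).mean() + 1.0
    mu += lr * grad_mu                    # gradient *ascent* on the ELBO
    log_sigma += lr * grad_log_sigma

print(mu, np.exp(log_sigma))              # ≈ 1.0 and ≈ sqrt(0.5) ≈ 0.707
```

Because the model is conjugate, the fitted \(q\) should recover the exact posterior mean and standard deviation; with a non-conjugate model the same loop gives only the closest member of the variational family.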
2. Markov Chain Monte Carlo (MCMC)
This is another way to do Bayesian inference. It lets us draw samples from \(P(z|x)\) without computing \(P(x)\) directly. Once we have the samples, we can approximate expectations:
\[ \mathbb{E}_p[f(z)] \approx \frac 1 N \sum_{i=1}^N f(z^i) \]
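As a quick illustration of this estimator (a toy example, with \(f(z) = z^2\) and samples drawn from a standard normal rather than an actual posterior):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=100_000)   # stand-in for samples z^1, ..., z^N
estimate = (z ** 2).mean()     # Monte Carlo estimate of E[z^2]
print(estimate)                # close to the true value 1.0
```

The error of such an estimate shrinks like \(1/\sqrt{N}\) for independent samples; MCMC samples are correlated, so more of them are needed for the same accuracy.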
The key idea is to construct a Markov chain whose stationary distribution is the target distribution \(P(z | x)\). Then run the chain long enough that it is well mixed, and use the generated samples to compute quantities of interest.
Algorithms:
- Metropolis–Hastings (MH): Propose a new sample, accept/reject based on acceptance ratio to ensure detailed balance.
- Gibbs Sampling : Special case of MH where we sample each variable from its conditional distribution given others.
- Hamiltonian Monte Carlo (HMC) : Uses gradient information and simulated physics to explore the space more efficiently.
- No-U-Turn Sampler (NUTS) : Adaptive version of HMC that avoids manual tuning of path length.
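A minimal random-walk Metropolis–Hastings sketch, reusing the same toy Gaussian model assumed above (prior \(N(0,1)\), likelihood \(N(z,1)\), \(x = 2\), exact posterior \(N(1.0,\, 0.5)\)). Only the joint \(\log P(x, z)\) is needed; the intractable \(P(x)\) cancels in the acceptance ratio:

```python
import numpy as np

rng = np.random.default_rng(0)
x = 2.0

def log_joint(z):
    # log P(x, z) up to a constant: N(0,1) prior times N(z,1) likelihood
    return -0.5 * z**2 - 0.5 * (x - z)**2

z = 0.0
samples = []
for step in range(20_000):
    z_prop = z + rng.normal()             # symmetric random-walk proposal
    log_alpha = log_joint(z_prop) - log_joint(z)
    if np.log(rng.uniform()) < log_alpha: # accept with prob min(1, alpha)
        z = z_prop
    if step >= 5_000:                     # discard burn-in
        samples.append(z)

samples = np.array(samples)
print(samples.mean(), samples.var())      # ≈ 1.0 and ≈ 0.5
```

Note that the normalizer \(P(x)\) never appears: the acceptance ratio depends only on the joint, which is exactly why MCMC sidesteps the intractable integral.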