Kullback-Leibler Divergence
\begin{align*}
D_{KL}(p || q) &= H(p, q) - H(p) \\
&= \sum_x p(x) \log \frac{p(x)}{q(x)} \\
&= \mathbb{E}_p \left[ \log \frac{p}{q} \right]
\end{align*}
- Measures the information loss when approximating the true distribution \(p\) with the model distribution \(q\)
- Not symmetric and thus not a metric
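A minimal numerical sketch of the discrete form, assuming NumPy; the helper name `kl_divergence` and the example distributions `p` and `q` are illustrative, not from the source:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)) for discrete p, q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute 0 by convention
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.3, 0.2])  # "true" distribution (illustrative values)
q = np.array([0.4, 0.4, 0.2])  # model distribution
print(kl_divergence(p, q))  # non-negative
print(kl_divergence(q, p))  # generally a different value: not symmetric
```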
There exists a symmetric version of the KL divergence called the Jensen–Shannon divergence (JSD):
\(\mathrm{JSD}(p || q) = \frac{1}{2} D_{KL}(p || m) + \frac{1}{2} D_{KL}(q || m)\) where \(m = \frac{1}{2}(p + q)\) is the mixture distribution
The square root of the JSD is a metric, known as the Jensen–Shannon distance.
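A sketch of the JSD and the derived distance, building on the `kl_divergence` helper above (the function names are assumptions for illustration):

```python
def js_divergence(p, q):
    """JSD(p || q) = 1/2 D_KL(p || m) + 1/2 D_KL(q || m), with m = (p + q) / 2."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def js_distance(p, q):
    """Jensen–Shannon distance: square root of the JSD, which is a metric."""
    return np.sqrt(js_divergence(p, q))

print(np.isclose(js_divergence(p, q), js_divergence(q, p)))  # True: symmetric
```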
- See Cross Entropy