Kullback-Leibler Divergence
\(D_{KL}(p || q) = H(p,q) - H(p)\)
- Measures the information lost when the true distribution \(p\) is approximated by the model distribution \(q\): the expected number of extra nats (or bits, with log base 2) needed to encode samples from \(p\) using a code optimized for \(q\)
- Not symmetric (\(D_{KL}(p || q) \neq D_{KL}(q || p)\) in general) and therefore not a metric; see the sketch below
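A minimal NumPy sketch of the discrete case, computing \(D_{KL}\) directly from the definition; the distributions \(p\) and \(q\) are illustrative values chosen here, not taken from any particular source:

```python
import numpy as np

# Two discrete distributions over the same support (illustrative values).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# D_KL(p || q) = sum_i p_i * log(p_i / q_i), in nats (natural log).
kl_pq = np.sum(p * np.log(p / q))
kl_qp = np.sum(q * np.log(q / p))

print(kl_pq)  # ~0.0253
print(kl_qp)  # ~0.0258 -- differs from kl_pq, so D_KL is not symmetric
# For reference, scipy.stats.entropy(p, q) computes the same quantity in nats.
```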
There exists a symmetric version of KL divergence called the Jensen–Shannon divergence
\(JSD(p || q) = \frac{1}{2} D_{KL}(p || m) + \frac{1}{2} D_{KL}(q || m)\) where \(m = \frac{1}{2}(p + q)\) is the mixture distribution
The square root of JSD is a metric, the Jensen–Shannon distance (see the sketch below).
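A minimal sketch of JSD built on the same discrete KL computation (again with illustrative distributions; the names `kl` and `jsd` are chosen here for the example):

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for discrete distributions, in nats (assumes positive entries)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

def jsd(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Illustrative distributions (same as above).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(np.isclose(jsd(p, q), jsd(q, p)))  # True -- JSD is symmetric
print(np.sqrt(jsd(p, q)))                # Jensen-Shannon distance (a metric)
```

SciPy ships this as `scipy.spatial.distance.jensenshannon`, which returns the square root (the distance) directly.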
- See Cross Entropy