Cross Entropy
The average message length when events drawn from the true distribution are encoded with a code optimized for the predicted distribution. https://youtu.be/ErfnhcEV1O8?t=300
\(H(p,q) = -\sum_{x} p(x) \log q(x)\)
\(p\): the true distribution of events.
\(q\): the distribution used to encode the events (i.e. the predicted distribution).
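As a small sketch of the formula above, here is a direct computation of cross entropy in bits (base-2 log) for two example distributions `p` and `q` (the distributions are made up for illustration):

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log2(q(x)), in bits.

    Terms with p(x) == 0 contribute nothing, so they are skipped.
    """
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]   # true distribution
q = [0.25, 0.25, 0.5]   # predicted / coding distribution

print(cross_entropy(p, q))  # 1.75 bits
print(cross_entropy(p, p))  # 1.5 bits: encoding with the true distribution is optimal
```

Note that \(H(p,p)\) is just the entropy \(H(p)\), which is why using the wrong code \(q\) always costs at least as many bits.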
\(H(p,q) = H(p) + D_{KL}(p || q)\)
i.e. cross entropy is at least the entropy of the true distribution \(H(p)\) (with equality when \(q = p\)), and the difference is the Kullback-Leibler divergence \(D_{KL}(p \| q)\), which is always non-negative.
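The decomposition \(H(p,q) = H(p) + D_{KL}(p \| q)\) can be checked numerically; the example distributions below are made up for illustration:

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) * log2(p(x)), in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log2(q(x)), in bits."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * log2(p(x) / q(x)), in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

# H(p, q) = H(p) + D_KL(p || q)
assert math.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))
# D_KL is non-negative, so H(p, q) >= H(p)
assert kl_divergence(p, q) >= 0
```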