Information Theory
Information theory was originally developed to measure the expected message length of an optimal code in communication, and so it deals with discrete distributions. Shannon entropy assigns an amount of uncertainty to a probability distribution.
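A minimal sketch of Shannon entropy in Python with NumPy (the function name and test distributions are my own, chosen for illustration):

```python
import numpy as np

def shannon_entropy(p):
    """Entropy in bits of a discrete distribution p (an array of probabilities)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # treat 0 * log(0) as 0
    return -np.sum(p * np.log2(p))

print(shannon_entropy([0.5, 0.5]))   # 1.0 bit: a fair coin flip
print(shannon_entropy([1.0, 0.0]))   # 0.0 bits: a certain event carries no information
print(shannon_entropy([0.9, 0.1]))   # ~0.469 bits: a biased coin is less uncertain
```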
We can, and do, apply similar formulas to continuous distributions, but the interpretations don't carry over and some properties are lost. This is called differential entropy. E.g. an event with probability = 1 has zero information because it is guaranteed to occur, whereas an outcome with density = 1 has zero differential information even though it is not guaranteed to occur.
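A small illustration of the lost properties, using the closed form for the uniform case (the differential entropy of Uniform(a, b) is log(b - a); the function name is my own):

```python
import numpy as np

def differential_entropy_uniform(a, b):
    """Differential entropy in bits of Uniform(a, b): -integral of f*log2(f) = log2(b - a)."""
    return np.log2(b - a)

print(differential_entropy_uniform(0.0, 1.0))  # 0.0: density is 1 everywhere, yet the outcome is not certain
print(differential_entropy_uniform(0.0, 0.5))  # -1.0: differential entropy can be negative, unlike Shannon entropy
```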
Misc:
- With an optimal code, the encoded message is indistinguishable from random bits: any statistical pattern remaining in the output would be redundancy that a better code could compress away.
- Links and knots are 1-dimensional manifolds, but we need 4 dimensions to be able to untangle all of them. Similarly, untangling n-dimensional manifolds can require a yet higher-dimensional space. All n-dimensional manifolds can be untangled in 2n+2 dimensions. [http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/]
1. Mutual Information
Mutual information is the difference between the entropy of a variable and its conditional entropy given a second variable:
\begin{align*}
I[X;Y] = H[X] - H[X|Y]
\end{align*}

Conceptually, it gives the average reduction in uncertainty about one variable when we know the value of the other.
This quantity is symmetric, i.e. \(I[X;Y] = I[Y;X]\), since \(H[X] - H[X|Y] = H[X] + H[Y] - H[X,Y]\), which is symmetric in \(X\) and \(Y\).
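A sketch verifying the symmetry numerically, assuming a hypothetical joint distribution over two binary variables (the array values are made up for illustration):

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a distribution given as a NumPy array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution p(x, y); rows index X, columns index Y.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

px, py = joint.sum(axis=1), joint.sum(axis=0)            # marginals
H_X, H_Y, H_XY = entropy(px), entropy(py), entropy(joint.ravel())

I_XY = H_X - (H_XY - H_Y)   # H[X] - H[X|Y], using H[X|Y] = H[X,Y] - H[Y]
I_YX = H_Y - (H_XY - H_X)   # H[Y] - H[Y|X]
print(I_XY, I_YX)           # both ~0.278 bits: I[X;Y] = I[Y;X]
```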