Definition (Mutual Information)
Given $(X,Y)\sim p_{XY}$ on $\mathcal{X}\times\mathcal{Y}$, the mutual information between $X$ and $Y$, denoted by $I(X;Y)$, is given by

$$I(X;Y) := D(p_{XY}\,\|\,p_X p_Y) = \mathbb{E}_{p_{XY}}\!\left[\log_2\frac{p_{XY}(X,Y)}{p_X(X)\,p_Y(Y)}\right] = \sum_{a\in\mathcal{X}}\sum_{b\in\mathcal{Y}} p_{XY}(a,b)\,\log_2\frac{p_{XY}(a,b)}{p_X(a)\,p_Y(b)}$$
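As a quick numerical sketch of this definition (a NumPy illustration, not part of the original notes; the helper name `mutual_information` is ours), the double sum can be computed directly from a joint pmf stored as a 2-D array:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits, from a joint pmf given as a 2-D array indexed (x, y)."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal pmf of X (column vector)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal pmf of Y (row vector)
    mask = p_xy > 0                         # convention: 0 log 0 = 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])).sum())

# Example: a binary pair where Y equals X with probability 0.9
p = np.array([[0.45, 0.05],
              [0.05, 0.45]])
print(round(mutual_information(p), 3))  # 0.531 bits, i.e. 1 - H_b(0.1)
```

For an independent pair ($p_{XY}=p_X p_Y$) the same function returns 0, matching the nonnegativity property below.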
Proposition (Properties of Mutual Information)
- Symmetry: $I(X;Y)=I(Y;X)$
- Chain Rule: $I(X;Y)=H(X)-H(X\mid Y)=H(Y)-H(Y\mid X)=H(X)+H(Y)-H(X,Y)$
- Mutual Information of the same variable is Entropy: $I(X;X)=H(X)$
- Nonnegativity: $I(X;Y)\ge 0$ (with equality iff $X\perp\!\!\!\perp Y$)
- LUB: $I(X;Y)\le\min\{\log_2|\mathcal{X}|,\log_2|\mathcal{Y}|\}$
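All five properties can be spot-checked numerically on a random joint pmf (an illustrative sketch; the helpers `H` and `mi` are ours, not notation from the notes):

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a pmf given as a flat array."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mi(p_xy):
    """I(X;Y) in bits from a 2-D joint pmf."""
    p_x, p_y = p_xy.sum(1, keepdims=True), p_xy.sum(0, keepdims=True)
    m = p_xy > 0
    return float((p_xy[m] * np.log2(p_xy[m] / (p_x * p_y)[m])).sum())

rng = np.random.default_rng(0)
p_xy = rng.random((3, 4)); p_xy /= p_xy.sum()   # random joint pmf on a 3x4 alphabet

assert np.isclose(mi(p_xy), mi(p_xy.T))          # symmetry
assert mi(p_xy) >= 0                             # nonnegativity
assert mi(p_xy) <= np.log2(3)                    # LUB: min(log2 3, log2 4)
p_x = p_xy.sum(axis=1)
assert np.isclose(mi(np.diag(p_x)), H(p_x))      # I(X;X) = H(X): pmf of (X,X) is diagonal
assert np.isclose(mi(p_xy),                      # I = H(X) + H(Y) - H(X,Y)
                  H(p_xy.sum(1)) + H(p_xy.sum(0)) - H(p_xy.ravel()))
```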
Definition (Conditional Mutual Information)
Let $(X,Y,Z)\sim p_{XYZ}$ on $\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$. The conditional mutual information between $X$ and $Y$ given $Z$ is:

$$\begin{aligned}
I(X;Y\mid Z) &:= D(p_{XY|Z}\,\|\,p_{X|Z}\,p_{Y|Z}\mid p_Z) \\
&= \sum_{a\in\mathcal{X}}\sum_{b\in\mathcal{Y}}\sum_{c\in\mathcal{Z}} p_Z(c)\,p_{XY|Z}(a,b\mid c)\,\log_2\frac{p_{XY|Z}(a,b\mid c)}{p_{X|Z}(a\mid c)\,p_{Y|Z}(b\mid c)} \\
&= \sum_{c\in\mathcal{Z}} p_Z(c)\sum_{a\in\mathcal{X}}\sum_{b\in\mathcal{Y}} p_{XY|Z}(a,b\mid c)\,\log_2\frac{p_{XY|Z}(a,b\mid c)}{p_{X|Z}(a\mid c)\,p_{Y|Z}(b\mid c)} \\
&= \mathbb{E}_{z\sim p_Z}\!\left[D(p_{XY|z}\,\|\,p_{X|z}\,p_{Y|z})\right]
\end{aligned}$$
Lemma (Conditional Analog Property)
$$I(X;Y\mid Z) = H(X\mid Z)-H(X\mid Z,Y) = H(Y\mid Z)-H(Y\mid Z,X) = H(X\mid Z)+H(Y\mid Z)-H(X,Y\mid Z)$$
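The expectation form $\mathbb{E}_{z\sim p_Z}[D(p_{XY|z}\,\|\,p_{X|z}\,p_{Y|z})]$ translates directly into code: average the ordinary MI of each conditional slice. Below is a sketch (helper names `mi`/`cmi` and the Markov-chain example are ours) that also illustrates why conditioning matters: for a chain $X - Z - Y$, $I(X;Y\mid Z)=0$ even though $I(X;Y)>0$.

```python
import numpy as np

def mi(p_xy):
    p_x, p_y = p_xy.sum(1, keepdims=True), p_xy.sum(0, keepdims=True)
    m = p_xy > 0
    return float((p_xy[m] * np.log2(p_xy[m] / (p_x * p_y)[m])).sum())

def cmi(p_xyz):
    """I(X;Y|Z) = E_{z~p_Z}[ D(p_XY|z || p_X|z p_Y|z) ], array indexed (x, y, z)."""
    total = 0.0
    for c in range(p_xyz.shape[2]):
        p_z = p_xyz[:, :, c].sum()
        if p_z > 0:
            total += p_z * mi(p_xyz[:, :, c] / p_z)   # weight slice MI by p_Z(c)
    return total

# Markov chain X - Z - Y: X and Y are conditionally independent given Z
p_z = np.array([0.5, 0.5])
p_x_given_z = np.array([[0.9, 0.2], [0.1, 0.8]])   # columns indexed by z
p_y_given_z = np.array([[0.7, 0.3], [0.3, 0.7]])
p = np.einsum('c,ac,bc->abc', p_z, p_x_given_z, p_y_given_z)  # p[a,b,c]

print(cmi(p))             # ~0: given Z, the slices factor
print(mi(p.sum(axis=2)))  # > 0: X and Y are still dependent through Z
```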
Lemma (Chain Rule for Mutual Information)
Let the random vector $X^n$ and the RV $Y$ be jointly distributed with joint pmf $p_{X^nY}$. Then

$$I(X^n;Y) = \sum_{i=1}^n I(X_i;Y\mid X^{i-1})$$
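Since the chain rule is an identity for any joint pmf, it can be verified numerically for $n=2$: $I(X_1,X_2;Y)=I(X_1;Y)+I(X_2;Y\mid X_1)$. A sketch under the same array conventions as above (the `mi`/`cmi` helpers are our illustrative names):

```python
import numpy as np

def mi(p_xy):
    p_x, p_y = p_xy.sum(1, keepdims=True), p_xy.sum(0, keepdims=True)
    m = p_xy > 0
    return float((p_xy[m] * np.log2(p_xy[m] / (p_x * p_y)[m])).sum())

def cmi(p_xyz):     # I(X;Y|Z) for an array indexed (x, y, z)
    return sum(p_xyz[:, :, c].sum() * mi(p_xyz[:, :, c] / p_xyz[:, :, c].sum())
               for c in range(p_xyz.shape[2]) if p_xyz[:, :, c].sum() > 0)

rng = np.random.default_rng(1)
p = rng.random((2, 3, 4)); p /= p.sum()     # arbitrary joint pmf of (X1, X2, Y)

lhs = mi(p.reshape(6, 4))                             # I(X1,X2; Y): merge (X1,X2)
rhs = mi(p.sum(axis=1)) + cmi(p.transpose(1, 2, 0))   # I(X1;Y) + I(X2;Y|X1)
assert np.isclose(lhs, rhs)
```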
Lemma (Mutual Information is Minimized by RD Function)
- Let $X\sim p_X$, and let the RV $Y\in\hat{\mathcal{X}}$ have conditional pmf $p(y\mid x)$ such that $\mathbb{E}[d(X,Y)]\le D$. Then $I(X;Y)\ge R(D)$.
- Let $X\sim p_X$, and let $Y\in\hat{\mathcal{X}}$ be any RV. Then $I(X;Y)\ge R(\mathbb{E}[d(X,Y)])$.
Lemma (Independence Bound of Mutual Information)
Let $X_1,\dots,X_n$ be iid RVs. Then for any RVs $\hat{X}_1,\dots,\hat{X}_n$,

$$I(X^n;\hat{X}^n) \ge \sum_{i=1}^n I(X_i;\hat{X}_i)$$
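The inequality can be strict. As an illustrative extreme case (our construction, not from the notes): take $X_1,X_2$ iid fair bits and let the reconstruction swap them, $\hat{X}_1=X_2$, $\hat{X}_2=X_1$. Then $I(X^2;\hat{X}^2)=2$ bits while each per-letter term is $0$:

```python
import numpy as np

def mi(p_xy):
    p_x, p_y = p_xy.sum(1, keepdims=True), p_xy.sum(0, keepdims=True)
    m = p_xy > 0
    return float((p_xy[m] * np.log2(p_xy[m] / (p_x * p_y)[m])).sum())

# X1, X2 iid fair bits; reconstruction swaps them: Xh1 = X2, Xh2 = X1
p = np.zeros((2, 2, 2, 2))           # joint pmf indexed (x1, x2, xh1, xh2)
for x1 in range(2):
    for x2 in range(2):
        p[x1, x2, x2, x1] = 0.25

joint = mi(p.reshape(4, 4))          # I(X^2; Xh^2): bijection of 4 outcomes -> 2 bits
per_letter = mi(p.sum(axis=(1, 3))) + mi(p.sum(axis=(0, 2)))  # each marginal pair independent -> 0
assert joint >= per_letter           # independence bound, here 2 >= 0
```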
Definition (Differential mutual information)
Let $(X,Y)\sim f_{XY}$ with support $S_{XY}\subseteq\mathbb{R}^2$. Then the mutual information between $X$ and $Y$ is

$$I(X;Y) := D(f_{XY}\,\|\,f_X f_Y) = \int_{S_{XY}} f_{XY}(x,y)\,\log_2\frac{f_{XY}(x,y)}{f_X(x)\,f_Y(y)}\,dx\,dy = h(X)+h(Y)-h(X,Y) = h(X)-h(X\mid Y) = h(Y)-h(Y\mid X)$$
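The defining integral is an expectation of the log-density ratio under $f_{XY}$, so it can be estimated by Monte Carlo. A sketch (our example, not from the notes) using the standard bivariate Gaussian with correlation $\rho$, for which the well-known closed form is $I(X;Y)=-\tfrac{1}{2}\log_2(1-\rho^2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
rho, n = 0.8, 200_000
# sample from a standard bivariate Gaussian with correlation rho
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# log2 of the density ratio f_XY(x,y) / (f_X(x) f_Y(y)) at each sample
log_ratio = (-0.5 * np.log(1 - rho**2)
             - (x**2 - 2*rho*x*y + y**2) / (2 * (1 - rho**2))
             + (x**2 + y**2) / 2) / np.log(2)

estimate = log_ratio.mean()                 # sample mean approximates I(X;Y)
closed_form = -0.5 * np.log2(1 - rho**2)    # about 0.737 bits for rho = 0.8
assert abs(estimate - closed_form) < 0.02
```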
Conclusion
Mutual information is a universal information measure in information theory: it reduces to entropy ($I(X;X)=H(X)$), ties together joint and conditional entropies, and extends naturally to the conditional and differential settings above.
Theorem (Chain Rule for differential mutual information)
$$I(X_1,\dots,X_n;Y) = \sum_{i=1}^n I(X_i;Y\mid X_{i-1},\dots,X_1)$$