Mutual Information


Definition

Given $(X,Y)\sim p_{XY}$ on $\mathscr{X}\times\mathscr{Y}$, the mutual information between $X$ and $Y$, denoted by $I(X;Y)$, is given by
$$\begin{align*}
I(X;Y):&=D(p_{XY}\|p_Xp_Y)\\
&=E_{p_{XY}}\left[\log_2\left(\frac{p_{XY}(X,Y)}{p_X(X)p_Y(Y)}\right)\right]\\
&=\sum_{a\in\mathscr{X}}\sum_{b\in\mathscr{Y}}p_{XY}(a,b)\log_2\frac{p_{XY}(a,b)}{p_X(a)p_Y(b)}
\end{align*}$$
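As a quick numerical illustration (a minimal sketch, not part of the original note; the joint pmf below is an arbitrary example), the last expression can be evaluated directly on a finite alphabet:

```python
import math

# Hypothetical joint pmf p_XY on a 2x3 alphabet (rows: x, columns: y).
p_xy = [
    [0.10, 0.20, 0.10],
    [0.30, 0.05, 0.25],
]

p_x = [sum(row) for row in p_xy]                              # marginal of X
p_y = [sum(p_xy[a][b] for a in range(2)) for b in range(3)]   # marginal of Y

# I(X;Y) = sum_{a,b} p(a,b) log2( p(a,b) / (p(a) p(b)) ), with 0 log 0 := 0
I = sum(
    p_xy[a][b] * math.log2(p_xy[a][b] / (p_x[a] * p_y[b]))
    for a in range(2) for b in range(3)
    if p_xy[a][b] > 0
)
print(f"I(X;Y) = {I:.4f} bits")
```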

1. Symmetry: $I(X;Y)=I(Y;X)$
2. Chain rule: $I(X;Y)=H(X)-H(X|Y)=H(Y)-H(Y|X)=H(X)+H(Y)-H(X,Y)$
3. Mutual information of a variable with itself is its entropy: $I(X;X)=H(X)$
4. Nonnegativity: $I(X;Y)\ge 0$, with equality iff $X\perp\!\!\!\perp Y$
5. Upper bound: $I(X;Y)\le \min\{\log_2|\mathscr{X}|,\log_2|\mathscr{Y}|\}$
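These properties are easy to sanity-check numerically. A small sketch (the helper `mi` and the example pmf are illustrative assumptions, not from the note):

```python
import math

def mi(p_xy):
    """Mutual information (bits) of a joint pmf given as a nested list."""
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(col) for col in zip(*p_xy)]
    return sum(
        p * math.log2(p / (p_x[a] * p_y[b]))
        for a, row in enumerate(p_xy)
        for b, p in enumerate(row)
        if p > 0
    )

p_xy = [[0.10, 0.20, 0.10],
        [0.30, 0.05, 0.25]]

# 1. Symmetry: transposing the joint pmf swaps the roles of X and Y
p_yx = [list(col) for col in zip(*p_xy)]
assert abs(mi(p_xy) - mi(p_yx)) < 1e-12

# 3. I(X;X) = H(X): the joint pmf of (X, X) is diagonal
p_x = [sum(row) for row in p_xy]
p_xx = [[p_x[a] if a == b else 0.0 for b in range(len(p_x))] for a in range(len(p_x))]
H_x = -sum(p * math.log2(p) for p in p_x if p > 0)
assert abs(mi(p_xx) - H_x) < 1e-12

# 4 & 5. Nonnegativity and the log-alphabet-size upper bound
assert 0 <= mi(p_xy) <= min(math.log2(len(p_xy)), math.log2(len(p_xy[0])))
print("properties 1, 3, 4, 5 hold on this example")
```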

$I(X;Y)$ measures the ā€œreductionā€ in the average amount of information (or uncertainty) about $X$ gained by knowing $Y$ (and vice versa). It is also a measure of dependence between $X$ and $Y$, or a measure of ā€œinformation transferā€ from $Y$ to $X$ (and vice versa).

Let $(X,Y,Z)\sim p_{XYZ}$ on $\mathscr{X}\times\mathscr{Y}\times\mathscr{Z}$. The conditional mutual information between $X$ and $Y$ given $Z$ is
$$\begin{align*}
I(X;Y|Z):&=D(p_{XY|Z}\|p_{X|Z}p_{Y|Z}|p_Z)\\
&=\sum_{a\in\mathscr{X}}\sum_{b\in\mathscr{Y}}\sum_{c\in\mathscr{Z}}p_{XY|Z}(a,b|c)\,p_Z(c)\log_2\frac{p_{XY|Z}(a,b|c)}{p_{X|Z}(a|c)\,p_{Y|Z}(b|c)}\\
&=\sum_{c\in\mathscr{Z}}p_Z(c)\sum_{a\in\mathscr{X}}\sum_{b\in\mathscr{Y}}p_{XY|Z}(a,b|c)\log_2\frac{p_{XY|Z}(a,b|c)}{p_{X|Z}(a|c)\,p_{Y|Z}(b|c)}\\
&=\mathbb{E}_{z\sim p_Z}\left[D(p_{XY|z}\|p_{X|z}p_{Y|z})\right]
\end{align*}$$

$$\begin{align*}
I(X;Y|Z)&=H(X|Z)-H(X|Z,Y)\\
&=H(Y|Z)-H(Y|Z,X)\\
&=H(X|Z)+H(Y|Z)-H(X,Y|Z)
\end{align*}$$
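The following sketch (with an arbitrary example pmf $p_{XYZ}$ on $\{0,1\}^3$, an assumption for illustration) computes $I(X;Y|Z)$ from the definition and checks it against $H(X|Z)-H(X|Z,Y)$:

```python
import math
from itertools import product

# Arbitrary example joint pmf p_XYZ on {0,1}^3 (an assumption for illustration)
vals = [0.10, 0.05, 0.15, 0.10, 0.05, 0.20, 0.05, 0.30]
p = dict(zip(product(range(2), repeat=3), vals))

def marg(keep):
    """Marginal pmf over the coordinates in `keep`, e.g. (0, 2) -> p_XZ."""
    out = {}
    for xyz, v in p.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + v
    return out

p_z, p_xz, p_yz = marg((2,)), marg((0, 2)), marg((1, 2))

# Definition: I(X;Y|Z) = sum_{x,y,z} p(x,y,z) log2[ p(x,y|z) / (p(x|z) p(y|z)) ]
I_def = sum(
    v * math.log2((v / p_z[(z,)]) /
                  ((p_xz[(x, z)] / p_z[(z,)]) * (p_yz[(y, z)] / p_z[(z,)])))
    for (x, y, z), v in p.items() if v > 0
)

def H(q):
    """Joint entropy (bits) of a pmf given as a dict."""
    return -sum(v * math.log2(v) for v in q.values() if v > 0)

# Identity: I(X;Y|Z) = H(X|Z) - H(X|Z,Y) = [H(X,Z) - H(Z)] - [H(X,Y,Z) - H(Y,Z)]
I_identity = (H(p_xz) - H(p_z)) - (H(p) - H(p_yz))
print(I_def, I_identity)  # the two values agree up to floating-point error
```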

Let the random vector $X^n$ and the RV $Y$ be jointly distributed with joint pmf $p_{X^nY}$. Then
$$I(X^n;Y)=\sum_{i=1}^n I(X_i;Y|X^{i-1})$$
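A minimal check of this chain rule for $n=2$, i.e. $I(X_1,X_2;Y)=I(X_1;Y)+I(X_2;Y|X_1)$, written entirely in terms of joint entropies (the example pmf is an assumption for illustration):

```python
import math
from itertools import product

# Arbitrary example joint pmf p_{X1 X2 Y} on {0,1}^3
p = dict(zip(product(range(2), repeat=3),
             [0.12, 0.08, 0.10, 0.05, 0.15, 0.20, 0.10, 0.20]))

def marg(keep):
    """Marginal pmf over the coordinates in `keep` (0 -> X1, 1 -> X2, 2 -> Y)."""
    out = {}
    for k, v in p.items():
        key = tuple(k[i] for i in keep)
        out[key] = out.get(key, 0.0) + v
    return out

def H(q):
    """Joint entropy (bits) of a pmf given as a dict."""
    return -sum(v * math.log2(v) for v in q.values() if v > 0)

# I(X1,X2;Y) = H(X1,X2) + H(Y) - H(X1,X2,Y)
lhs = H(marg((0, 1))) + H(marg((2,))) - H(p)
# I(X1;Y)    = H(X1) + H(Y) - H(X1,Y)
# I(X2;Y|X1) = H(X1,X2) + H(X1,Y) - H(X1,X2,Y) - H(X1)
rhs = (H(marg((0,))) + H(marg((2,))) - H(marg((0, 2)))) \
    + (H(marg((0, 1))) + H(marg((0, 2))) - H(p) - H(marg((0,))))
print(lhs, rhs)  # equal up to floating-point error
```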

1. Let $X\sim p_X$ and let the RV $Y\in\hat{\mathcal{X}}$ have conditional pmf $p(y|x)$ such that $E[d(X,Y)]\le D$. Then $I(X;Y)\ge R(D)$.
2. Let $X\sim p_X$ and let $Y\in\hat{\mathcal{X}}$ be any RV. Then $I(X;Y)\ge R(E[d(X,Y)])$.
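For a concrete check (an addition, not from the note): for a $\mathrm{Bern}(1/2)$ source with Hamming distortion, the rate-distortion function is $R(D)=1-h_b(D)$ for $0\le D\le 1/2$. The sketch below picks an arbitrary test channel $p(y|x)$, computes $E[d(X,Y)]$ and $I(X;Y)$, and verifies statement 2:

```python
import math

def hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

# Bern(1/2) source, Hamming distortion: R(D) = 1 - hb(D) for 0 <= D <= 1/2.
def R(D):
    return 1.0 - hb(D) if D <= 0.5 else 0.0

# An arbitrary test channel p(y|x) (an assumption for illustration):
# flip X with probability 0.11 when X = 0 and 0.20 when X = 1.
eps0, eps1 = 0.11, 0.20
p_xy = {(0, 0): 0.5*(1-eps0), (0, 1): 0.5*eps0,
        (1, 0): 0.5*eps1,     (1, 1): 0.5*(1-eps1)}

E_d = p_xy[0, 1] + p_xy[1, 0]   # expected Hamming distortion P(X != Y)
p_x = {a: sum(v for (x, y), v in p_xy.items() if x == a) for a in (0, 1)}
p_y = {b: sum(v for (x, y), v in p_xy.items() if y == b) for b in (0, 1)}
I = sum(v * math.log2(v / (p_x[x]*p_y[y])) for (x, y), v in p_xy.items() if v > 0)

print(f"E[d] = {E_d:.3f},  I(X;Y) = {I:.4f} bits,  R(E[d]) = {R(E_d):.4f} bits")
assert I >= R(E_d) - 1e-12   # statement 2: I(X;Y) >= R(E[d(X,Y)])
```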

Let $X_1,\dots,X_n$ be iid RVs. Then for any RVs $\hat{X}_1,\dots,\hat{X}_n$,
$$I(X^n;\hat{X}^n)\ge\sum_{i=1}^n I(X_i;\hat{X}_i)$$

Let $(X,Y)\sim f_{XY}$ with support $S_{XY}\subset\mathbb{R}^2$. Then the mutual information between $X$ and $Y$ is
$$\begin{align*}
I(X;Y):&=D(f_{XY}\|f_Xf_Y)\\
&=\int_{S_{XY}}f_{XY}(x,y)\log_2\frac{f_{XY}(x,y)}{f_X(x)f_Y(y)}\,dx\,dy\\
&=h(X)+h(Y)-h(X,Y)\\
&=h(X)-h(X|Y)\\
&=h(Y)-h(Y|X)
\end{align*}$$
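A concrete continuous example (an addition, not from the note): for a standard bivariate Gaussian with correlation $\rho$, $h(X)=h(Y)=\frac{1}{2}\log_2(2\pi e)$ and $h(X,Y)=\frac{1}{2}\log_2\!\big((2\pi e)^2(1-\rho^2)\big)$, so $I(X;Y)=-\frac{1}{2}\log_2(1-\rho^2)$. The sketch below (assuming `numpy`; the sample size and bin count are arbitrary) compares this closed form with a plug-in estimate computed from a quantized sample, anticipating the limit derived next:

```python
import numpy as np

rho, n_samples, bins = 0.6, 200_000, 50

# Closed form: I(X;Y) = -(1/2) log2(1 - rho^2) for a standard bivariate Gaussian.
I_exact = -0.5 * np.log2(1 - rho**2)

# Plug-in estimate from a quantized (histogrammed) sample, i.e. I([X]_n; [Y]_m).
# For finite samples this estimate is slightly biased, but it is close here.
rng = np.random.default_rng(0)
xy = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n_samples)
counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=bins)
p_xy = counts / counts.sum()
p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of the quantized X
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of the quantized Y
mask = p_xy > 0
I_hat = np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask]))

print(f"closed form: {I_exact:.4f} bits, quantized estimate: {I_hat:.4f} bits")
```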

For continuous RVs $(X,Y)\sim f_{XY}$, let $[X]_n$ and $[Y]_m$ denote $X$ and $Y$ quantized to $n$ and $m$ bits of precision. For $n$ and $m$ sufficiently large,
$$\begin{align*}
I([X]_n;[Y]_m)&=H([X]_n)+H([Y]_m)-H([X]_n,[Y]_m)\\
&\approx (h(X)+n)+(h(Y)+m)-(h(X,Y)+n+m)\\
&=h(X)+h(Y)-h(X,Y)\\
&=I(X;Y)
\end{align*}$$
Hence
$$\lim_{n,m\to\infty}I([X]_n;[Y]_m)=I(X;Y)$$

# Conclusion

Mutual Information is a universal information measure in Information Theory.

Chain rule of mutual information:
$$I(X_1,\cdots,X_n;Y)=\sum_{i=1}^{n}I(X_i;Y|X_{i-1},\cdots,X_1)$$
