Data Processing Inequality

Theorem (Data Processing Inequality)

If random variables $X, Y, Z$ form a Markov chain $X \to Y \to Z$, then $I(X;Y) \ge I(X;Z)$.
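
A short proof sketch may help here. It is the standard chain-rule argument, not something stated in the text above: expand $I(X;Y,Z)$ in two ways.

```latex
\begin{align*}
I(X;Y,Z) &= I(X;Z) + I(X;Y \mid Z) \\
         &= I(X;Y) + I(X;Z \mid Y).
\end{align*}
```

Because $X \to Y \to Z$ means that $X$ and $Z$ are conditionally independent given $Y$, we have $I(X;Z \mid Y) = 0$. Hence $I(X;Y) = I(X;Z) + I(X;Y \mid Z) \ge I(X;Z)$, since conditional mutual information is non-negative.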

Remark

The inequality also holds for continuous random variables.

Remark

  1. Equality holds iff $I(X;Y \mid Z) = 0$, that is, iff $X \to Z \to Y$ is also a Markov chain.
  2. It can be shown similarly that $I(Y;Z) \ge I(X;Z)$.
  3. Conditioning reduces mutual information: if $X \to Y \to Z$, then $I(X;Y) \ge I(X;Y \mid Z)$. The Markov condition matters here; without it, conditioning can increase mutual information.
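
The inequalities in the theorem and remarks can be checked numerically on a small example. The sketch below is illustrative only and rests on assumptions not in the text above: $X$ is a fair bit, $Y$ is $X$ passed through a binary symmetric channel with crossover probability 0.1, and $Z$ is $Y$ passed through a second such channel with crossover probability 0.2. All function names are hypothetical helpers.

```python
import math
from itertools import product


def bsc(eps):
    """Transition matrix of a binary symmetric channel with crossover probability eps."""
    return [[1 - eps, eps], [eps, 1 - eps]]


def markov_joint(p_x, p_y_given_x, p_z_given_y):
    """Joint pmf p(x, y, z) = p(x) p(y|x) p(z|y), keyed by (x, y, z)."""
    return {
        (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
        for x, y, z in product(range(2), repeat=3)
    }


def marginal(joint, axes):
    """Marginalize the joint pmf onto the given tuple of coordinate indices."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def mutual_information(joint, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))


def conditional_mutual_information(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(marginal(joint, a + c)) + entropy(marginal(joint, b + c))
            - entropy(marginal(joint, a + b + c)) - entropy(marginal(joint, c)))


# Assumed example chain X -> Y -> Z built from two binary symmetric channels.
joint = markov_joint([0.5, 0.5], bsc(0.1), bsc(0.2))
X, Y, Z = (0,), (1,), (2,)

i_xy = mutual_information(joint, X, Y)
i_xz = mutual_information(joint, X, Z)
i_yz = mutual_information(joint, Y, Z)
i_xy_given_z = conditional_mutual_information(joint, X, Y, Z)

print(f"I(X;Y)   = {i_xy:.4f} bits")
print(f"I(X;Z)   = {i_xz:.4f} bits")
print(f"I(Y;Z)   = {i_yz:.4f} bits")
print(f"I(X;Y|Z) = {i_xy_given_z:.4f} bits")
```

For this chain, $I(X;Y)$ and $I(Y;Z)$ both exceed $I(X;Z)$, and $I(X;Y) - I(X;Z)$ matches $I(X;Y \mid Z)$ up to floating-point error. Because that gap is strictly positive, the example is not an equality case.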