Huffman Lemma

Lemma (Huffman)

Consider a source with alphabet X={a1,,aM}\mathcal{X}=\{a_{1},\cdots,a_{M}\} and symbol probabilities p1,,pMp_{1},\cdots,p_{M} such that p1pM>0p_1\ge\cdots\ge p_M>0Consider the reduced source with alphabet Y\mathcal{Y} obtained from X\mathcal{X} by combining the two least likely source symbols aM1a_{M-1} and aMa_{M} into an equivalent symbol aM1,Ma_{M-1,M} with probability pM1,Mp_{M-1,M}: Y={a1,,aM2,aM1,M}\mathcal{Y}=\{a_{1},\cdots,a_{M-2},a_{M-1,M}\}Let C2\mathcal{C}_{2}, given by f2:Y{0,1}f_{2}:\mathcal{Y}\to\{0,1\}^{*}, be an optimal prefix code for the reduced source Y\mathcal{Y}. Now, construct a code C1, f1:X{0,1}\mathcal{C}_{1}, \ f_{1}:\mathcal{X}\to\{0,1\}^{*}, for the original source X\mathcal{X} as follows:

  • The codewords for symbols a1,,aM2a_{1},\cdots,a_{M-2} are exactly the same as the corresponding codewords in C2\mathcal{C}_{2}:f1(ai)=f2(ai), i=1,,M2f_{1}(a_{i})=f_{2}(a_{i}), \ i=1,\cdots,M-2
  • The codewords for symbols aM1a_{M-1} and aMa_{M} are formed by appending a “0” and a “1”, respectively, to the codeword f2(aM1,M)C2f_{2}(a_{M-1,M})\in\mathcal{C}_{2} for symbol aM1,Ma_{M-1,M}:f1(aM1)=f2(aM1,M)0f1(aM)=f2(aM1,M)1\begin{align*} f_{1}(a_{M-1})&=f_{2}(a_{M-1,M})0\\ f_{1}(a_{M})&=f_{2}(a_{M-1,M})1\\ \end{align*}Then C1\mathcal{C}_{1} is an optimal prefix code for the original source X\mathcal{X}.