Lemma (Huffman)
Consider a source with alphabet X={a1,⋯,aM} and symbol probabilities p1,⋯,pM such that p1≥⋯≥pM>0Consider the reduced source with alphabet Y obtained from X by combining the two least likely source symbols aM−1 and aM into an equivalent symbol aM−1,M with probability pM−1,M: Y={a1,⋯,aM−2,aM−1,M}Let C2, given by f2:Y→{0,1}∗, be an optimal prefix code for the reduced source Y. Now, construct a code C1, f1:X→{0,1}∗, for the original source X as follows:
- The codewords for symbols a1,⋯,aM−2 are exactly the same as the corresponding codewords in C2:f1(ai)=f2(ai), i=1,⋯,M−2
- The codewords for symbols aM−1 and aM are formed by appending a “0” and a “1”, respectively, to the codeword f2(aM−1,M)∈C2 for symbol aM−1,M:f1(aM−1)f1(aM)=f2(aM−1,M)0=f2(aM−1,M)1Then C1 is an optimal prefix code for the original source X.