Definition (Prefix Code)
A prefix code (or prefix-free or instantaneous code) is a VLC for which none of its codewords is a prefix of another codeword.
Note
A prefix code is UD since none of its codewords can be at the beginning of another codeword: as soon as we look at a sequence of codewords, we can directly delineate the codewords (from left to right) and separate them (via commas) as they are immediately recognizable.
Theorem (Kraft Inequality for Prefix Codes)
- Every D-ary n-th order prefix VLC for a discrete source {Xi}i=1∞ of alphabet X has M=∣X∣M codeword lengths l1,⋯,lM satisfying the Kraft Inequality of base D
- Conversely, given a set {l1,⋯,lM} of M=∣X∣M positive integers that satisfy the Kraft Inequality of base D, there exists a D-ary n-th order prefix VLC for the source with codeword lengths l1,⋯,lM.
Lemma ((Any UD code can be replaced with a Prefix Code))
Let C be an optimal n-th order code for the source with the class of prefix codes (i.e. ln(C)≤ln(Cp) for all prefix codes Cp). Then C is also optimal within the entire class of UD codes (i.e. ln(C)≤ln(C^) for all UD codes C^).
Theorem (Necessary Conditions for Optimal Binary Prefix Code)
Let C be an optimal binary prefix code with codeword lengths li, i=1,⋯,M for a source {Xi} with alphabet X={a1,⋯,aM} and symbol probabilities p1,⋯,pM (M=∣X∣). Without loss of generality, assume p1≥⋯≥pM and that any group of source symbols with the same probability is arranged in the order of increasing codeword lengths (i.e., if pi=pi+1=⋯=pi+s) then li≤⋯≤li+s). Then the following properties hold:
- Higher probability source symbols have shorter codewords: pj>pk⟹lj≤lk
- The two least probable source symbols have codewords of equal lengths: lM−1=lM
- Among the codewords of length lM, two of them are identical except in the last digit.
Definition (Prefix Code Redundancy)
The redundancy of a prefix n-code Cn:X→{0,1}∗ w.r.t. the source distribution p∈P is R(Cn,p)=Rˉn(l(Cxn))−n1Hp(Xn)=n1[Ep[l(Cxn)]−Hp(Xn)]