FIND ME ON

GitHub

LinkedIn

Prefix Code

🌱

InfoTheory

A prefix code (or prefix-free or instantaneous code) is a VLC for which none of its codewords is a prefix of another codeword.

Note

A prefix code is UD since none of its codewords can be at the beginning of another codeword: as soon as we look at a sequence of codewords, we can directly delineate the codewords (from left to right) and separate them (via commas) as they are immediately recognizable.

1. Every DD-ary n-th order prefix VLC for a discrete source {Xi}i=1\{X_{i}\}_{i=1}^\infty of alphabet X\mathcal{X} has M=XMM=|\mathcal{X}|^{M} codeword lengths l1,,lM\mathscr{l}_{1},\cdots,\mathscr{l}_{M} satisfying the Kraft Inequality of base DD 2. Conversely, given a set {l1,,lM}\{\mathscr{l}_{1},\cdots,\mathscr{l}_{M}\} of M=XMM=|\mathcal{X}|^{M} positive integers that satisfy the Kraft Inequality of base DD, there exists a DD-ary n-th order prefix VLC for the source with codeword lengths l1,,lM\mathscr{l}_{1},\cdots,\mathscr{l}_{M}.

Let C\mathcal{C} be an optimal nn-th order code for the source with the class of prefix codes (i.e.  ln(C)ln(Cp)\overline{\mathscr{l}_{n}}(\mathcal{C}) \le \overline{\mathscr{l}_{n}}(\mathcal{C}_{p}) for all prefix codes Cp\mathcal{C}_{p}). Then C\mathcal{C} is also optimal within the entire class of UD codes (i.e. ln(C)ln(C^)\overline{\mathscr{l}_{n}}(\mathcal{C})\le\overline{\mathscr{l}_{n}}(\hat{\mathcal{C}}) for all UD codes C^\hat{\mathcal{C}}).

Let C\mathcal{C} be an optimal binary prefix code with codeword lengths li, i=1,,M\mathscr{l}_{i}, \ i=1,\cdots,M for a source {Xi}\{X_{i}\} with alphabet X={a1,,aM}\mathcal{X}=\{a_{1},\cdots,a_M\} and symbol probabilities p1,,pMp_{1},\cdots,p_{M} (M=XM=|\mathcal{X}|). Without loss of generality, assume p1pMp_{1}\ge\cdots\ge p_{M} and that any group of source symbols with the same probability is arranged in the order of increasing codeword lengths (i.e., if pi=pi+1==pi+sp_{i}=p_{i+1}=\cdots=p_{i+s}) then lili+s\mathscr{l}_{i}\le\cdots\le\mathscr{l}_{i+s}). Then the following properties hold: 1. Higher probability source symbols have shorter codewords: pj>pk    ljlkp_{j}>p_k\implies\mathscr{l}_{j}\le\mathscr{l}_{k} 2. The two least probable source symbols have codewords of equal lengths: lM1=lM\mathscr{l}_{M-1}=\mathscr{l}_{M} 3. Among the codewords of length lM\mathscr{l}_{M}, two of them are identical except in the last digit.

Linked from