Prefix Code

InfoTheory

A prefix code (also called a prefix-free or instantaneous code) is a variable-length code (VLC) in which no codeword is a prefix of another codeword.
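The definition can be checked mechanically. Below is a minimal sketch that assumes codewords are given as strings of code symbols; the function name `is_prefix_free` is hypothetical. It relies on the fact that after lexicographic sorting, a word that is a prefix of some other word is also a prefix of its immediate successor, so only adjacent pairs need to be compared.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another codeword.

    Codewords are strings over the code alphabet (e.g. "0"/"1").
    Duplicate codewords are also rejected, since a word is a prefix of itself.
    """
    words = sorted(codewords)
    # In lexicographic order, if a is a prefix of some later word,
    # then a is also a prefix of the word immediately after it.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


print(is_prefix_free(["0", "10", "110", "111"]))  # True
print(is_prefix_free(["0", "01", "011"]))         # False: "0" is a prefix of "01"
```

The second set is uniquely decodable (reversing every codeword gives a prefix code) but not prefix-free, which is why prefix codes are a strict subclass of UD codes.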

Note

A prefix code is uniquely decodable (UD): since no codeword can appear at the beginning of another codeword, when we read a concatenated sequence of codewords from left to right, each codeword is recognized as soon as its last symbol arrives. We can therefore delineate the codewords and separate them (e.g., with commas) immediately, without looking ahead.
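The left-to-right decoding described in the note can be sketched as follows. This assumes a hypothetical representation of the code as a dict from source symbols to codeword strings; `decode` is an illustrative name, not a library function. The decoder emits a symbol the moment its buffer matches a codeword, which is only safe because no codeword extends another.

```python
def decode(bits, code):
    """Instantaneously decode a string of code symbols with a prefix code.

    `code` maps source symbols to codewords (hypothetical dict representation).
    """
    inverse = {word: symbol for symbol, word in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:  # codeword complete: emit it without look-ahead
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError(f"trailing symbols {buf!r} do not form a codeword")
    return out


code = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(decode("0101100111", code))  # ['a', 'b', 'c', 'a', 'd']
```

Reading "0101100111" this way yields the comma-separated parse 0,10,110,0,111.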

1. Every $D$-ary $n$-th order prefix VLC for a discrete source $\{X_{i}\}_{i=1}^\infty$ with alphabet $\mathcal{X}$ has $M=|\mathcal{X}|^{n}$ codeword lengths $\mathscr{l}_{1},\cdots,\mathscr{l}_{M}$ that satisfy the Kraft inequality of base $D$, i.e. $\sum_{i=1}^{M} D^{-\mathscr{l}_{i}} \le 1$.
2. Conversely, given a set $\{\mathscr{l}_{1},\cdots,\mathscr{l}_{M}\}$ of $M=|\mathcal{X}|^{n}$ positive integers that satisfy the Kraft inequality of base $D$, there exists a $D$-ary $n$-th order prefix VLC for the source with codeword lengths $\mathscr{l}_{1},\cdots,\mathscr{l}_{M}$.
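The converse (part 2) is constructive. A minimal sketch of one standard construction, often called a canonical code, is shown below; the function names are hypothetical and it assumes $D \le 10$ so that each code symbol is a single decimal digit. Lengths are processed in increasing order, and each codeword is the $D$-ary expansion of a running counter; the Kraft inequality guarantees the counter always fits in the required number of digits.

```python
from fractions import Fraction


def kraft_sum(lengths, D=2):
    """Exact value of sum_i D^(-l_i)."""
    return sum(Fraction(1, D ** l) for l in lengths)


def to_base(value, D, width):
    """Write `value` in base D using exactly `width` digits (assumes D <= 10)."""
    digits = []
    for _ in range(width):
        value, r = divmod(value, D)
        digits.append(str(r))
    return "".join(reversed(digits))


def prefix_code_from_lengths(lengths, D=2):
    """Build a D-ary prefix code with the given codeword lengths."""
    if kraft_sum(lengths, D) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = [None] * len(lengths)
    value, prev = 0, 0
    for i in order:
        l = lengths[i]
        value *= D ** (l - prev)  # extend the counter to the new length
        codes[i] = to_base(value, D, l)
        value += 1  # next unused node at this depth
        prev = l
    return codes


print(prefix_code_from_lengths([1, 2, 3, 3]))           # ['0', '10', '110', '111']
print(prefix_code_from_lengths([1, 1, 2, 2, 2], D=3))   # ['0', '1', '20', '21', '22']
```

For $n$-th order codes, `lengths` would hold the $M=|\mathcal{X}|^{n}$ lengths assigned to the source blocks; the construction itself does not depend on $n$.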

Let $\mathcal{C}$ be an optimal $n$-th order code for the source within the class of prefix codes (i.e., $\overline{\mathscr{l}_{n}}(\mathcal{C}) \le \overline{\mathscr{l}_{n}}(\mathcal{C}_{p})$ for all prefix codes $\mathcal{C}_{p}$). Then $\mathcal{C}$ is also optimal within the entire class of UD codes (i.e., $\overline{\mathscr{l}_{n}}(\mathcal{C})\le\overline{\mathscr{l}_{n}}(\hat{\mathcal{C}})$ for all UD codes $\hat{\mathcal{C}}$). In other words, restricting attention to prefix codes costs nothing in average codeword length.
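A short proof sketch, under the assumption that McMillan's theorem is available (every UD code, not only every prefix code, has codeword lengths satisfying the Kraft inequality; this note does not state that result itself). It also uses the fact that $\overline{\mathscr{l}_{n}}$ depends only on the codeword lengths and the source probabilities, not on the codewords themselves:

$$
\begin{aligned}
\hat{\mathcal{C}} \text{ UD with lengths } \hat{\mathscr{l}}_{1},\cdots,\hat{\mathscr{l}}_{M}
&\implies \sum_{i=1}^{M} D^{-\hat{\mathscr{l}}_{i}} \le 1 && \text{(McMillan)}\\
&\implies \exists \text{ prefix code } \mathcal{C}_{p} \text{ with lengths } \hat{\mathscr{l}}_{1},\cdots,\hat{\mathscr{l}}_{M} && \text{(converse, part 2 above)}\\
&\implies \overline{\mathscr{l}_{n}}(\mathcal{C}) \le \overline{\mathscr{l}_{n}}(\mathcal{C}_{p}) = \overline{\mathscr{l}_{n}}(\hat{\mathcal{C}}) && \text{(optimality of } \mathcal{C}\text{)}.
\end{aligned}
$$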

Let $\mathcal{C}$ be an optimal binary prefix code with codeword lengths $\mathscr{l}_{i}$, $i=1,\cdots,M$, for a source $\{X_{i}\}$ with alphabet $\mathcal{X}=\{a_{1},\cdots,a_M\}$ and symbol probabilities $p_{1},\cdots,p_{M}$ ($M=|\mathcal{X}|$). Without loss of generality, assume $p_{1}\ge\cdots\ge p_{M}$ and that any group of source symbols with the same probability is arranged in order of increasing codeword length (i.e., if $p_{i}=p_{i+1}=\cdots=p_{i+s}$, then $\mathscr{l}_{i}\le\cdots\le\mathscr{l}_{i+s}$). Then the following properties hold:

1. Higher-probability source symbols have codewords no longer than those of lower-probability symbols: $p_{j}>p_k\implies\mathscr{l}_{j}\le\mathscr{l}_{k}$.
2. The two least probable source symbols have codewords of equal length: $\mathscr{l}_{M-1}=\mathscr{l}_{M}$.
3. Among the codewords of length $\mathscr{l}_{M}$, there are two that are identical except in the last digit.
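The three properties can be checked on a concrete optimal code. The sketch below obtains one with Huffman's algorithm (a known method for producing an optimal binary prefix code, and the subject of the linked Huffman Lemma note), then sorts the symbols as the lemma assumes and tests each property. Integer weights stand in for probabilities to avoid floating-point ties; `huffman_code` and `check_lemma` are hypothetical names, and $M \ge 2$ is assumed.

```python
import heapq
from itertools import count


def huffman_code(weights):
    """Binary Huffman codewords for symbols 0..M-1 (assumes M >= 2)."""
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(w, next(tiebreak), {i: ""}) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)  # least probable subtree
        w1, _, c1 = heapq.heappop(heap)  # second least probable subtree
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    codes = heap[0][2]
    return [codes[i] for i in range(len(weights))]


def check_lemma(weights, codes):
    """Return (property 1, property 2, property 3) as booleans."""
    # Decreasing probability; ties ordered by increasing codeword length.
    order = sorted(range(len(weights)), key=lambda i: (-weights[i], len(codes[i])))
    p = [weights[i] for i in order]
    lengths = [len(codes[i]) for i in order]
    words = [codes[i] for i in order]
    M = len(p)
    prop1 = all(lengths[j] <= lengths[k]
                for j in range(M) for k in range(M) if p[j] > p[k])
    prop2 = lengths[M - 2] == lengths[M - 1]
    longest = [w for w in words if len(w) == lengths[M - 1]]
    prop3 = any(a != b and a[:-1] == b[:-1] for a in longest for b in longest)
    return prop1, prop2, prop3


weights = [40, 20, 20, 10, 10]  # probabilities 0.4, 0.2, 0.2, 0.1, 0.1
codes = huffman_code(weights)
print(codes)                        # ['11', '00', '01', '100', '101']
print(check_lemma(weights, codes))  # (True, True, True)
```

Here the two least probable symbols receive the sibling codewords 100 and 101, which illustrates properties 2 and 3 at once.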

Linked from

Huffman Lemma

Shannon Coding Theorem

Lempel-Ziv Coding

Prefix Code

Shannon-Fano-Elias Code

Prefix Code Redundancy

Universal Code