Prefix Code

Definition (Prefix Code)

A prefix code (also called a prefix-free or instantaneous code) is a variable-length code (VLC) in which no codeword is a prefix of another codeword.

Note

A prefix code is uniquely decodable (UD): since no codeword begins another codeword, a concatenated sequence of codewords can be parsed from left to right, and each codeword is recognized as soon as its last symbol is read, as if the codewords were separated by commas.
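The note above can be illustrated with a minimal Python sketch. The function names `is_prefix_code` and `decode`, and the four-symbol example code, are assumptions made for illustration; they do not come from the original notes.

```python
def is_prefix_code(codewords):
    """Return True if no codeword is a prefix of a different codeword."""
    words = sorted(codewords)
    # After lexicographic sorting, a codeword that is a prefix of any other
    # codeword is also a prefix of its immediate successor, so checking
    # neighbouring pairs is enough.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode(bits, code):
    """Decode left to right, emitting a symbol as soon as a codeword matches."""
    inverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # instantaneous: no look-ahead needed
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("input ends in the middle of a codeword")
    return out


# Hypothetical example code, chosen only for illustration.
example_code = {"a": "0", "b": "10", "c": "110", "d": "111"}
print(is_prefix_code(example_code.values()))  # True
print(decode("0100111", example_code))        # ['a', 'b', 'a', 'd']
```

The greedy match in `decode` is only safe because of the prefix property: once the buffer equals a codeword, no longer codeword can start with it.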

Theorem (Kraft Inequality for Prefix Codes)

Every $D$-ary $n$-th order prefix VLC for a discrete source $\{X_{i}\}_{i=1}^\infty$ with alphabet $\mathcal{X}$ has $M=|\mathcal{X}|^{n}$ codeword lengths $\mathscr{l}_{1},\cdots,\mathscr{l}_{M}$ satisfying the Kraft Inequality of base $D$: $\sum_{i=1}^{M}D^{-\mathscr{l}_{i}}\le 1$.
Conversely, given a set $\{\mathscr{l}_{1},\cdots,\mathscr{l}_{M}\}$ of $M=|\mathcal{X}|^{n}$ positive integers that satisfy the Kraft Inequality of base $D$, there exists a $D$-ary $n$-th order prefix VLC for the source with codeword lengths $\mathscr{l}_{1},\cdots,\mathscr{l}_{M}$.
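Both directions of the theorem can be tried numerically. The sketch below computes the Kraft sum exactly and, for the converse, builds codewords by counting upward in base $D$ in order of increasing length. This is one standard construction, not necessarily the one used in the proof the notes rely on. The names `kraft_sum`, `to_digits` and `prefix_code_from_lengths` are hypothetical, and digits are written as single characters, which assumes $D\le 10$.

```python
from fractions import Fraction


def kraft_sum(lengths, D=2):
    """Exact value of sum_i D**(-l_i) for positive integer lengths."""
    return sum(Fraction(1, D ** l) for l in lengths)


def to_digits(value, length, D):
    """Write value in base D using exactly `length` digits (assumes D <= 10)."""
    digits = []
    for _ in range(length):
        value, r = divmod(value, D)
        digits.append(str(r))
    return "".join(reversed(digits))


def prefix_code_from_lengths(lengths, D=2):
    """Return codewords with the given lengths, in the same order as `lengths`."""
    if kraft_sum(lengths, D) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codewords = [None] * len(lengths)
    value, prev_len = 0, 0
    for i in order:
        l = lengths[i]
        value *= D ** (l - prev_len)  # pad the counter out to the new length
        codewords[i] = to_digits(value, l, D)
        value += 1                    # next unused word of this length
        prev_len = l
    return codewords


print(kraft_sum([1, 2, 3, 3]))                 # 1
print(prefix_code_from_lengths([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```

Lengths such as `[1, 1, 2]` with `D=2` have Kraft sum $5/4$, so the function raises `ValueError`, matching the forward direction of the theorem.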

Lemma (Any UD code can be replaced with a Prefix Code)

Let $\mathcal{C}$ be an optimal $n$-th order code for the source within the class of prefix codes (i.e. $\overline{\mathscr{l}_{n}}(\mathcal{C}) \le \overline{\mathscr{l}_{n}}(\mathcal{C}_{p})$ for all prefix codes $\mathcal{C}_{p}$). Then $\mathcal{C}$ is also optimal within the entire class of UD codes (i.e. $\overline{\mathscr{l}_{n}}(\mathcal{C})\le\overline{\mathscr{l}_{n}}(\hat{\mathcal{C}})$ for all UD codes $\hat{\mathcal{C}}$).

Theorem (Necessary Conditions for Optimal Binary Prefix Code)

Let $\mathcal{C}$ be an optimal binary prefix code with codeword lengths $\mathscr{l}_{i}, \ i=1,\cdots,M$ for a source $\{X_{i}\}$ with alphabet $\mathcal{X}=\{a_{1},\cdots,a_M\}$ and symbol probabilities $p_{1},\cdots,p_{M}$ ($M=|\mathcal{X}|$). Without loss of generality, assume $p_{1}\ge\cdots\ge p_{M}$ and that any group of source symbols with the same probability is arranged in order of increasing codeword length (i.e., if $p_{i}=p_{i+1}=\cdots=p_{i+s}$, then $\mathscr{l}_{i}\le\cdots\le\mathscr{l}_{i+s}$). Then the following properties hold:

A more probable source symbol never has a longer codeword: $p_{j}>p_k\implies\mathscr{l}_{j}\le\mathscr{l}_{k}$
The two least probable source symbols have codewords of equal lengths: $\mathscr{l}_{M-1}=\mathscr{l}_{M}$
Among the codewords of length $\mathscr{l}_{M}$ , two of them are identical except in the last digit.
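The three properties can be checked mechanically for any candidate code. The sketch below assumes the symbols are already ordered as in the theorem (non-increasing probabilities, ties in order of increasing length), that there are at least two symbols, and that the codewords are distinct. The function name `check_necessary_conditions` and both example codes are hypothetical. Since the conditions are necessary rather than sufficient, passing the check does not by itself prove optimality.

```python
def check_necessary_conditions(probs, codewords):
    """Return (property1, property2, property3) as booleans.

    Assumes probs is non-increasing and codewords[i] encodes symbol i.
    """
    M = len(probs)
    lengths = [len(w) for w in codewords]

    # Property 1: p_j > p_k implies l_j <= l_k.
    prop1 = all(
        lengths[j] <= lengths[k]
        for j in range(M)
        for k in range(M)
        if probs[j] > probs[k]
    )

    # Property 2: the two least probable symbols have equal codeword lengths.
    prop2 = lengths[M - 2] == lengths[M - 1]

    # Property 3: among the codewords of length l_M, two share all digits
    # except the last, i.e. some stem (codeword minus last digit) repeats.
    stems = [w[:-1] for w in codewords if len(w) == lengths[M - 1]]
    prop3 = len(set(stems)) < len(stems)

    return prop1, prop2, prop3


probs = [0.4, 0.3, 0.2, 0.1]
print(check_necessary_conditions(probs, ["0", "10", "110", "111"]))
# (True, True, True)
print(check_necessary_conditions(probs, ["00", "01", "10", "110"]))
# (True, False, False)
```

The second code fails properties 2 and 3 because its last codeword `110` could be shortened to `11` without breaking the prefix property, so it cannot be optimal.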

Definition (Prefix Code Redundancy)

The redundancy of a prefix $n$-code $\mathcal{C}_{n}:\mathcal{X}^{n}\to \{ 0,1 \}^{*}$ w.r.t. the source distribution $p\in\mathcal{P}$ is $\begin{align*} R(\mathcal{C}_{n},p)&= \bar{R}_{n}(\mathscr{l}(\mathcal{C}_{n}(X^{n})))- \frac{1}{n}H_{p}(X^{n})\\ &=\frac{1}{n}\left[E_{p}[\mathscr{l}(\mathcal{C}_{n}(X^{n}))]-H_{p}(X^{n})\right] \end{align*}$ that is, the expected codeword length per source symbol in excess of the per-symbol entropy.
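As a numerical illustration of the definition above, the following sketch evaluates the redundancy from a list of block probabilities and the matching codeword lengths. It assumes a binary code with entropy measured in bits, and that `probs` lists the probability of each length-$n$ block. The function name `redundancy` and both example distributions are hypothetical.

```python
import math


def redundancy(probs, lengths, n=1):
    """(E_p[codeword length] - H_p(X^n)) / n, with entropy in bits.

    probs[i] is the probability of the i-th length-n source block and
    lengths[i] is the length of its binary codeword.
    """
    expected_length = sum(p * l for p, l in zip(probs, lengths))
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return (expected_length - entropy) / n


# Dyadic probabilities: lengths equal -log2(p), so nothing is wasted.
r_dyadic = redundancy([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3])

# Non-dyadic probabilities: integer lengths cannot match -log2(p) exactly.
r_other = redundancy([0.4, 0.3, 0.2, 0.1], [1, 2, 3, 3])

print(r_dyadic, r_other)
```

In the first case the redundancy is zero up to floating-point error; in the second it is strictly positive but well below one bit per symbol.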
