LZ77 Code
Let xik=xixi+1…xk, xk=x1k=x1x2…xk
- Set the window length to w∈N.
- A string xn=x1x2…xn from the finite alphabet X is to be encoded.
- Assume that x1…xi−1 has already been encoded.
- We now find the largest k such that for some j in the range 1≤j≤w xi−ji−j+k−1=xii+k−1i.e. the longest match of a string of k not-yet-encoded symbols (phrase) with a string starting in the window (search buffer) consisting of the last w symbols xi−wi−1=xi−wxi−w+1…xi−1
- Represent the phrase xii+k−1 with the pair (j,k), i.e. the location where the match starts in the window and the length of the match
- If no match is found, send xi uncompressed.
- The encoded phrase is represented by the pointer (F,j,k) or (F,xi) where F=1 if a match is found and F=0 if there is no match. Think of j as the number of steps you need to go backwards to find the beginning of the matching phrase and k as the length of the phrase.
Lemma (Existence of LZ77 Codelengths)
There is a prefix code C:N→{0,1}∗ with codeword lengths satisfying ∣C(k)∣=logk+2loglogk+O(1)
Theorem (LZ77 is Optimal for Stationary Ergodic Sources)
Assume {Xi}=…,X−1,X0,X1,… is a Stationary & Ergodic Source having entropy rate H~(X). Then the avg code rate of the simplified version of the LZ77 algorithm satisfies n→∞limn1E[ln(Xn)]=n→∞limRˉn=H~(X) i.e. it is optimal.
LZ78 Code
This method builds an explicit dictionary by incrementally parsing the input sequence into shortest phrases that have not been seen so far. Then we represent each phrase in terms of others that have previously appeared in the form of a pair (i,s) where i is the index of referenced symbol and s and the new symbol added on to the preexisting one.
Example
Our phrase abbaaacbaacbaais parsed as a,b,ba,aa,c,bb,aac,baaand we represent it as (0,a),(0,b),(2,a),(1,a),(0,c),(2,b),(4,c),(3,a)
Properties
- Code Length: l(Xn)=c(xn)(logc(xn)+O(1))where c(xn) is the number of phrases in the dictionary obtained by parsing xn.
- Alternative statement for entropy rate n→∞limnE[c(Xn)(logc(Xn)+O(1))]=H~(X) >[!thm] LZ78 is a Universal Code >The LZ78 algorithm is universal in the class of all stationary and ergodic sources on alphabet X i.e. n→∞limnE[c(Xn)(logc(Xn)+O(1))]=n→∞limE[ln(Xn)]=H~(X) for any stationary ergodic source {Xi}i=1∞ with entropy rate H~(X).