Divergence Bound on KT

Lemma (Dirichlet Mixture Distribution)

The coding distribution corresponding to the Dirichlet (α1,,αm)(\alpha_{1},\dots,\alpha_{m}) pdf is given by q(xn)=Θpθ(xn)fα1,,αm(θ)dθ=i=1mj=1n(ixn)(n(ixn)+αij)j=1n(n+i=1mαij)q(x^{n})=\int\limits _{\Theta}p_{\theta}(x^{n})f_{\alpha_{1},\dots,\alpha_{m}}(\theta) \, d\theta = \frac{\prod_{i=1}^{m}\prod_{j=1}^{n(i|x^{n})}(n(i|x^n)+\alpha_{i}-j)}{\prod_{j=1}^{n}\left( n+\sum_{i=1}^{m}\alpha_{i}-j \right)} where j=1n(ixn)(n(ixn)+αij)=1\prod_{j=1}^{n(i|x^{n})}(n(i|x^{n})+\alpha_{i}-j)=1 if n(ixn)=0n(i|x^{n})=0.

We consider two special cases, one where αi=1\alpha_{i}=1 and the other when αi=12\alpha_{i}=\frac{1}{2}.

Lemma (Krichevsky and Trofimov Coding Distribution)

For all 1tn1\le t\le n define q(ixt1)=n(ixt1)+12t1+m2q(i|x^{t-1})= \frac{n(i|x^{t-1})+\frac{1}{2}}{t-1+\frac{m}{2}}Then

  1. q(ixt1)q(i|x^{t-1}) is a conditional pmf on X\mathcal{X} given xt1x^{t-1}
  2. t=1nq(xtxt1)=i=1mj=1n(ixn)(n(ixn)+12j)j=1n(n+m2j)=q(xn)\prod_{t=1}^{n}q(x_{t}|x^{t-1})= \frac{\prod_{i=1}^{m}\prod_{j=1}^{n(i|x^{n})}\left( n(i|x^{n})+ \frac{1}{2} -j \right)}{\prod_{j=1}^{n}\left( n+\frac{m}{2}- j \right)}=q(x^{n})

Theorem

Let qnq^{n} be the on Xn\mathcal{X}^{n}, where X=m|\mathcal{X}|=m. Then for any pmf p(x)p(x) on X\mathcal{X} and iid source distribution pn(xn)=i=1np(xi)p^{n}(x^{n})=\prod_{i=1}^{n}p(x_{i}) 1nD(pnqn)m1logn2n+K0n,  n1\frac{1}{n}D(p^{n}\|q^{n})\le \frac{m-1\log n}{2n}+\frac{K_{0}}{n}, \ \ \forall n\ge1for some constant K0K_{0}.

Cor (Max Redundancy on KT)

Let P\mathcal{P} denote the class of iid sources on X\mathcal{X}. If Cn\mathcal{C}_{n} is the arithmetic code for on Xn\mathcal{X}^{n}, then its redundancy satisfies: maxpPR(Cn,p)m1logn2n+O(1n)\max_{p\in\mathcal{P}}R(\mathcal{C}_{n},p)\le \frac{m-1\log n}{2n}+O\left( \frac{1}{n} \right)This implies that KT coding distribution is minimax optimal.