Given two pmfs $p$ and $q$ on the same alphabet $\mathcal{X}$, the divergence between $p$ and $q$, denoted by $D(p\|q)$, is given by

$$D(p\|q) = \sum_{a\in\mathcal{X}} p(a)\log_2\frac{p(a)}{q(a)} = \mathbb{E}_p\left[\log_2\frac{p(X)}{q(X)}\right],$$

where $\log_2\frac{p(X)}{q(X)}$ is called the "log-likelihood ratio".
Convention
If $\mathcal{X}$ contains points of zero mass under $p$ or $q$ (i.e. it is not the support of both $p$ and $q$), then:

$$0\log\frac{0}{q} = 0 \quad \text{for } q \ge 0, \qquad p\log\frac{p}{0} \to +\infty \quad \text{for } p > 0.$$

> [!rmk]
> Divergence is not a true distance, as it satisfies neither symmetry nor the triangle inequality.
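The definition and the zero-mass conventions can be illustrated with a minimal sketch. The function name `kl_divergence`, the dictionary representation of a pmf, and the example pmfs are assumptions made for illustration only.

```python
import math

def kl_divergence(p, q):
    """D(p||q) in bits for two pmfs given as dicts over the same alphabet."""
    total = 0.0
    for a in set(p) | set(q):
        pa, qa = p.get(a, 0.0), q.get(a, 0.0)
        if pa == 0.0:
            continue          # convention: 0 * log(0/q) = 0
        if qa == 0.0:
            return math.inf   # convention: p * log(p/0) = +inf for p > 0
        total += pa * math.log2(pa / qa)
    return total

# Hypothetical pmfs on the alphabet {"a", "b"}.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}

print(kl_divergence(p, q))  # approx. 0.2075
print(kl_divergence(q, p))  # approx. 0.1887, so D(p||q) != D(q||p)
print(kl_divergence(p, {"a": 1.0, "b": 0.0}))  # inf
print(kl_divergence({"a": 1.0, "b": 0.0}, p))  # 1.0
```

The first two values differ, which illustrates the lack of symmetry. The last two show that a zero of $q$ where $p$ has mass makes the divergence infinite, while a zero of $p$ simply contributes nothing.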
Lemma (Non-negativity of divergence)
$D(p\|q) \ge 0$, with equality if and only if $p = q$.
For $p$ and $q$ with support $\mathcal{X}$, the divergence satisfies

$$\frac{1}{2\log 2}\|p-q\|^2 \le D(p\|q) \le \frac{1}{(\log 2)\min_{a\in\mathcal{X}}\{p(a),q(a)\}}\|p-q\|.$$
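A quick numerical sanity check of these bounds, and of non-negativity, might look as follows. This sketch assumes that $\|p-q\|$ is the $\ell_1$ norm $\sum_a |p(a)-q(a)|$ and that $\log$ without a base is the natural logarithm; neither is stated explicitly in the notes. The helper names and the random test pmfs are hypothetical.

```python
import math
import random

def kl_divergence(p, q):
    """D(p||q) in bits for pmfs given as equal-length lists with full support."""
    return sum(pa * math.log2(pa / qa) for pa, qa in zip(p, q) if pa > 0)

def random_pmf(n, rng):
    """Random pmf on n points; the offset keeps every mass strictly positive."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def check_bounds(p, q, tol=1e-12):
    """True if 0 <= lower <= D(p||q) <= upper, up to floating-point tolerance."""
    l1 = sum(abs(pa - qa) for pa, qa in zip(p, q))      # assumed l1 norm
    d = kl_divergence(p, q)
    lower = l1 ** 2 / (2 * math.log(2))                 # math.log is natural log
    upper = l1 / (math.log(2) * min(min(p), min(q)))
    return -tol <= d and lower - tol <= d <= upper + tol

rng = random.Random(0)
results = [check_bounds(random_pmf(4, rng), random_pmf(4, rng))
           for _ in range(1000)]
print(all(results))
```

A random check like this does not prove the inequalities; it only helps catch a misread constant or norm.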
Given two joint pmfs $p_{XY} = p_X p_{Y|X}$ and $q_{XY} = q_X q_{Y|X}$ on $\mathcal{X}\times\mathcal{Y}$, the divergence can be expressed as

$$D(p_{XY}\|q_{XY}) = D(p_X\|q_X) + D(p_{Y|X}\|q_{Y|X}\mid p_X).$$
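Let $p_X$ be a pmf on $\mathcal{X}$ and let $p_{Y|X}$ and $q_{Y|X}$ be two different conditional pmfs on $\mathcal{Y}\times\mathcal{X}$. Then the conditional divergence between $p_{Y|X}$ and $q_{Y|X}$ given $p_X$ is

$$\begin{aligned}
D(p_{Y|X}\|q_{Y|X}\mid p_X) &:= \mathbb{E}_{p_{Y|X}p_X}\left[\log_2\frac{p_{Y|X}(Y|X)}{q_{Y|X}(Y|X)}\right] \\
&= \sum_{a\in\mathcal{X}}\sum_{b\in\mathcal{Y}} p_X(a)\,p_{Y|X}(b|a)\log_2\frac{p_{Y|X}(b|a)}{q_{Y|X}(b|a)} \\
&= \sum_{a\in\mathcal{X}} p_X(a)\sum_{b\in\mathcal{Y}} p_{Y|X}(b|a)\log_2\frac{p_{Y|X}(b|a)}{q_{Y|X}(b|a)} \\
&= \mathbb{E}_{p_X}\left[D(p_{Y|X=X}\|q_{Y|X=X})\right].
\end{aligned}$$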
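The conditional divergence, written as a $p_X$-weighted average of per-symbol divergences, can be computed directly and used to check the chain rule $D(p_{XY}\|q_{XY}) = D(p_X\|q_X) + D(p_{Y|X}\|q_{Y|X}\mid p_X)$ numerically. The following is a minimal sketch; the helper names and all numerical values are hypothetical.

```python
import math

def kl(p, q):
    """D(p||q) in bits; p and q are dicts with the same keys, q > 0 where p > 0."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)

# Hypothetical marginals on X = {0, 1} and conditionals on Y = {0, 1}.
p_x = {0: 0.5, 1: 0.5}
q_x = {0: 0.25, 1: 0.75}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
q_y_given_x = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.4, 1: 0.6}}

def joint(marginal, conditional):
    """Joint pmf p_XY(a, b) = p_X(a) * p_{Y|X}(b|a)."""
    return {(a, b): marginal[a] * conditional[a][b]
            for a in marginal for b in conditional[a]}

def conditional_kl(p_cond, q_cond, p_marg):
    """D(p_{Y|X} || q_{Y|X} | p_X) as an average over a ~ p_X."""
    return sum(p_marg[a] * kl(p_cond[a], q_cond[a]) for a in p_marg)

lhs = kl(joint(p_x, p_y_given_x), joint(q_x, q_y_given_x))
rhs = kl(p_x, q_x) + conditional_kl(p_y_given_x, q_y_given_x, p_x)
print(lhs, rhs)  # the two values agree up to floating-point error
```

Note that the average in `conditional_kl` is taken under $p_X$ only; $q_X$ enters the chain rule solely through the marginal term.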
Let $(X,\hat{X},Z)\sim p_{X\hat{X}Z}$ on $\mathcal{X}\times\mathcal{X}\times\mathcal{Z}$. Then

$$D(p_{X|Z}\|p_{\hat{X}|Z}\mid p_Z) \ge D(p_X\|p_{\hat{X}}).$$
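This inequality can be checked on a randomly drawn joint pmf. The sketch below assumes small alphabets ($|\mathcal{X}| = 3$, $|\mathcal{Z}| = 2$) and a joint pmf with full support; the helper `marginal` and all values are hypothetical.

```python
import itertools
import math
import random

def kl(p, q):
    """D(p||q) in bits for pmfs given as equal-length lists, q > 0 where p > 0."""
    return sum(pa * math.log2(pa / qa) for pa, qa in zip(p, q) if pa > 0)

def marginal(p, axes):
    """Marginalize a joint pmf (dict keyed by tuples) onto the given coordinates."""
    out = {}
    for key, v in p.items():
        sub = tuple(key[i] for i in axes)
        out[sub] = out.get(sub, 0.0) + v
    return out

# Random joint pmf over (x, x_hat, z) with strictly positive masses.
rng = random.Random(1)
nx, nz = 3, 2
keys = list(itertools.product(range(nx), range(nx), range(nz)))
weights = [rng.random() + 1e-3 for _ in keys]
total = sum(weights)
p = {k: w / total for k, w in zip(keys, weights)}

p_z = marginal(p, (2,))
p_xz = marginal(p, (0, 2))
p_xhz = marginal(p, (1, 2))
p_x = marginal(p, (0,))
p_xh = marginal(p, (1,))

# D(p_{X|Z} || p_{Xhat|Z} | p_Z): average over z of the per-z divergences.
conditional = sum(
    p_z[(c,)] * kl([p_xz[(a, c)] / p_z[(c,)] for a in range(nx)],
                   [p_xhz[(a, c)] / p_z[(c,)] for a in range(nx)])
    for c in range(nz))

# D(p_X || p_Xhat) on the marginals.
unconditional = kl([p_x[(a,)] for a in range(nx)],
                   [p_xh[(a,)] for a in range(nx)])

print(conditional, unconditional)
```

Averaging the conditionals over $p_Z$ recovers the marginals $p_X$ and $p_{\hat{X}}$, so the inequality is an instance of the joint convexity of divergence.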