Cross-Entropy

Definition (Cross entropy)

Given two pmfs $p$ and $q$ on the same alphabet $\mathscr{X}$, the cross-entropy between $p$ and $q$, denoted $H(p;q)$, is
$$H(p;q):=\sum_{a\in\mathscr{X}}p(a)\log_2\frac{1}{q(a)}$$
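The definition translates directly into code. The sketch below (the function name `cross_entropy` and the example pmfs are illustrative, not from the source) computes $H(p;q)$ in bits for pmfs given as lists of probabilities over a common alphabet:

```python
import math

def cross_entropy(p, q):
    """H(p;q) = sum_a p(a) * log2(1/q(a)), in bits.

    Terms with p(a) = 0 contribute nothing, by the convention 0*log(1/q) = 0.
    If q(a) = 0 while p(a) > 0, the cross-entropy is infinite.
    """
    return sum(pa * math.log2(1 / qa) for pa, qa in zip(p, q) if pa > 0)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(cross_entropy(p, q))  # 1.75 bits
print(cross_entropy(p, p))  # 1.5 bits, which is just H(p)
```

Note that $H(p;p)=H(p)$: encoding $p$ with a code matched to $p$ itself costs exactly the entropy.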

Remark

Cross-entropy is a quantity intimately connected to divergence:
$$\begin{align*} D(p\|q)&=\sum_{a\in\mathscr{X}}p(a)\log_2\frac{p(a)}{q(a)}\\ &=\sum_{a\in\mathscr{X}}p(a)\log_2 p(a)+\sum_{a\in\mathscr{X}}p(a)\log_2\frac{1}{q(a)}\\ &=-H(p)+H(p;q) \end{align*}$$
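The identity $D(p\|q)=-H(p)+H(p;q)$ can be sanity-checked numerically. A minimal sketch, where the helper names `entropy`, `cross_entropy`, and `divergence` are my own labels for the three quantities in the derivation:

```python
import math

def entropy(p):
    """H(p) = sum_a p(a) * log2(1/p(a))."""
    return sum(pa * math.log2(1 / pa) for pa in p if pa > 0)

def cross_entropy(p, q):
    """H(p;q) = sum_a p(a) * log2(1/q(a))."""
    return sum(pa * math.log2(1 / qa) for pa, qa in zip(p, q) if pa > 0)

def divergence(p, q):
    """D(p||q) = sum_a p(a) * log2(p(a)/q(a))."""
    return sum(pa * math.log2(pa / qa) for pa, qa in zip(p, q) if pa > 0)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
# The remark's identity: D(p||q) = -H(p) + H(p;q)
assert abs(divergence(p, q) - (cross_entropy(p, q) - entropy(p))) < 1e-12
```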

Intuition

Divergence measures the penalty of a mismatched code: the difference in expected bits between encoding symbols drawn from $p$ with a code built for $q$ versus one built for $p$ itself. Cross-entropy, by contrast, is the total expected number of bits needed when symbols are generated from $p$ but $q$ is used to encode them.

Proposition (Entropy vs Cross-Entropy)

Since $D(p\|q)\ge0$, the remark above gives $0\le D(p\|q)=-H(p)+H(p;q)$, which implies $H(p)\le H(p;q)$; and since $H(p)\ge0$, the same identity gives $D(p\|q)\le H(p;q)$. In summary:
$$\begin{align*} D(p\|q)&\le H(p;q)\\ H(p)&\le H(p;q) \end{align*}$$
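The bound $H(p)\le H(p;q)$ can be spot-checked over many random pmf pairs. A small sketch under my own helper names (`entropy`, `cross_entropy` as defined by the formulas in this section):

```python
import math
import random

def entropy(p):
    """H(p) in bits."""
    return sum(pa * math.log2(1 / pa) for pa in p if pa > 0)

def cross_entropy(p, q):
    """H(p;q) in bits."""
    return sum(pa * math.log2(1 / qa) for pa, qa in zip(p, q) if pa > 0)

random.seed(0)
for _ in range(1000):
    # Draw two random pmfs on a 4-symbol alphabet by normalizing positive weights.
    raw_p = [random.random() for _ in range(4)]
    raw_q = [random.random() for _ in range(4)]
    p = [x / sum(raw_p) for x in raw_p]
    q = [x / sum(raw_q) for x in raw_q]
    # Entropy never exceeds cross-entropy (tolerance for float rounding).
    assert entropy(p) <= cross_entropy(p, q) + 1e-12
```

Equality holds exactly when $p=q$, since then $D(p\|q)=0$.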
