
Cross-Entropy

🌱

Definition
InfoTheory

Given two pmfs $p$ and $q$ on the same alphabet $\mathscr{X}$, the Cross-Entropy between $p$ and $q$, denoted $H(p;q)$, is
$$H(p;q):=\sum_{a\in\mathscr{X}}p(a)\log_2\frac{1}{q(a)}$$
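
To make the definition concrete, here is a minimal Python sketch (not part of the original note); the example pmfs and the helper name `cross_entropy` are my own illustrative choices.

```python
import math

def cross_entropy(p, q):
    """H(p;q) in bits: sum over a of p(a) * log2(1 / q(a))."""
    return sum(pa * math.log2(1.0 / qa) for pa, qa in zip(p, q) if pa > 0)

# Hypothetical pmfs on a 3-symbol alphabet, chosen purely for illustration.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

print(cross_entropy(p, q))  # 1.75 bits
```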

Cross-Entropy is a quantity intimately connected to divergence:
$$\begin{align*} D(p\|q)&=\sum_{a\in\mathscr{X}}p(a)\log_2\left(\frac{p(a)}{q(a)}\right)\\ &=\sum_{a\in\mathscr{X}}p(a)\log_2 p(a)+\sum_{a\in\mathscr{X}}p(a)\log_2\frac{1}{q(a)}\\ &=-H(p)+H(p;q) \end{align*}$$

# Intuition

Whereas divergence tells us the difference in expected bits between encoding symbols from $p$ with a code built for $p$ and encoding them with a code built for $q$, Cross-Entropy tells us the total expected number of bits needed when symbols are generated from $p$ but $q$ is used to encode them.
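
The decomposition $D(p\|q)=H(p;q)-H(p)$ can be checked numerically. The sketch below is only an illustration: the helper names `entropy`, `cross_entropy`, `kl_divergence` and the pmfs are assumptions carried over from the example above.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return sum(pa * math.log2(1.0 / pa) for pa in p if pa > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p;q) in bits."""
    return sum(pa * math.log2(1.0 / qa) for pa, qa in zip(p, q) if pa > 0)

def kl_divergence(p, q):
    """D(p||q) in bits, computed directly from its definition."""
    return sum(pa * math.log2(pa / qa) for pa, qa in zip(p, q) if pa > 0)

# Illustrative pmfs (same as in the previous sketch).
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

# The two quantities agree: D(p||q) = H(p;q) - H(p).
print(kl_divergence(p, q))               # 0.25
print(cross_entropy(p, q) - entropy(p))  # 0.25
```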

Since $D(p\|q)\ge0$, by the above remark we have $0\le D(p\|q)=-H(p)+H(p;q)$, which implies
$$\begin{align*} D(p\|q)&\le H(p;q)\\ H(p)&\le H(p;q) \end{align*}$$
(the first inequality because $H(p)\ge0$, the second by rearranging).
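
Both inequalities are visible on the same toy example, and $H(p;q)=H(p)$ exactly when $q=p$, since then $D(p\|p)=0$. Again a sketch with assumed helpers and illustrative pmfs, not code from the note itself.

```python
import math

def entropy(p):
    return sum(pa * math.log2(1.0 / pa) for pa in p if pa > 0)

def cross_entropy(p, q):
    return sum(pa * math.log2(1.0 / qa) for pa, qa in zip(p, q) if pa > 0)

# Same illustrative pmfs as in the sketches above.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

print(entropy(p))           # 1.5  -> H(p)
print(cross_entropy(p, q))  # 1.75 -> H(p;q), so H(p) <= H(p;q)
print(cross_entropy(p, p))  # 1.5  -> equality when the coding distribution matches p
```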
