# Entropy


## Definition

Let $X$ be a discrete random variable with finite alphabet $\mathscr{X}$ and pmf given by $p_X(a):=P(X=a),\ a\in\mathscr{X}$. The entropy of a discrete random variable $X$ with pmf $p_X$ and alphabet $\mathscr{X}$ is denoted by $H(X)$ and given by:

$$H(X)=-\sum_{a\in\mathscr{X}}p_X(a)\log_2(p_X(a))$$

## Intuition

Taking the notion of self-information, we can look at entropy as the average information conveyed by a random variable, i.e. the expected information from a given distribution. Examining the definition of entropy $H(X)$, we have

$$\begin{align*}
H(X)&=-\sum_{a\in\mathscr{X}}p_X(a)\log_2(p_X(a))\\
&=\sum_{a\in\mathscr{X}}p_X(a)\, I(p_X(a))\\
&=E_{p_X}[I(p_X(x))]\\
&=E_{p_X}[-\log_2(p_X(x))]
\end{align*}$$

so $H(X)$ is the expected (statistical average) amount of information one gains about the occurrence of one of its $|\mathscr{X}|$ outcomes.
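As a quick sanity check of this definition, here is a minimal Python sketch (the `entropy` helper and the example pmfs are illustrative, not from the original note) that averages the self-information $-\log_2 p_X(a)$ over the alphabet:

```python
import math

def entropy(pmf, base=2):
    """Shannon entropy of a finite pmf, in units of log-base `base`.

    `pmf` is any iterable of probabilities summing to 1; outcomes with
    p = 0 contribute nothing (0 * log 0 is taken to be 0 by convention).
    """
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit per flip
print(entropy([0.9, 0.1]))   # biased coin: ~0.469 bits, "less random"
```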

$H(X)$ denotes entropy with logarithm base $b=2$. Other values of $b$ will be denoted with a subscript, as in $H_b(X)$.

1. $H(X)\ge 0$, with equality if and only if $X$ is deterministic (i.e. $\exists c\in\mathscr{X}$ such that $p_X(c)=1$)
2. $H_b(X):=-\sum_{a\in\mathscr{X}} p_X(a)\log_b p_X(a)=(\log_b 2)\,H(X)$ (a numeric check follows below)
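A quick check of property 2, reusing the illustrative `entropy` sketch from above: the same pmf measured in nats (base $e$) should come out to $\ln 2$ times its value in bits.

```python
import math

def entropy(pmf, base=2):
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

pmf = [0.5, 0.25, 0.25]
h_bits = entropy(pmf, base=2)        # 1.5 bits
h_nats = entropy(pmf, base=math.e)   # same quantity measured in nats
print(h_nats, math.log(2) * h_bits)  # both ~1.0397, i.e. H_b(X) = (log_b 2) H(X)
```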

If $X\sim p_X$ is a discrete RV (with finite alphabet $\mathscr{X}$), then $H(X)\le\log_2|\mathscr{X}|$, with equality if and only if $X$ is uniformly distributed over $\mathscr{X}$.
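As a worked example of the bound (the alphabet size $|\mathscr{X}|=8$ is chosen arbitrarily): for $X$ uniform over $8$ outcomes,

$$H(X)=-\sum_{a\in\mathscr{X}}\tfrac{1}{8}\log_2\tfrac{1}{8}=8\cdot\tfrac{1}{8}\cdot 3=3=\log_2|\mathscr{X}|,$$

while any non-uniform pmf on the same $8$-symbol alphabet has strictly smaller entropy.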

## Intuition

From the perspective of Information Theory, entropy tells us how efficiently information can be encoded at best: any lossless encoding scheme's average bit usage per symbol is lower bounded by the entropy of the source.

You can also think of it as a benchmark for the most efficient encoding of information drawn from a given distribution.
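For a concrete (illustrative) instance: a source with pmf $(\tfrac{1}{2},\tfrac{1}{4},\tfrac{1}{4})$ has $H(X)=\tfrac{1}{2}\cdot 1+\tfrac{1}{4}\cdot 2+\tfrac{1}{4}\cdot 2=1.5$ bits, and the prefix code $\{0,10,11\}$ achieves an average length of exactly $1.5$ bits per symbol, meeting that bound.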

Intuitively, entropy $H(X)$ tells us how "random" $X$ is (a numeric sketch follows this list):
1. If $X$ is deterministic, then $H(X)=0$
   1. This is the minimum value of entropy, or randomness, for $X$
2. If $X$ is uniform, then $H(X)=\log_2|\mathscr{X}|$
   1. This is the maximal value of entropy, or randomness, for $X$
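To make the two extremes concrete, here is a short sketch (again using the illustrative `entropy` helper) sweeping a binary pmf $(p, 1-p)$ from deterministic to uniform:

```python
import math

def entropy(pmf, base=2):
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

# Entropy of a binary alphabet is 0 at the deterministic extremes (p = 0 or 1)
# and peaks at log2(2) = 1 bit when the distribution is uniform (p = 0.5).
for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
    print(p, round(entropy([p, 1 - p]), 3))
```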

Given $(X,Y)\sim p_{XY}$, the conditional entropy of $X$ given $Y$ is denoted by $H(X|Y)$ and given by

$$\begin{align*}
H(X|Y)&=E_{p_{XY}}[-\log_2(p_{X|Y}(x|y))]\\
&=-\sum_{a\in\mathscr{X}}\sum_{b\in\mathscr{Y}}p_{XY}(a,b)\,\log_2(p_{X|Y}(a|b))
\end{align*}$$
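A minimal sketch of this definition, assuming the joint pmf is given as a Python dict keyed by outcome pairs (the `cond_entropy` name and the example numbers are illustrative):

```python
import math

def cond_entropy(joint):
    """H(X|Y) from a joint pmf {(a, b): p_XY(a, b)}.

    Uses p_{X|Y}(a|b) = p_XY(a, b) / p_Y(b), so
    H(X|Y) = -sum_{a,b} p_XY(a, b) * log2(p_XY(a, b) / p_Y(b)).
    """
    p_y = {}
    for (a, b), p in joint.items():
        p_y[b] = p_y.get(b, 0.0) + p
    return -sum(p * math.log2(p / p_y[b]) for (a, b), p in joint.items() if p > 0)

# Example joint pmf on X, Y in {0, 1}:
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(cond_entropy(p_xy))   # ~0.722 bits, versus H(X) = 1 bit for the marginal
```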

$$H(X,Y)=H(X)+H(Y|X)=H(Y)+H(X|Y)$$

If $(X,Y)\sim p_{XY}$, then $H(X)\ge H(X|Y)$, and if $X\perp\!\!\!\perp Y$ then $H(X)=H(X|Y)$.
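A self-contained numeric check of the chain rule and of $H(X)\ge H(X|Y)$, using the same illustrative joint pmf as in the sketch above:

```python
import math

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Illustrative joint pmf on X, Y in {0, 1}
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals p_X and p_Y
p_x, p_y = {}, {}
for (a, b), p in p_xy.items():
    p_x[a] = p_x.get(a, 0.0) + p
    p_y[b] = p_y.get(b, 0.0) + p

h_x, h_y, h_xy = entropy(p_x.values()), entropy(p_y.values()), entropy(p_xy.values())
# H(X|Y) straight from the definition
h_x_given_y = -sum(p * math.log2(p / p_y[b]) for (a, b), p in p_xy.items() if p > 0)

print(abs(h_xy - (h_y + h_x_given_y)) < 1e-9)   # chain rule: H(X,Y) = H(Y) + H(X|Y) -> True
print(h_x >= h_x_given_y)                        # conditioning cannot increase entropy -> True
```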
