## Introduction
So far, we have dealt primarily with discrete-time systems with discrete alphabets. Now we move from discrete to continuous alphabets.
Let $X$ be a real-valued RV with cdf $F_X(x) = P(X \le x)$, $x \in \mathbb{R}$, and pdf $f_X$, where
$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt, \quad x \in \mathbb{R}.$$
The support of $X$ with pdf $f_X$ is $S_X = \{x \in \mathbb{R} : f_X(x) > 0\}$.

## Some intuition

Recall that for a discrete (finite-alphabet) random variable, entropy has an operational meaning: it describes the minimal number of code bits per source symbol needed to losslessly describe it. But for a real-valued RV taking values in a continuum, the number of bits would be infinite, since there are infinitely many source symbols.
The differential entropy of a real-valued RV $X$ with pdf $f_X$ and support $S_X \subset \mathbb{R}$ is given by
$$h(X) := -\int_{S_X} f_X(t)\log_2 f_X(t)\,dt = E_X[-\log_2 f_X(X)]$$
when the integral exists. The usual multivariate extension holds, i.e., for a real-valued $X^n = (X_1,\dots,X_n)$ with pdf $f_{X^n}$ and support $S_{X^n} \subset \mathbb{R}^n$, the joint differential entropy is
$$h(X^n) = -\int_{S_{X^n}} f_{X^n}(x_1,\dots,x_n)\log_2 f_{X^n}(x_1,\dots,x_n)\,dx_1\cdots dx_n = E_{X^n}[-\log_2 f_{X^n}(X^n)] \quad \text{(bits)}$$
when the integral exists.
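To make the definition concrete, here is a minimal numerical sketch (assuming NumPy and SciPy are available; the helper name `differential_entropy_bits` is ours, not from the notes) that evaluates $h(X)$ by quadrature for a standard Gaussian and compares it with the closed form $\tfrac{1}{2}\log_2(2\pi e)$:

```python
# Sketch: evaluating h(X) = E[-log2 f_X(X)] numerically for a standard Gaussian
# and comparing against the known closed form (1/2) log2(2*pi*e).
import numpy as np
from scipy import integrate
from scipy.stats import norm

def differential_entropy_bits(pdf, lo, hi):
    """Numerically integrate -f(t) log2 f(t) over [lo, hi] (an approximation of S_X)."""
    value, _ = integrate.quad(lambda t: -pdf(t) * np.log2(pdf(t)), lo, hi)
    return value

h_numeric = differential_entropy_bits(norm.pdf, -10, 10)   # tails beyond +/-10 are negligible
h_closed  = 0.5 * np.log2(2 * np.pi * np.e)                # exact value for N(0, 1)
print(h_numeric, h_closed)                                  # both approximately 2.047 bits
```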
## Intuition
The operational meaning of differential entropy is the following: $H([X]_n)$ is the minimum average number of bits needed to losslessly describe $[X]_n$, the uniform quantization of $X$ with $n$-bit accuracy. Approximately $h(X) + n$ bits are needed to describe $X$ when uniformly quantizing it with $n$-bit accuracy, i.e.,
$$H([X]_n) \approx n + h(X)$$
for $n$ sufficiently large. So the larger $h(X)$ is, the larger the average number of bits required to describe a uniformly quantized $X$ with $n$-bit accuracy. If $(X,Y) \sim f_{XY}$ on $S_{XY} \subset \mathbb{R}^2$, then
$$H([X]_n, [Y]_m) \approx m + n + h(X,Y)$$
for $m, n$ sufficiently large.
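A rough numerical check of this approximation (a sketch assuming NumPy/SciPy; the truncation range and choice of a standard Gaussian are ours) quantizes $X \sim \mathcal{N}(0,1)$ with bin width $2^{-n}$ and compares the discrete entropy of the bin probabilities with $n + h(X)$:

```python
# Sketch: checking H([X]_n) ~ n + h(X) for a standard Gaussian by quantizing
# with bin width 2^{-n} and computing the discrete entropy of the bin masses.
import numpy as np
from scipy.stats import norm

h_X = 0.5 * np.log2(2 * np.pi * np.e)           # differential entropy of N(0,1), ~2.047 bits

for n in (2, 4, 6, 8):
    delta = 2.0 ** (-n)                          # quantization bin width
    edges = np.arange(-10, 10 + delta, delta)    # covers essentially all the probability mass
    p = np.diff(norm.cdf(edges))                 # probability of each quantization bin
    p = p[p > 0]
    H_quantized = -np.sum(p * np.log2(p))        # discrete entropy of [X]_n
    print(n, H_quantized, n + h_X)               # the two values converge as n grows
```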
Differential entropy satisfies the following properties (a numerical sanity check of properties 5 and 6 appears after the list):

1. Conditioning Reduces Entropy: $h(X) \ge h(X \mid Y)$, with equality if $X$ and $Y$ are independent.
2. Chain Rule for Differential Entropy: $h(X^n) = h(X_1,\dots,X_n) = \sum_{i=1}^{n} h(X_i \mid X_{i-1},\dots,X_1)$
3. Independence Bound for Differential Entropy: $h(X^n) \le \sum_{i=1}^{n} h(X_i)$, with equality $\iff$ all the $X_i$'s are independent.
4. Invariance of Differential Entropy Under Translation: $h(X + c) = h(X)$ for every constant $c \in \mathbb{R}$.
5. Differential Entropy Under Scaling: $h(aX) = h(X) + \log_2|a|$ for every constant $a \ne 0$.
6. Joint Differential Entropy Under Linear Maps: Let $X = (X_1,\dots,X_n)^T$ be a random column vector with joint pdf $f_X = f_{X^n}$ and let $Y = AX$, where $A$ is an $n \times n$ invertible matrix (i.e., $\det(A) \ne 0$). Then
$$h(Y) = h(Y_1,\dots,Y_n) = h(X) + \log_2|\det(A)|.$$
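As a sanity check of properties 5 and 6 (a sketch, assuming NumPy; it uses the closed-form Gaussian entropy $\tfrac{1}{2}\log_2[(2\pi e)^n \det K]$ from the maximal-entropy theorem further below, and the helper name `gaussian_entropy_bits` is ours):

```python
# Sketch: verifying h(aX) = h(X) + log2|a| and h(AX) = h(X) + log2|det(A)|
# on Gaussian vectors, where h is available in closed form.
import numpy as np

def gaussian_entropy_bits(K):
    """h(X) = 0.5 * log2((2*pi*e)^n det(K)) for X ~ N(mu, K)."""
    n = K.shape[0]
    return 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(K))

# Property 5: scaling a scalar Gaussian scales its variance by a^2.
a, sigma2 = -3.0, 2.0
lhs = gaussian_entropy_bits(np.array([[a ** 2 * sigma2]]))
rhs = gaussian_entropy_bits(np.array([[sigma2]])) + np.log2(abs(a))
print(np.isclose(lhs, rhs))   # True

# Property 6: Y = AX is Gaussian with covariance A K A^T.
rng = np.random.default_rng(0)
K = np.array([[2.0, 0.5], [0.5, 1.0]])
A = rng.normal(size=(2, 2))                       # almost surely invertible
lhs = gaussian_entropy_bits(A @ K @ A.T)
rhs = gaussian_entropy_bits(K) + np.log2(abs(np.linalg.det(A)))
print(np.isclose(lhs, rhs))   # True
```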
Let $(X,Y) \sim f_{XY}$ with support $S_{XY} \subset \mathbb{R}^2$. The conditional differential entropy of $Y$ given $X$ is given by
$$h(Y \mid X) := -\int_{S_{XY}} f_{XY}(x,y)\log_2 f_{Y\mid X}(y \mid x)\,dx\,dy = E_{XY}[-\log_2 f_{Y\mid X}(Y \mid X)]$$
Similar to the discrete case,
$$h(X,Y) = h(X) + h(Y \mid X) = h(Y) + h(X \mid Y).$$
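A minimal sketch of this chain rule for a bivariate Gaussian (our choice of covariance; the conditional-variance formula $K_{22} - K_{12}^2/K_{11}$ is a standard Gaussian fact, not derived in these notes):

```python
# Sketch: verifying h(X,Y) = h(X) + h(Y|X) for a bivariate Gaussian, where
# Y given X = x is Gaussian with variance K_22 - K_12^2 / K_11.
import numpy as np

K = np.array([[2.0, 0.8], [0.8, 1.5]])             # assumed covariance matrix
h_joint = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * np.linalg.det(K))
h_X = 0.5 * np.log2(2 * np.pi * np.e * K[0, 0])
var_Y_given_X = K[1, 1] - K[0, 1] ** 2 / K[0, 0]   # conditional variance of Y given X
h_Y_given_X = 0.5 * np.log2(2 * np.pi * np.e * var_Y_given_X)
print(np.isclose(h_joint, h_X + h_Y_given_X))      # True
```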
Let $X \sim f$ with finite differential entropy $h(X)$. Then its rate-distortion function for MSE distortion is lower bounded as
$$R(D) \ge h(X) - \tfrac{1}{2}\log_2(2\pi e D) = R_{\mathrm{SLB}}(D)$$
or
$$R(D) \ge R_G(D) - D(X \,\|\, X_G),$$
where the divergence measures the "non-Gaussianness" of $X$. For the distortion-rate function,
$$D(R) \ge \frac{1}{2\pi e}\, 2^{-2(R - h(X))}.$$
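For instance, here is a sketch evaluating $R_{\mathrm{SLB}}(D)$ for a unit-variance Laplacian source (parametrized as later in these notes, with variance $2\lambda^2$ and $h(X) = \log_2(2e\lambda)$) and comparing it with the Gaussian rate-distortion function $R_G(D) = \tfrac{1}{2}\log_2(\sigma^2/D)$; the specific $D$ values are our choices:

```python
# Sketch: Shannon lower bound R_SLB(D) = h(X) - 0.5*log2(2*pi*e*D) for a
# unit-variance Laplacian, compared with the Gaussian R_G(D) = 0.5*log2(sigma^2/D).
import numpy as np

sigma2 = 1.0
lam = np.sqrt(sigma2 / 2.0)                     # variance 2*lambda^2 = sigma^2
h_laplace = np.log2(2 * np.e * lam)             # differential entropy, ~1.94 bits
h_gauss = 0.5 * np.log2(2 * np.pi * np.e * sigma2)   # ~2.05 bits

for D in (0.5, 0.25, 0.1, 0.01):
    R_slb = h_laplace - 0.5 * np.log2(2 * np.pi * np.e * D)
    R_G = 0.5 * np.log2(sigma2 / D)
    # The constant gap R_G - R_slb equals D(X || X_G), the "non-Gaussianness".
    print(D, R_slb, R_G, R_G - R_slb)
```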
For any real-valued $n \times n$ positive definite matrix $K = [K_{ij}]$ we have
$$\det(K) \le \prod_{i=1}^{n} K_{ii}$$
with equality iff $K$ is a diagonal matrix.
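A quick numerical check of this inequality (a sketch assuming NumPy; the construction of the positive definite matrix is ours):

```python
# Sketch: checking det(K) <= prod_i K_ii on a randomly generated positive
# definite matrix, with equality in the diagonal case.
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(5, 5))
K = B @ B.T + 5 * np.eye(5)                       # positive definite by construction
print(np.linalg.det(K) <= np.prod(np.diag(K)))    # True

D = np.diag(rng.uniform(1.0, 3.0, size=5))        # diagonal case: equality
print(np.isclose(np.linalg.det(D), np.prod(np.diag(D))))   # True
```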
Let $X = (X_1,\dots,X_n)^T$ be a real-valued random vector with support $S_{X^n} = \mathbb{R}^n$, mean vector $\mu$ and (invertible) covariance matrix $K_X$. Then
$$h(X) = h(X_1,\dots,X_n) \le \tfrac{1}{2}\log_2\!\left[(2\pi e)^n \det(K_X)\right]$$
with equality iff $X \sim \mathcal{N}(\mu, K_X)$, i.e., $X$ is Gaussian.
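In the scalar case, for example, a uniform RV stays strictly below the Gaussian bound with the same variance (a sketch; the interval $(0,4)$ is our choice):

```python
# Sketch: for a fixed variance, the Gaussian bound 0.5*log2(2*pi*e*sigma^2)
# dominates the entropy of a uniform RV with that same variance.
import numpy as np

a, b = 0.0, 4.0
h_uniform = np.log2(b - a)                        # entropy of U(a, b) in bits
sigma2 = (b - a) ** 2 / 12.0                      # variance of U(a, b)
h_gauss_bound = 0.5 * np.log2(2 * np.pi * np.e * sigma2)
print(h_uniform <= h_gauss_bound)                 # True (gap of about 0.25 bits)
```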
The following results regarding the maximal possible value of $h(X)$ for a continuous RV $X$ under various constraints can be shown (a numerical check of the first case appears after the list):

1. If $S_X = (0,\infty)$ and $E[X] = \mu < \infty$, then
$$h(X) \le \log_2\frac{e}{\lambda} \ \text{(bits)}$$
with equality iff $X$ is exponential with parameter $\lambda$, where $\lambda = \frac{1}{\mu}$.
2. If $S_X = \mathbb{R}$ and $E[|X|] = \lambda < \infty$, then
$$h(X) \le \log_2(2e\lambda) \ \text{(bits)}$$
with equality iff $X$ is Laplacian with mean zero and variance $2\lambda^2$.
3. If $S_X = (a,b)$, then
$$h(X) \le \log_2(b-a) \ \text{(bits)}$$
with equality iff $X$ is uniform over $(a,b)$.
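As a sketch of the first result (assuming NumPy/SciPy; the Gamma competitor and the helper name `differential_entropy_bits` are our choices for illustration), the exponential with mean $\mu = 1$ attains $\log_2(e/\lambda)$ while another pdf on $(0,\infty)$ with the same mean falls below it:

```python
# Sketch: among pdfs on (0, inf) with mean 1, the exponential attains the
# maximum entropy log2(e / lambda) with lambda = 1 / mu. We compare it with a
# Gamma(shape=2, scale=0.5) pdf, which also has mean 1, by numerical integration.
import numpy as np
from scipy import integrate
from scipy.stats import expon, gamma

def differential_entropy_bits(pdf, lo, hi):
    value, _ = integrate.quad(lambda t: -pdf(t) * np.log2(pdf(t)), lo, hi)
    return value

mu = 1.0
h_exp_closed = np.log2(np.e / (1.0 / mu))                       # log2(e * mu), ~1.44 bits
h_exp_num = differential_entropy_bits(expon(scale=mu).pdf, 1e-12, 50)
h_gamma_num = differential_entropy_bits(gamma(a=2, scale=0.5).pdf, 1e-12, 50)

print(np.isclose(h_exp_num, h_exp_closed))   # True
print(h_gamma_num <= h_exp_closed)           # True: ~1.28 bits < ~1.44 bits
```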