Definition (Self-information)
Let E denote an event with probability p>0 of occurring. Since it depends only on p, we write either I(E) or I(p) for the self-information of E, and use it to represent the “amount of information” one gains about E upon learning that E has occurred.
Proposition (Properties of I(p))
- Certain events are not surprising: If an event E is certain to happen, i.e. p(E)=1, then its occurrence provides no surprise (no new information): $I(1)=0$.
- Impossible events are infinitely surprising: Since I is only defined for p>0, this is a limiting statement: as $p \to 0^{+}$, $I(p) \to \infty$. The rarer an event, the more surprising its occurrence, without bound.
- Non-increasing: I(p) should be non-increasing in p, i.e. a less likely event provides at least as much information when it occurs as a more likely one: if $p_1 \le p_2$, then $I(p_1) \ge I(p_2)$.
- Continuity: I(p) should be continuous in p. Intuitively, one would expect that a small change in p corresponds to a small change in the amount of information about E.
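- Additivity for independent events: If $E_1$ and $E_2$ are independent with probabilities $p_1>0$ and $p_2>0$, respectively, then $P(E_1 \cap E_2) = p_1 p_2$, and we require $I(E_1 \cap E_2) = I(p_1 p_2) = I(p_1) + I(p_2)$. This is reasonable because, by independence, learning that $E_1$ occurred tells us nothing about $E_2$, so the information gained from observing both should be the sum of the information gained from each.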
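As a quick numerical sanity check, the sketch below tests every property above except continuity (which a finite grid cannot establish) for the candidate $I(p) = -\log_2(p)$, i.e. the form in the theorem that follows with c = 1 and b = 2. The helper name `self_information` is hypothetical and chosen only for illustration.

```python
import math


def self_information(p: float, base: float = 2.0) -> float:
    """Hypothetical helper: I(p) = -log_base(p) for 0 < p <= 1 (theorem's form with c = 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    return -math.log(p, base)


# Certain events carry no information: I(1) = 0.
certain_is_zero = self_information(1.0) == 0.0

# Rare events: I(p) grows without bound as p -> 0+ (about 996.6 bits at p = 1e-300).
rare_is_large = self_information(1e-300) > 900

# Non-increasing: along an increasing grid of p values, I(p) never goes up.
grid = [k / 100 for k in range(1, 101)]
values = [self_information(p) for p in grid]
non_increasing = all(a >= b for a, b in zip(values, values[1:]))

# Additivity: for independent events P(E1 and E2) = p1 * p2, so I(p1 * p2) should equal I(p1) + I(p2).
pairs = [(0.5, 0.5), (0.3, 0.7), (0.9, 0.01)]
additive = all(
    math.isclose(self_information(p1 * p2), self_information(p1) + self_information(p2), abs_tol=1e-12)
    for p1, p2 in pairs
)

print(certain_is_zero, rare_is_large, non_increasing, additive)
```

Passing checks on a handful of points is evidence, not proof; the theorem is what guarantees the properties for all p.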
Theorem (Representation of I(p))
The only functions I(p), 0<p≤1, satisfying all five properties above are those of the form $I(p) = -c\log_b(p)$, where $c>0$ and $b>1$ are constants (b is the base, which fixes the unit of information).
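One standard way to see why the logarithm is forced is via Cauchy's functional equation. The outline below is a sketch of that argument (the substitution $g(x) = I(e^{-x})$ is an assumption introduced here, not necessarily the proof these notes intend):

```latex
\begin{aligned}
&\text{Let } g(x) = I(e^{-x}) \text{ for } x \ge 0, \text{ so that } p = e^{-x} \text{ ranges over } (0,1]. \\
&\text{Additivity: } g(x+y) = I(e^{-x}e^{-y}) = I(e^{-x}) + I(e^{-y}) = g(x) + g(y). \\
&\Rightarrow\ g(0) = 0 \ (\text{so } I(1) = 0),\quad g(nx) = n\,g(x),\quad g\!\left(\tfrac{m}{n}\right) = \tfrac{m}{n}\,g(1) \quad (m,n \in \mathbb{N},\ n \ge 1). \\
&\text{Continuity extends this from rationals to reals: } g(x) = k\,x \text{ for all } x \ge 0, \text{ with } k = g(1). \\
&\Rightarrow\ I(p) = g(-\ln p) = -k \ln p = -c \log_b(p), \quad \text{where } c = k \ln b. \\
&\text{Non-increasing gives } k \ge 0, \text{ and } I(p) \to \infty \text{ as } p \to 0^{+} \text{ rules out } k = 0, \text{ so } c > 0.
\end{aligned}
```

Since $\ln b > 0$ for $b > 1$, any choice of base b can be paired with a suitable c, which is why b only selects the unit.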
Table (Units of $I(p) = -\log_b(p)$ for each base b, taking c = 1)
b    units of I(p)
2    bits
e    nats
3    ternary units
q    q-ary digits
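To make the table concrete, the sketch below evaluates the same self-information in several units, assuming c = 1 as in the table. The helper `self_information` and the example probabilities are hypothetical illustrations.

```python
import math

# Unit names for I(p) = -log_b(p) with c = 1, following the table of bases above.
UNITS = {2: "bits", math.e: "nats", 3: "ternary units"}


def self_information(p: float, base: float) -> float:
    """Hypothetical helper: I(p) = -log_base(p) for 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    return -math.log(p, base)


p = 1 / 8  # e.g. three independent fair coin flips all landing heads
for base, unit in UNITS.items():
    print(f"I(1/8) = {self_information(p, base):.4f} {unit}")

# Changing the base only rescales: log_b(p) = log_2(p) / log_2(b).
bits = self_information(p, 2)
nats = self_information(p, math.e)
rescale_ok = math.isclose(nats, bits / math.log2(math.e))

# An event of probability 1/q carries exactly one q-ary digit, e.g. q = 10.
one_digit = self_information(1 / 10, 10)
```

Only a constant factor separates the units, which is also why c and b are redundant together: $-c\log_b(p) = -(c/\ln b)\ln p$.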