Blackwell's Irrelevant Information Theorem

Theorem (5.1.1)

Let X,Y,U\mathbb{X},\mathbb{Y},\mathbb{U} be Polish spaces and let P\mathbb{P} be a probability measure on B(X×Y)\mathcal{B}(\mathbb{X}\times \mathbb{Y}), and let c:X×URc:\mathbb{X}\times \mathbb{U}\to \mathbb{R} be a bounded, Borel cost function. Then, for any Borel function γ:X×YU\gamma:\mathbb{X}\times \mathbb{Y}\to \mathbb{U}, there exists another Borel function γ:XU\gamma^{*}:\mathbb{X}\to \mathbb{U} s.t. Xc(x,γ(x))dPX(dx)X×Yc(x,γ(x,y))dP(dx,dy)\int\limits _{\mathbb{X}}c(x,\gamma^{*}(x)) \, d\mathbb{P}_{\mathbb{X}}(dx)\le \int\limits _{\mathbb{X}\times \mathbb{Y}}c(x,\gamma(x,y)) \, d\mathbb{P}(dx,dy) where PX\mathbb{P}_{\mathbb{X}} is the marginal of P\mathbb{P} on X\mathbb{X}. Thus, policies based only on xx are a.s. optimal.

\begin{proof} Goal: Construct a γ\gamma^{*} given γ\gamma Let u=γ(x,y)u=\gamma(x,y).

To emphasize the random nature of the realizations x,y,ux,y,u we denote their corresponding rvs as X,Y,UX,Y,U.

Given some Policy γ\gamma, we write for any Borel subset DUD\subset \mathbb{U} and xXx \in \mathbb{X}, the following Stochastic Kernel U1{uD}P(Udux)=Pγ(UDx)=P(γ(X,Y)DX=x)=Y1{γ(x,y)D}P(YdyX=x).\int\limits _{\mathbb{U}}\mathbb{1}_{\{ u\in D \}} \, \mathbb{P}(U\in du\mid x) =\mathbb{P}^{\gamma}(U\in D\mid x)=\mathbb{P}(\gamma(X,Y)\in D\mid X=x)=\int\limits _{\mathbb{Y}}\mathbb{1}_{\{ \gamma(x,y)\in D \}} \, \mathbb{P}(Y\in dy\mid X=x). We then have X×Yc(x,γ(x,y))P(dx,dy)=XYc(x,γ(x,y))P(dyx)P(dx)=XYc(x,γ(x,y))P(Ydyx)P(dx)=XUc(x,u)Pγ(Udux)P(dx)=X(Uc(x,u)Pγ(dux))P(dx)\begin{align*} \int\limits _{\mathbb{X}\times \mathbb{Y}}c(x,\gamma(x,y)) \, \mathbb{P}(dx,dy)&= \int\limits _{\mathbb{X}}\int\limits _{\mathbb{Y}}c(x,\gamma(x,y)) \, \mathbb{P}(dy\mid x) \, \mathbb{P}(dx) \\ &= \int\limits _{\mathbb{X}}\int\limits _{\mathbb{Y}} c(x,\gamma(x,y)) \, \mathbb{P}(Y\in dy\mid x)\, \mathbb{P}(dx)\\\\ &= \int\limits _{\mathbb{X}}\int\limits _{\mathbb{U}}c(x,u) \,\mathbb{P}^{\gamma}(U\in du\mid x) \, \mathbb{P}(dx)\\ &= \int\limits _{\mathbb{X}}\left( \int\limits _{\mathbb{U}}c(x,u) \, \mathbb{P}^{\gamma}(du\mid x) \right) \, \mathbb{P}(dx) \end{align*}We then denote the inner integral as follows: hγ(x):=Uc(x,u)Pγ(dux)(1)h^{\gamma}(x):=\int\limits _{\mathbb{U}}c(x,u) \, \mathbb{P}^{\gamma}(du\mid x) \tag{1}to get Xhγ(x)P(dx).\int\limits _{\mathbb{X}}h^{\gamma}(x) \, \mathbb{P}(dx) .Now, consider the set D={(x,u)X×U:c(x,u)hγ(x)}D=\{ (x,u)\in \mathbb{X}\times \mathbb{U}:c(x,u)\le h^{\gamma}(x) \}and its X\mathbb{X}-sections Dx:={uU:(x,u)D}xX,D_{x}:=\{ u\in\mathbb{U}:(x,u)\in D \}\quad\forall x \in \mathbb{X},then Pγ(Dxx)>0,xX(2)\mathbb{P}^{\gamma}(D_{x}\mid x)>0,\quad\forall x \in \mathbb{X}\tag{2}since otherwise we would have Uc(x,u)Pγ(dux)=Dxcc(x,u)Pγ(dux)>hγ(x)DxcPγ(dux)=hγ(x)UPγ(dux)=hγ(x)\begin{align*} \int\limits _{\mathbb{U}}c(x,u) \, \mathbb{P}^{\gamma}(du\mid x)&= \int\limits _{D_{x}^{c}}c(x,u) \, \mathbb{P}^{\gamma}(du\mid x)\\ &> h^{\gamma}(x)\int\limits _{D_{x}^{c}} \, \mathbb{P}^{\gamma}(du\mid x)\\ &= h^{\gamma}(x)\int\limits _{\mathbb{U}} \, \mathbb{P}^{\gamma}(du\mid x)\\ &= h^{\gamma}(x) \end{align*}which contradicts our definition in (1)(1). By some measurable selection theorem, (2)(2) implies that γ:XU\exists\gamma^{*}:\mathbb{X}\to \mathbb{U} such that {(x,γ(x)):xX}D\{ (x,\gamma^{*}(x)): x \in \mathbb{X} \}\subseteq Di.e. c(x,γ(x))hγ(x)c(x,\gamma^{*}(x))\le h^{\gamma}(x) for every xXx \in \mathbb{X}: Xc(x,γ(x))P(dx)Xhγ(x)P(dx)X×Yc(x,γ(x,y))P(dx,dy)\int\limits _{\mathbb{X}}c(x,\gamma^{*}(x)) \, \mathbb{P}(dx)\le \int\limits _{\mathbb{X}}h^{\gamma}(x) \, \mathbb{P}(dx)\equiv \int\limits _{\mathbb{X}\times \mathbb{Y}}c(x,\gamma(x,y)) \, \mathbb{P}(dx,dy)

\end{proof} >[!theorem] Ryll-Nardzewski Measurable Selection Theorem >Let X,Y\mathbb{X},\mathbb{Y} be standard Borel. Let A\mathscr{A} be a countably-generated sub-σ-algebra of the Borel σ-algebra of X\mathbb{X} and let B\mathscr{B} be the class of Borel subsets of Y\mathbb{Y}. For any function X×B\mathbb{X}\times \mathscr{B} s.t. >1. P(x)\mathbb{P}(x\mid\cdot) is for each xx a Probability Measure on B\mathscr{B} and >2. for each BBB\in\mathscr{B}, P(B)\mathbb{P}(\cdot\mid B) is an A\mathscr{A}-Measurable Function on X\mathbb{X}, and any set DA×BD\in\mathscr{A}\times \mathscr{B} s.t. P(x,Dx)>0,xX\mathbb{P}(x,D_{x})>0,\quad \forall x \in \mathbb{X}where Dx={y:(x,y)D}D_{x}=\{ y:(x,y)\in D \}. > >Then, there is a A\mathscr{A}-measurable function g:XYg:\mathbb{X}\to \mathbb{Y} whose ‘graph’ is a subset of DD (i.e. (x,g(x))D,xX(x,g(x))\in D,\forall x \in \mathbb{X}).

This is used to prove the following theorem.

Theorem (5.12)

Let {(xt,ut)}\{ (x_{t},u_{t}) \} be a controlled Markov chain. Consider the Finite Horizon Optimization problem: JN(X,γ)=Exγ[k=0N1c(Xk,Uk)+cN(XN)]J_{N}(X,\gamma)=E_{x}^{\gamma}\left[ \sum_{k=0}^{N-1}c(X_{k},U_{k})+c_{N}(X_{N}) \right]where we seek to minimize the cost over all admissible policies. Any such policy can be replaced with one which is Markov and which is at least as good as the original policy. i.e. there is no loss in restricting policies to be Markov.