FIND ME ON

GitHub

LinkedIn

Blackwell's Irrelevant Information Theorem

🌱

Theorem
StochasticControl

Let X,Y,U\mathbb{X},\mathbb{Y},\mathbb{U} be Polish spaces and let P\mathbb{P} be a probability measure on B(X×Y)\mathcal{B}(\mathbb{X}\times \mathbb{Y}), and let c:X×URc:\mathbb{X}\times \mathbb{U}\to \mathbb{R} be a bounded, Borel cost function. Then, for any Borel function γ:X×YU\gamma:\mathbb{X}\times \mathbb{Y}\to \mathbb{U}, there exists another Borel function γ:XU\gamma^{*}:\mathbb{X}\to \mathbb{U} s.t. Xc(x,γ(x))dPX(dx)X×Yc(x,γ(x,y))dP(dx,dy)\int\limits _{\mathbb{X}}c(x,\gamma^{*}(x)) \, d\mathbb{P}_{\mathbb{X}}(dx)\le \int\limits _{\mathbb{X}\times \mathbb{Y}}c(x,\gamma(x,y)) \, d\mathbb{P}(dx,dy) where PX\mathbb{P}_{\mathbb{X}} is the marginal of P\mathbb{P} on X\mathbb{X}. Thus, policies based only on xx are a.s. optimal.

\begin{proof} Goal: Construct a γ\gamma^{*} given γ\gamma Let u=γ(x,y)u=\gamma(x,y).

To emphasize the random nature of the realizations x,y,ux,y,u we denote their corresponding rvs as X,Y,UX,Y,U.

Given some policy γ\gamma, we write for any Borel subset DUD\subset \mathbb{U} and xXx \in \mathbb{X}, the following Stochastic Kernel U1{uD}P(Udux)=Pγ(UDx)=P(γ(X,Y)DX=x)=Y1{γ(x,y)D}P(YdyX=x).\int\limits _{\mathbb{U}}\mathbb{1}_{\{ u\in D \}} \, \mathbb{P}(U\in du\mid x) =\mathbb{P}^{\gamma}(U\in D\mid x)=\mathbb{P}(\gamma(X,Y)\in D\mid X=x)=\int\limits _{\mathbb{Y}}\mathbb{1}_{\{ \gamma(x,y)\in D \}} \, \mathbb{P}(Y\in dy\mid X=x). We then have X×Yc(x,γ(x,y))P(dx,dy)=XYc(x,γ(x,y))P(dyx)P(dx)=XYc(x,γ(x,y))P(Ydyx)P(dx)=XUc(x,u)Pγ(Udux)P(dx)=X(Uc(x,u)Pγ(dux))P(dx)\begin{align*} \int\limits _{\mathbb{X}\times \mathbb{Y}}c(x,\gamma(x,y)) \, \mathbb{P}(dx,dy)&= \int\limits _{\mathbb{X}}\int\limits _{\mathbb{Y}}c(x,\gamma(x,y)) \, \mathbb{P}(dy\mid x) \, \mathbb{P}(dx) \\ &= \int\limits _{\mathbb{X}}\int\limits _{\mathbb{Y}} c(x,\gamma(x,y)) \, \mathbb{P}(Y\in dy\mid x)\, \mathbb{P}(dx)\\\\ &= \int\limits _{\mathbb{X}}\int\limits _{\mathbb{U}}c(x,u) \,\mathbb{P}^{\gamma}(U\in du\mid x) \, \mathbb{P}(dx)\\ &= \int\limits _{\mathbb{X}}\left( \int\limits _{\mathbb{U}}c(x,u) \, \mathbb{P}^{\gamma}(du\mid x) \right) \, \mathbb{P}(dx) \end{align*}We then denote the inner integral as follows: hγ(x):=Uc(x,u)Pγ(dux)(1)h^{\gamma}(x):=\int\limits _{\mathbb{U}}c(x,u) \, \mathbb{P}^{\gamma}(du\mid x) \tag{1}to get Xhγ(x)P(dx).\int\limits _{\mathbb{X}}h^{\gamma}(x) \, \mathbb{P}(dx) .Now, consider the set D={(x,u)X×U:c(x,u)hγ(x)}D=\{ (x,u)\in \mathbb{X}\times \mathbb{U}:c(x,u)\le h^{\gamma}(x) \}and its X\mathbb{X}-sections Dx:={uU:(x,u)D}xX,D_{x}:=\{ u\in\mathbb{U}:(x,u)\in D \}\quad\forall x \in \mathbb{X},then Pγ(Dxx)>0,xX(2)\mathbb{P}^{\gamma}(D_{x}\mid x)>0,\quad\forall x \in \mathbb{X}\tag{2}since otherwise we would have Uc(x,u)Pγ(dux)=Dxcc(x,u)Pγ(dux)>hγ(x)DxcPγ(dux)=hγ(x)UPγ(dux)=hγ(x)\begin{align*} \int\limits _{\mathbb{U}}c(x,u) \, \mathbb{P}^{\gamma}(du\mid x)&= \int\limits _{D_{x}^{c}}c(x,u) \, \mathbb{P}^{\gamma}(du\mid x)\\ &> h^{\gamma}(x)\int\limits _{D_{x}^{c}} \, \mathbb{P}^{\gamma}(du\mid x)\\ &= h^{\gamma}(x)\int\limits _{\mathbb{U}} \, \mathbb{P}^{\gamma}(du\mid x)\\ &= h^{\gamma}(x) \end{align*}which contradicts our definition in (1)(1). By some measurable selection theorem, (2)(2) implies that γ:XU\exists\gamma^{*}:\mathbb{X}\to \mathbb{U} such that {(x,γ(x)):xX}D\{ (x,\gamma^{*}(x)): x \in \mathbb{X} \}\subseteq Di.e. c(x,γ(x))hγ(x)c(x,\gamma^{*}(x))\le h^{\gamma}(x) for every xXx \in \mathbb{X}: Xc(x,γ(x))P(dx)Xhγ(x)P(dx)X×Yc(x,γ(x,y))P(dx,dy)\int\limits _{\mathbb{X}}c(x,\gamma^{*}(x)) \, \mathbb{P}(dx)\le \int\limits _{\mathbb{X}}h^{\gamma}(x) \, \mathbb{P}(dx)\equiv \int\limits _{\mathbb{X}\times \mathbb{Y}}c(x,\gamma(x,y)) \, \mathbb{P}(dx,dy)

\end{proof} >[!theorem] Ryll-Nardzewski Measurable Selection Theorem >Let X,Y\mathbb{X},\mathbb{Y} be standard Borel. Let A\mathscr{A} be a countably-generated sub-σ-algebra of the Borel σ-algebra of X\mathbb{X} and let B\mathscr{B} be the class of Borel subsets of Y\mathbb{Y}. For any function X×B\mathbb{X}\times \mathscr{B} s.t. >1. P(x)\mathbb{P}(x\mid\cdot) is for each xx a probability measure on B\mathscr{B} and >2. for each BBB\in\mathscr{B}, P(B)\mathbb{P}(\cdot\mid B) is an A\mathscr{A}-measurable function on X\mathbb{X}, and any set DA×BD\in\mathscr{A}\times \mathscr{B} s.t. P(x,Dx)>0,xX\mathbb{P}(x,D_{x})>0,\quad \forall x \in \mathbb{X}where Dx={y:(x,y)D}D_{x}=\{ y:(x,y)\in D \}. > >Then, there is a A\mathscr{A}-measurable function g:XYg:\mathbb{X}\to \mathbb{Y} whose ‘graph’ is a subset of DD (i.e. (x,g(x))D,xX(x,g(x))\in D,\forall x \in \mathbb{X}).

Note

This is used to prove the Markov Policy is Good Enough theorem.