Blackwell's Irrelevant Information Theorem

Theorem (5.1.1)

Let $\mathbb{X},\mathbb{Y},\mathbb{U}$ be Polish spaces, let $\mathbb{P}$ be a probability measure on $\mathcal{B}(\mathbb{X}\times \mathbb{Y})$, and let $c:\mathbb{X}\times \mathbb{U}\to \mathbb{R}$ be a bounded Borel cost function. Then, for any Borel function $\gamma:\mathbb{X}\times \mathbb{Y}\to \mathbb{U}$, there exists a Borel function $\gamma^{*}:\mathbb{X}\to \mathbb{U}$ such that $\int\limits _{\mathbb{X}}c(x,\gamma^{*}(x)) \, \mathbb{P}_{\mathbb{X}}(dx)\le \int\limits _{\mathbb{X}\times \mathbb{Y}}c(x,\gamma(x,y)) \, \mathbb{P}(dx,dy)$, where $\mathbb{P}_{\mathbb{X}}$ is the marginal of $\mathbb{P}$ on $\mathbb{X}$. Thus, there is no loss of optimality in restricting to policies that depend only on $x$.
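To make the statement concrete, here is a minimal sketch on finite spaces, where measurability is automatic and the selection step reduces to picking a cheapest action. It assumes $\mathbb{P}$ is given as a joint probability table and $\gamma$ as a table of action indices; the function name `blackwell_reduce` and the array layout are illustrative choices, not part of the source.

```python
import numpy as np

def blackwell_reduce(P, c, gamma):
    """Build a policy gamma_star(x) that is at least as good as gamma(x, y).

    P:     (nx, ny) joint pmf of (X, Y)            (assumed finite spaces)
    c:     (nx, nu) cost table c(x, u)
    gamma: (nx, ny) integer array of action indices, gamma[x, y] in {0..nu-1}
    """
    nx, ny = P.shape
    Px = P.sum(axis=1)  # marginal of X
    gamma_star = np.zeros(nx, dtype=int)
    for x in range(nx):
        if Px[x] > 0:
            # actions that gamma uses at x with positive conditional probability
            support = [gamma[x, y] for y in range(ny) if P[x, y] > 0]
        else:
            # x has probability zero, so any choice leaves the cost unchanged
            support = list(gamma[x])
        # the cheapest action in the support costs no more than the
        # conditional mean cost h(x), so it lies in the section D_x
        gamma_star[x] = min(support, key=lambda u: c[x, u])
    cost_gamma = sum(P[x, y] * c[x, gamma[x, y]]
                     for x in range(nx) for y in range(ny))
    cost_star = sum(Px[x] * c[x, gamma_star[x]] for x in range(nx))
    return gamma_star, cost_star, cost_gamma

# Usage on random data (sizes and seed are arbitrary)
rng = np.random.default_rng(0)
P = rng.random((3, 4))
P /= P.sum()
c = rng.random((3, 5))
gamma = rng.integers(0, 5, size=(3, 4))
gamma_star, cost_star, cost_gamma = blackwell_reduce(P, c, gamma)
print(gamma_star, cost_star, cost_gamma)
```

Choosing the minimum over the support is one convenient selection; any action whose cost does not exceed the conditional mean at $x$ would serve equally well.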

\begin{proof} Goal: construct a $\gamma^{*}$ given $\gamma$. Let $u=\gamma(x,y)$.

To emphasize that $x,y,u$ are realizations of random quantities, we denote the corresponding random variables by $X,Y,U$.

Given a policy $\gamma$, for any Borel subset $B\subseteq \mathbb{U}$ and $x \in \mathbb{X}$ we write the following stochastic kernel:

$$\int\limits _{\mathbb{U}}\mathbb{1}_{\{ u\in B \}} \, \mathbb{P}^{\gamma}(U\in du\mid x) =\mathbb{P}^{\gamma}(U\in B\mid x)=\mathbb{P}(\gamma(X,Y)\in B\mid X=x)=\int\limits _{\mathbb{Y}}\mathbb{1}_{\{ \gamma(x,y)\in B \}} \, \mathbb{P}(Y\in dy\mid X=x).$$

We then have

$$\begin{align*} \int\limits _{\mathbb{X}\times \mathbb{Y}}c(x,\gamma(x,y)) \, \mathbb{P}(dx,dy)&= \int\limits _{\mathbb{X}}\int\limits _{\mathbb{Y}} c(x,\gamma(x,y)) \, \mathbb{P}(Y\in dy\mid x)\, \mathbb{P}_{\mathbb{X}}(dx)\\ &= \int\limits _{\mathbb{X}}\int\limits _{\mathbb{U}}c(x,u) \,\mathbb{P}^{\gamma}(U\in du\mid x) \, \mathbb{P}_{\mathbb{X}}(dx)\\ &= \int\limits _{\mathbb{X}}\left( \int\limits _{\mathbb{U}}c(x,u) \, \mathbb{P}^{\gamma}(du\mid x) \right) \, \mathbb{P}_{\mathbb{X}}(dx). \end{align*}$$

We denote the inner integral by

$$h^{\gamma}(x):=\int\limits _{\mathbb{U}}c(x,u) \, \mathbb{P}^{\gamma}(du\mid x) \tag{1}$$

so that the expected cost under $\gamma$ equals $\int\limits _{\mathbb{X}}h^{\gamma}(x) \, \mathbb{P}_{\mathbb{X}}(dx)$.

Now consider the set $D=\{ (x,u)\in \mathbb{X}\times \mathbb{U}:c(x,u)\le h^{\gamma}(x) \}$, which is Borel because $c$ and $h^{\gamma}$ are Borel measurable, and its $\mathbb{X}$-sections $D_{x}:=\{ u\in\mathbb{U}:(x,u)\in D \}$ for $x \in \mathbb{X}$. Then

$$\mathbb{P}^{\gamma}(D_{x}\mid x)>0,\quad\forall x \in \mathbb{X}\tag{2}$$

since otherwise there would be some $x$ with $\mathbb{P}^{\gamma}(D_{x}^{c}\mid x)=1$, and because $c(x,u)>h^{\gamma}(x)$ on $D_{x}^{c}$ we would have

$$\begin{align*} \int\limits _{\mathbb{U}}c(x,u) \, \mathbb{P}^{\gamma}(du\mid x)&= \int\limits _{D_{x}^{c}}c(x,u) \, \mathbb{P}^{\gamma}(du\mid x)\\ &> h^{\gamma}(x)\int\limits _{D_{x}^{c}} \, \mathbb{P}^{\gamma}(du\mid x)\\ &= h^{\gamma}(x)\int\limits _{\mathbb{U}} \, \mathbb{P}^{\gamma}(du\mid x)\\ &= h^{\gamma}(x), \end{align*}$$

which contradicts the definition of $h^{\gamma}$ in $(1)$.

By a measurable selection theorem (the Ryll-Nardzewski theorem stated below), $(2)$ implies that there exists a measurable $\gamma^{*}:\mathbb{X}\to \mathbb{U}$ such that $\{ (x,\gamma^{*}(x)): x \in \mathbb{X} \}\subseteq D$, i.e. $c(x,\gamma^{*}(x))\le h^{\gamma}(x)$ for every $x \in \mathbb{X}$. Integrating with respect to $\mathbb{P}_{\mathbb{X}}$ gives

$$\int\limits _{\mathbb{X}}c(x,\gamma^{*}(x)) \, \mathbb{P}_{\mathbb{X}}(dx)\le \int\limits _{\mathbb{X}}h^{\gamma}(x) \, \mathbb{P}_{\mathbb{X}}(dx)= \int\limits _{\mathbb{X}\times \mathbb{Y}}c(x,\gamma(x,y)) \, \mathbb{P}(dx,dy).$$

\end{proof}

>[!theorem] Ryll-Nardzewski Measurable Selection Theorem
>Let $\mathbb{X},\mathbb{Y}$ be standard Borel spaces. Let $\mathscr{A}$ be a countably generated sub-σ-algebra of the Borel σ-algebra of $\mathbb{X}$ and let $\mathscr{B}$ be the class of Borel subsets of $\mathbb{Y}$. Let $\mathbb{P}$ be a function on $\mathbb{X}\times \mathscr{B}$ such that
>1. for each $x$, $\mathbb{P}(x,\cdot)$ is a probability measure on $\mathscr{B}$, and
>2. for each $B\in\mathscr{B}$, $\mathbb{P}(\cdot,B)$ is an $\mathscr{A}$-measurable function on $\mathbb{X}$,
>
>and let $D\in\mathscr{A}\times \mathscr{B}$ be a set such that $\mathbb{P}(x,D_{x})>0$ for all $x \in \mathbb{X}$, where $D_{x}=\{ y:(x,y)\in D \}$.
>
>Then there is an $\mathscr{A}$-measurable function $g:\mathbb{X}\to \mathbb{Y}$ whose graph is a subset of $D$ (i.e. $(x,g(x))\in D$ for all $x \in \mathbb{X}$).

Blackwell's irrelevant information theorem is used to prove the following theorem.

Theorem (5.12)

Let $\{ (X_{t},U_{t}) \}$ be a controlled Markov chain. Consider the finite-horizon optimization problem with cost $J_{N}(x,\gamma)=E_{x}^{\gamma}\left[ \sum_{k=0}^{N-1}c(X_{k},U_{k})+c_{N}(X_{N}) \right]$, where $x$ is the initial state and we seek to minimize the cost over all admissible policies $\gamma$. Any admissible policy can be replaced by a Markov policy that is at least as good as the original, i.e. there is no loss in restricting policies to be Markov.
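As an illustration of the conclusion rather than of its proof, the sketch below runs backward induction (dynamic programming) on a finite controlled Markov chain. The minimizing action it returns at each step depends only on the time index and the current state, which is what a Markov policy means here. Finite state and action spaces are an assumption, and the names `backward_induction`, `T`, `c`, and `c_N` are hypothetical.

```python
import numpy as np

def backward_induction(T, c, c_N, N):
    """Finite-horizon dynamic programming for a finite controlled Markov chain.

    T:   (nu, nx, nx) array, T[u, x, x2] = P(X_{k+1}=x2 | X_k=x, U_k=u)
    c:   (nx, nu) stage cost c(x, u)
    c_N: (nx,) terminal cost
    N:   horizon
    Returns (J, policy): J[k, x] is the optimal cost-to-go from state x at
    time k, and policy[k, x] is a minimizing action index.
    """
    nx, nu = c.shape
    J = np.zeros((N + 1, nx))
    policy = np.zeros((N, nx), dtype=int)
    J[N] = c_N
    for k in range(N - 1, -1, -1):
        # Q[x, u] = c(x, u) + E[ J_{k+1}(X_{k+1}) | X_k = x, U_k = u ]
        Q = c + (T @ J[k + 1]).T
        policy[k] = Q.argmin(axis=1)
        J[k] = Q.min(axis=1)
    return J, policy

# Toy model (assumed for illustration): action 0 keeps the state, action 1
# flips it; state 0 costs 1 per step and flipping costs an extra 0.5.
T = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
c = np.array([[1.0, 1.5],
              [0.0, 0.5]])
c_N = np.zeros(2)
J, policy = backward_induction(T, c, c_N, N=2)
print(J[0], policy)
```

In this toy model, flipping out of state 0 pays off at the first step but not at the last, so the computed policy varies with time while still ignoring the past history.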