Let X,Y,U be Polish spaces and let P be a probability measure on B(X×Y), and let c:X×U→R be a bounded, Borel cost function. Then, for any Borel function γ:X×Y→U, there exists another Borel function γ∗:X→U s.t. X∫c(x,γ∗(x))dPX(dx)≤X×Y∫c(x,γ(x,y))dP(dx,dy)where PX is the marginal of P on X. Thus, policies based only on x are a.s. optimal.
\begin{proof}Goal: Construct a γ∗ given γ Let u=γ(x,y).
To emphasize the random nature of the realizationsx,y,u we denote their corresponding rvs as X,Y,U.
Given some policyγ, we write for any Borel subsetD⊂U and x∈X, the following Stochastic KernelU∫1{u∈D}P(U∈du∣x)=Pγ(U∈D∣x)=P(γ(X,Y)∈D∣X=x)=Y∫1{γ(x,y)∈D}P(Y∈dy∣X=x).We then have X×Y∫c(x,γ(x,y))P(dx,dy)=X∫Y∫c(x,γ(x,y))P(dy∣x)P(dx)=X∫Y∫c(x,γ(x,y))P(Y∈dy∣x)P(dx)=X∫U∫c(x,u)Pγ(U∈du∣x)P(dx)=X∫U∫c(x,u)Pγ(du∣x)P(dx)We then denote the inner integral as follows: hγ(x):=U∫c(x,u)Pγ(du∣x)(1)to get X∫hγ(x)P(dx).Now, consider the set D={(x,u)∈X×U:c(x,u)≤hγ(x)}and its X-sections Dx:={u∈U:(x,u)∈D}∀x∈X,then Pγ(Dx∣x)>0,∀x∈X(2)since otherwise we would have U∫c(x,u)Pγ(du∣x)=Dxc∫c(x,u)Pγ(du∣x)>hγ(x)Dxc∫Pγ(du∣x)=hγ(x)U∫Pγ(du∣x)=hγ(x)which contradicts our definition in (1). By some measurable selection theorem, (2) implies that ∃γ∗:X→U such that {(x,γ∗(x)):x∈X}⊆Di.e. c(x,γ∗(x))≤hγ(x) for every x∈X: X∫c(x,γ∗(x))P(dx)≤X∫hγ(x)P(dx)≡X×Y∫c(x,γ(x,y))P(dx,dy)
\end{proof} >[!theorem] Ryll-Nardzewski Measurable Selection Theorem >Let X,Y be standard Borel. Let A be a countably-generated sub-σ-algebra of the Borel σ-algebra of X and let B be the class of Borel subsets of Y. For any function X×B s.t. >1. P(x∣⋅) is for each x a probability measure on B and >2. for each B∈B, P(⋅∣B) is an A-measurable function on X, and any set D∈A×B s.t. P(x,Dx)>0,∀x∈Xwhere Dx={y:(x,y)∈D}. > >Then, there is a A-measurable function g:X→Y whose ‘graph’ is a subset of D (i.e. (x,g(x))∈D,∀x∈X).