FIND ME ON

GitHub

LinkedIn

Belief MDP

🌱

StochasticControl

Construction of Belief MDP

Consider a POMDP (X,U,Y,K,T,Q,c)(\mathbb{X}, \mathbb{U}, \mathbb{Y}, \mathbb{K},\mathcal{T}, Q,c) with control policy γ\gamma. When X,Y,U\mathbb{X},\mathbb{Y},\mathbb{U} are countable we can reduce the partially observed process to a belief-MDP where the state at time tt is πt():=P(Xty[0,t],u[0,t1])P(X)\pi_{t}(\cdot):=P(X_{t}\in\cdot \mid y_{[0,t]},u_{[0,t-1]})\in\mathcal{P}(\mathbb{X})we call πt\pi_{t} the filter-process.

Construction of the Filter Process

Let Q(Bx):=P(ytxt=x)Q(B\mid x):=P(y_{t}\in\cdot \mid x_{t}=x) and assuming QλQ\ll\lambda we define the Radon-Nikodym Derivative gg: g(x,y)=dG(ynxn=x)dλ(y)g(x,y)=\frac{dG(y_{n}\in\cdot \mid x_{n}=x)}{d\lambda}(y)We can then describe πn+1\pi_{n+1} in terms of πn,yn+1,un\pi_{n},y_{n+1},u_{n} using the kernel T\mathcal{T} and gg: $$πn+1(dxn+1):=F(πn,yn+1,un)(dxn+1)=Xg(xn+1,yn+1)T(dxn+1xn,un)πn(dxn)XXg(xn+1,yn+1)T(dxn+1xn,un)πn(dxn)\begin{align*} \pi_{n+1}(dx_{n+1})&:=F(\pi_{n},y_{n+1},u_{n})(dx_{n+1})\\\\ &=\frac{\int\limits _{\mathbb{X}}g(x_{n+1},y_{n+1})\mathcal{T}(dx_{n+1}\mid x_{n},u_{n})\,\pi_{n}(dx_{n})}{\int\limits _{\mathbb{X}}\int\limits _{\mathbb{X}}g(x_{n+1},y_{n+1})\mathcal{T}(dx_{n+1}\mid x_{n},u_{n}) \, \pi_{n}(dx_{n}) } \end{align*}where we integrate over $dx_{n}$ in both and denominator integrates over $dx_{n+1}$ as well. ## Transition Kernel of the Filter Process We define the transition probability $\eta$ of the **filter process** as follows: ():={}{{ F(,u,y)}} , P(dy,u) wherewhere F(,u,y):=F(y,u,) ## Cost Function of the Filter Process Assuming we want to minimize the [Finite Horizon Optimization](/garden/math/control-theory/stochastic-control/optimal-stochastic-control/concepts/optimization-problems/finite-horizon-optimization) problem E_{x_{0}}^{}we define the cost function $\tilde{c}:\mathcal{P}(\mathbb{X})\times \mathbb{U}\to[0,\infty)$ as (,u)=_{}c(x,u) , (dx) $$ >[!thm] Filter process is a CMC >Let (X,U,Y,K,T,Q,c)(\mathbb{X}, \mathbb{U}, \mathbb{Y}, \mathbb{K},\mathcal{T}, Q,c) be a POMDP with filter process πt\pi_{t}. Then, {πt,ut}\{ \pi_{t},u_{t} \} is a Controlled Markov Chain.

Hence, the filter process πt\pi_{t} defines a completely observable MDP defined as (P(X),U,K~,η,c~)(\mathcal{P}(\mathbb{X}),\mathbb{U}, \tilde{\mathbb{K}},\eta, \tilde{c})

Weak Feller Continuity of Belief MDP

Let X\mathbb{X} be a Borel space. Suppose that we have a family of uniformly bounded, real, Borel functions {fn,λ}n1,λΛ\{ f_{n,\lambda} \}_{n\ge 1,\lambda \in\Lambda}, for some set Λ\Lambda. If, for any xnxx_{n}\to x in X\mathbb{X} we have limnsupλΛfn,λ(xn)fλ(x)=0limnsupλΛfλ(xn)fλ(x)=0\begin{align*} \lim_{ n \to \infty } \sup_{\lambda \in\Lambda}\left| f_{n,\lambda}(x_{n})-f_{\lambda}(x) \right| =0\\ \lim_{ n \to \infty } \sup_{\lambda \in\Lambda}\left| f_{\lambda}(x_{n})-f_{\lambda}(x) \right| =0 \end{align*}then, for any μnμ\mu_{n}\to\mu weakly in P(X)\mathcal{P}(\mathbb{X}), we have limnsupλΛXfn,λ(x)μn(dx)Xfλ(x)μ(dx)=0\lim_{ n \to \infty } \sup_{\lambda \in\Lambda}\left| \int\limits _{\mathbb{X}}f_{n,\lambda}(x) \, \mu_{n}(dx)-\int\limits _{\mathbb{X}}f_{\lambda}(x) \, \mu(dx) \right| =0

\begin{proof}

\end{proof} >[!assumption|1] >1. The transition probability T(x,u)\mathcal{T}(\cdot\mid x,u) is weakly continuous in (x,u)(x,u). >2. The observation channel Q(x,u)Q(\cdot\mid x,u) is continuous in total variation.

Under the transition probability η(π,u)\eta(\cdot\mid \pi,u) of the filter process is weakly continuous in (π,u)(\pi,u).

\begin{proof} This proof consists of showing that for every (π0n,un)(π0,u)(\pi_{0}^{n},u_{n})\to(\pi_{0},u) in P(X)×U\mathcal{P}(\mathbb{X})\times \mathbb{U} we have supfBL1P(X)f(π1)η(dπ1π0n,un)P(X)f(π1)η(dπ1π0,u)0,\sup_{\lVert f \rVert _{BL}\le 1}\left| \int\limits _{\mathcal{P}(\mathbb{X})}f(\pi_{1}) \, \eta(d\pi_{1}\mid \pi_{0}^{n},u_{n})-\int\limits _{\mathcal{P}(\mathbb{X})}f(\pi_{1}) \, \eta(d\pi_{1}\mid \pi_{0},u) \right| \to{0},where we equip P(X)\mathcal{P}(\mathbb{X}) with the metric ρ\rho to define the Bounded-Lipschitz fBL\lVert f \rVert_{BL} of any Borel function f:P(X)Rf:\mathcal{P}(\mathbb{X})\to \mathbb{R}. We can equivalently write this as supfBL1Yf(π1(π0n,un,y1))P(dy1π0n,un)Yf(π1(π0,u,y1))P(dy1π0,u)0\sup_{\lVert f \rVert _{BL}\le 1}\left| \int\limits _{\mathbb{Y}}f(\pi_{1}(\pi_{0}^{n},u_{n},y_{1})) \, P(dy_{1}\mid \pi_{0}^{n},u_{n}) -\int\limits _{\mathbb{Y}}f(\pi_{1}(\pi_{0},u,y_{1})) \, P(dy_{1}\mid \pi_{0},u) \right| \to0we can then upper bound the above term: supfBL1Yf(π1(π0n,un,y1))P(dy1π0n,un)Yf(π1(π0,u,y1))P(dy1π0,u)supfBL1Yf(π1(π0n,un,y1))P(dy1π0n,un)Yf(π1(π0n,u,y1))P(dy1π0,u)+supfBL1Yf(π1(π0n,un,y1))f(π1(π0,u,y1))P(dy1π0,u)P(π0n,un)P(π0,u)TV+supfBL1Yf(π1(π0n,un,y1))f(π1(π0,u,y1))P(dy1π0,u)\begin{align*} &\sup_{\lVert f \rVert _{BL}\le 1}\left| \int\limits _{\mathbb{Y}}f(\pi_{1}(\pi_{0}^{n},u_{n},y_{1})) \, P(dy_{1}\mid \pi_{0}^{n},u_{n}) -\int\limits _{\mathbb{Y}}f(\pi_{1}(\pi_{0},u,y_{1})) \, P(dy_{1}\mid \pi_{0},u) \right|\\ &\le \sup_{\lVert f \rVert _{BL}\le 1}\left| \int\limits _{\mathbb{Y}}f(\pi_{1}(\pi_{0}^{n},u_{n},y_{1})) \, P(dy_{1}\mid \pi_{0}^{n},u_{n}) -\int\limits _{\mathbb{Y}}f(\pi_{1}(\pi_{0}^{n},u,y_{1})) \, P(dy_{1}\mid \pi_{0},u) \right|\\ &\quad+ \sup_{\lVert f \rVert _{BL}\le 1} \int\limits _{\mathbb{Y}}\left|f(\pi_{1}(\pi_{0}^{n},u_{n},y_{1})) - f(\pi_{1}(\pi_{0},u,y_{1})) \right| \,P(dy_{1}\mid \pi_{0},u) \\ &\le \lVert P(\cdot\mid \pi_{0}^{n},u_{n})-P(\cdot\mid \pi_{0},u) \rVert _{TV}+\sup_{\lVert f \rVert _{BL}\le 1} \int\limits _{\mathbb{Y}}\left|f(\pi_{1}(\pi_{0}^{n},u_{n},y_{1})) - f(\pi_{1}(\pi_{0},u,y_{1})) \right| \,P(dy_{1}\mid \pi_{0},u)\tag{8}\\ \end{align*}where in (8)(8) we use the fact that ffBL1\lVert f \rVert_{\infty}\le\lVert f \rVert_{BL}\le 1.
To prove (8)0(8)\to0 it is sufficient to prove: 1. P(dy1π0,u0)P(dy_{1}\mid \pi_{0},u_{0}) is Total Variation 2. (π0n,un)(π0,u)    Yρ(π1(π0n,uny1),π1(π0,u,y1))P(dy1π0,u)0(9)(\pi_{0}^{n},u_{n})\to(\pi_{0},u)\implies\int\limits _{\mathbb{Y}}\rho(\pi_{1}(\pi_{0}^{n},u_{n}y_{1}),\pi_{1}(\pi_{0},u,y_{1})) \, P(dy_{1}\mid \pi_{0},u) \to 0\tag{9}since (second term of 8)(9)(\text{second term of }8)\le(9). >[!claim] >P(dy1π0,u0)P(dy_{1}\mid \pi_{0},u_{0}) is Total Variation

Let (π0n,un)(π0,u)(\pi_{0}^{n},u_{n})\to(\pi_{0},u). Then supAB(Y)P(Aπ0n,un)P(Aπ0,u)=supAB(Y)XQ(Ax1,un)T(dx1π0n,un)XQ(Ax1,u)T(dx1z0,u),\begin{align*} &\sup_{A\in \mathcal{B}(\mathbb{Y})}\left| P(A\mid \pi_{0}^{n},u_{n})-P(A\mid \pi_{0},u) \right| \\ &=\sup_{A\in \mathcal{B}(\mathbb{Y})}\left| \int\limits _{\mathbb{X}}Q(A\mid x_{1},u_{n}) \, \mathcal{T}(dx_{1}\mid \pi_{0}^{n},u_{n})-\int\limits _{\mathbb{X}}Q(A\mid x_{1},u) \, \mathcal{T}(dx_{1}\mid z_{0},u) \right| , \end{align*}where T(dx1π0n,un):=XT(dx1x0,un)π0n(dx0).\mathcal{T}(dx_{1}\mid \pi_{0}^{n},u_{n}):=\int\limits _{\mathbb{X}}\mathcal{T}(dx_{1}\mid x_{0},u_{n}) \, \pi_{0}^{n}(dx_{0}) .Note that, by , we can show that T(dx1π0n,un)T(dx1π0,u)\mathcal{T}(dx_{1}\mid \pi_{0}^{n},u_{n})\to \mathcal{T}(dx_{1}\mid \pi_{0},u) weakly.

Indeed, let gCb(X)g\in C_{b}(\mathbb{X}), then define rn(x0)=Xg(x1)T(dx1x0,un)r_{n}(x_{0})=\int\limits _{\mathbb{X}}g(x_{1}) \, \mathcal{T}(dx_{1}\mid x_{0},u_{n}) and r(x0)=Xg(x1)T(dx1x0,u)r(x_{0})=\int\limits _{\mathbb{X}}g(x_{1}) \, \mathcal{T}(dx_{1}\mid x_{0},u). Since T(dx1x0,u)\mathcal{T}(dx_{1}\mid x_{0},u) is Feller Property, we have rn(x0)r(x0)r_{n}(x_{0})\to r(x_{0}) when x0nx0x_{0}^{n}\to x_{0}. Hence, by we have that limnXrn(x0)π0n(dx0)Xr(x0)π0(dx0)\lim_{ n \to \infty } \left| \int\limits _{\mathbb{X}}r_{n}(x_{0}) \, \pi _{0}^{n}(dx_{0})-\int\limits _{\mathbb{X}}r(x_{0}) \, \pi_{0}(dx_{0}) \right| Hence T(dx1π0n,un)T(dx1π0,u)\mathcal{T}(dx_{1}\mid \pi_{0}^{n},u_{n})\to \mathcal{T}(dx_{1}\mid \pi_{0},u) weakly. Moreover the families of functions {Q(A,un)}n1,AB(Y)\{ Q(A\mid \cdot,u_{n}) \}_{n\ge 1,A\in \mathcal{B}(\mathbb{Y})} and {Q(A,u)}AB(Y)\{ Q(A\mid \cdot,u) \}_{A\in \mathcal{B}(\mathbb{Y})} satisfy the conditions of as QQ is Total Variation. Therefore yields limnsupAB(Y)XQ(Ax1,un)T(dx1π0n,un)XQ(Ax1,u)T(dx1π0,u)=0\lim_{ n \to \infty } \sup_{A\in \mathcal{B}(\mathbb{Y})}\left| \int\limits _{\mathbb{X}}Q(A\mid x_{1},u_{n}) \, \mathcal{T}(dx_{1}\mid \pi_{0}^{n},u_{n}) -\int\limits _{\mathbb{X}}Q(A\mid x_{1},u) \, \mathcal{T}(dx_{1}\mid \pi_{0},u) \right| =0Thus, P(dy1π0,u)P(dy_{1}\mid \pi_{0},u) is Total Variation.

\end{proof}

Comparison of Conditions on Observation Channels

Suppose that the observation channel Q(dyx,u)Q(dy\mid x,u) is continuous in total variation. Then, for any (π,u)P(X)×U(\pi,u)\in\mathcal{P}(\mathbb{X})\times \mathbb{U}, we have, T(π,u)\mathcal{T}(\cdot\mid \pi,u)-a.s., that Q(dyx,u)P(dyπ,u)Q(dy\mid x,u)\ll P(dy\mid \pi,u) and Q(dyx,u)=g(x,u,y)P(dyπ,u)Q(dy\mid x,u)=g(x,u,y)P(dy\mid \pi,u)for a measurable function gg, which satisfies for any AB(Y)A\in\mathcal{B}(\mathbb{Y}) and for any xkxx_{k}\to x Ag(xk,u,y)g(x,u,y)P(dyπ,u)0.\int\limits _{A}|g(x_{k},u,y)-g(x,u,y)| \, P(dy\mid \pi,u) \to0.

\begin{proof} We begin by fixing any (π,u)P(X)×U(\pi,u)\in\mathcal{P}(\mathbb{X})\times \mathbb{U}. >[!clm] >Q(dyx,u)P(dyπ,u),T(π,u)-a.s.Q(dy\mid x,u)\ll P(dy\mid \pi,u),\,\mathcal{T}(\cdot\mid \pi,u)\text{-a.s.}

Note that (by definition of absolute continuity) Q(dyx,u)P(dyπ,u)Q(dy\mid x,u)\ll P(dy\mid \pi,u) if and only if ϵ>0,δ>0:P(Aπ,u)<δ    Q(Ax,u)<ϵ\forall\epsilon>0,\exists\delta>0:P(A\mid \pi,u)<\delta\implies Q(A\mid x,u)<\epsilonFor each n1n\ge 1 let KnXK_{n}\subset\mathbb{X} be Compact such that T(Knπ,u)>113n\mathcal{T}(K_{n}\mid \pi,u)>1-\frac{1}{3n}which can be done due to Prokhorov’s Theorem (i.e. since T\mathcal{T} is Feller Property then it is necessarily Prokhorov’s Theorem hence it is also tight.

As Q(dyx,u)Q(dy\mid x,u) is continuous in total variation, the image of Kn×{u}K_{n}\times \{ u \} under Q(dyx,u)Q(dy\mid x,u) is compact in P(Y)\mathcal{P}(\mathbb{Y}) (this is by Heine-Borel Theorem). Hence, there exists {ν1,,νl}P(Y)\{ \nu_{1},\dots,\nu_{l} \}\subset \mathcal{P}(\mathbb{Y}) such that maxxKnmini=1,,lQ(x,u)νiTV<13n\max_{x \in K_{n}}\min_{i=1,\dots,l}\lVert Q(\cdot\mid x,u)-\nu_{i} \rVert _{TV}< \frac{1}{3n} Define the following Stochastic Kernel: νn(x,u)=argminνiQ(x,u)νiTV\nu_{n}(\cdot\mid x,u)=\underset{ \nu_{i} }{ \arg\min }\lVert Q(\cdot\mid x,u)-\nu_{i} \rVert _{TV}Then, we define Pn(π,u)=Xνn(x,u)T(dxπ,u).P_{n}(\cdot\mid \pi,u)=\int\limits _{\mathbb{X}}\nu_{n}(\cdot\mid x,u) \, \mathcal{T}(dx\mid \pi,u). One can prove that P(π,u)Pn(π,u)TV<1n.\lVert P(\cdot\mid \pi,u)-P_{n}(\cdot\mid \pi,u) \rVert_{TV}< \frac{1}{n}. Moreover, since Pn(π,u)P_{n}(\cdot\mid \pi,u) is a mixture of finite probability measures {ν1,,νl}\{ \nu_{1},\dots,\nu_{l} \}, we have that νn(x,u)Pn(π,u)\nu_{n}(\cdot\mid x,u)\ll P_{n}(\cdot\mid \pi,u), xCn\forall x \in C_{n}, where T(Cnπ,u)=1\mathcal{T}(C_{n}\mid \pi,u)=1. Let C=nCnC=\bigcap_{n}C_{n}, and so, T(Cπ,u)=1\mathcal{T}(C\mid \pi,u)=1. >[!clm] >xC    Q(dyx,u)P(dyπ,u)x \in C\implies Q(dy\mid x,u)\ll P(dy\mid \pi,u)

To prove this claim we fix any ϵ>0\epsilon>0 and choose n1n\ge 1 s.t. ϵ>23n\epsilon> \frac{2}{3n}. Then, δ>0\exists\delta>0 s.t. νn(Ax,u)<ϵ2\nu_{n}(A\mid x,u)< \frac{\epsilon}{2} whenever Pn(Aπ,u)<δP_{n}(A\mid \pi,u)<\delta. This gives us that Q(Ax,u)<ϵQ(A\mid x,u)<\epsilon whenever P(Aπ,u)<δ+1nP(A\mid \pi,u)<\delta+ \frac{1}{n}. Hence Q(dyx,u)P(dyπ,u)Q(dy\mid x,u)\ll P(dy\mid \pi,u). \end{proof}

Linked from