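Consider a POMDP $(\mathbb{X},\mathbb{U},\mathbb{Y},\mathbb{K},\mathcal{T},Q,c)$ with control policy $\gamma$. When $\mathbb{X},\mathbb{Y},\mathbb{U}$ are countable, we can reduce the partially observed process to a belief-MDP whose state at time $t$ is
$$\pi_t(\cdot):=P(X_t\in\cdot\mid y_{[0,t]},u_{[0,t-1]})\in\mathcal{P}(\mathbb{X}).$$
We call $\pi_t$ the filter process.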
## Construction of the Filter Process
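Let $Q(B\mid x):=P(y_t\in B\mid x_t=x)$ and assume $Q\ll\lambda$ for a reference measure $\lambda$ on $\mathbb{Y}$. We define the Radon-Nikodym derivative $g$ by
$$g(x,y)=\frac{dQ(y_n\in\cdot\mid x_n=x)}{d\lambda}(y).$$
We can then describe $\pi_{n+1}$ in terms of $\pi_n$, $y_{n+1}$, $u_n$ using the kernel $\mathcal{T}$ and $g$:
$$\pi_{n+1}(dx_{n+1}):=F(\pi_n,y_{n+1},u_n)(dx_{n+1})=\frac{\int_{\mathbb{X}}g(x_{n+1},y_{n+1})\,\mathcal{T}(dx_{n+1}\mid x_n,u_n)\,\pi_n(dx_n)}{\int_{\mathbb{X}}\int_{\mathbb{X}}g(x_{n+1},y_{n+1})\,\mathcal{T}(dx_{n+1}\mid x_n,u_n)\,\pi_n(dx_n)},$$
where both the numerator and the denominator integrate over $dx_n$, and the denominator integrates over $dx_{n+1}$ as well.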
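For finite state and observation spaces the recursion above reduces to sums. The following is a minimal sketch under that assumption, taking $\lambda$ to be counting measure so that $g(x,y)=Q(y\mid x)$. The function name `filter_update`, the nested-list layout of `T` and `g`, and the two-state model are illustrative choices, not part of the original construction.

```python
def filter_update(pi, u, y, T, g):
    """One step of the filter recursion on finite spaces (illustrative sketch).

    pi[x]        : current belief pi_n over states
    T[u][x][x2]  : transition kernel T(x2 | x, u)
    g[x2][y]     : observation density g(x2, y) w.r.t. counting measure
    """
    n = len(pi)
    # Numerator: g(x', y) * sum_x T(x' | x, u) * pi(x), for each next state x'.
    unnorm = [
        g[x2][y] * sum(T[u][x][x2] * pi[x] for x in range(n))
        for x2 in range(n)
    ]
    # Denominator: the same expression, additionally summed over x'.
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the predicted belief")
    return [w / total for w in unnorm]


# Hypothetical model: two states, one action, two observations.
T = [[[0.9, 0.1], [0.2, 0.8]]]   # T[0][x][x2]
g = [[0.8, 0.2], [0.3, 0.7]]     # g[x2][y]
pi_next = filter_update([0.5, 0.5], u=0, y=0, T=T, g=g)
```

Here `pi_next` is again a probability vector, so the update can be iterated along an observation sequence.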
## Transition Kernel of the Filter Process
We define the transition probability $\eta$ of the **filter process** as follows:
$$\eta(\cdot\mid\pi,u):=\int_{\mathbb{Y}}\mathbb{1}_{\{F(\pi,u,y)\in\cdot\}}\,P(dy\mid\pi,u),$$
where $F(\pi,u,y):=F(\cdot\mid y,u,\pi)$.
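On finite spaces $\eta(\cdot\mid\pi,u)$ is a discrete measure on beliefs: each observation $y$ contributes one atom, located at the updated belief and weighted by $P(y\mid\pi,u)$. The sketch below enumerates these atoms. It assumes finite spaces and an observation matrix `Q[x][y]` that does not depend on the action; the name `belief_transition` and the example numbers are hypothetical.

```python
def belief_transition(pi, u, T, Q):
    """Return eta(. | pi, u) as a list of (weight, next_belief) atoms.

    pi[x]        : current belief
    T[u][x][x2]  : transition kernel T(x2 | x, u)
    Q[x2][y]     : observation probabilities Q(y | x2) (assumed action-free)
    """
    n, m = len(pi), len(Q[0])
    # Predicted law of the next state: T(x2 | pi, u) = sum_x T(x2 | x, u) pi(x).
    pred = [sum(T[u][x][x2] * pi[x] for x in range(n)) for x2 in range(n)]
    atoms = []
    for y in range(m):
        joint = [Q[x2][y] * pred[x2] for x2 in range(n)]
        p_y = sum(joint)  # P(y | pi, u)
        if p_y > 0.0:
            # Atom of eta at the belief obtained after observing y.
            atoms.append((p_y, [w / p_y for w in joint]))
    return atoms


# Hypothetical model: two states, one action, two observations.
T = [[[0.9, 0.1], [0.2, 0.8]]]
Q = [[0.8, 0.2], [0.3, 0.7]]
atoms = belief_transition([0.5, 0.5], 0, T, Q)
```

The weights sum to one, and averaging the atoms with those weights recovers the predicted state law, which is a quick sanity check on any implementation.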
## Cost Function of the Filter Process
Assuming we want to solve the [Finite Horizon Optimization](/garden/math/control-theory/stochastic-control/optimal-stochastic-control/concepts/optimization-problems/finite-horizon-optimization) problem of minimizing $E_{x_{0}}^{\gamma}\left[\sum_{t=0}^{N-1}c(x_t,u_t)\right]$, we define the cost function $\tilde{c}:\mathcal{P}(\mathbb{X})\times \mathbb{U}\to[0,\infty)$ as
$$\tilde{c}(\pi,u)=\int_{\mathbb{X}}c(x,u)\,\pi(dx).$$

>[!thm] Filter process is a CMC
>Let $(\mathbb{X},\mathbb{U},\mathbb{Y},\mathbb{K},\mathcal{T},Q,c)$ be a POMDP with filter process $\pi_t$. Then $\{\pi_t,u_t\}$ is a Controlled Markov Chain.
Hence, the filter process $\pi_t$ defines a completely observable MDP $(\mathcal{P}(\mathbb{X}),\mathbb{U},\tilde{\mathbb{K}},\eta,\tilde{c})$.
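The stage cost of this belief-MDP is the expectation of the original cost under the current belief. As a minimal sketch for a finite state space (the name `belief_cost` and the cost table `c[x][u]` are hypothetical):

```python
def belief_cost(pi, u, c):
    """Belief cost c_tilde(pi, u) = sum_x c(x, u) * pi(x) on a finite state space."""
    return sum(c[x][u] * pi[x] for x in range(len(pi)))


# Hypothetical cost table c[x][u] for two states and two actions.
c = [[1.0, 4.0],
     [3.0, 0.0]]
pi = [0.25, 0.75]
cost_u0 = belief_cost(pi, 0, c)  # 0.25 * 1.0 + 0.75 * 3.0
cost_u1 = belief_cost(pi, 1, c)  # 0.25 * 4.0 + 0.75 * 0.0
```

Since $\tilde{c}$ is linear in $\pi$, it inherits nonnegativity from $c$, matching the codomain $[0,\infty)$ above.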
## Weak Feller Continuity of Belief MDP
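Let $\mathbb{X}$ be a Borel space. Suppose that we have families of uniformly bounded, real, Borel functions $\{f_{n,\lambda}\}_{n\ge1,\lambda\in\Lambda}$ and $\{f_\lambda\}_{\lambda\in\Lambda}$, for some set $\Lambda$. If, for any $x_n\to x$ in $\mathbb{X}$, we have
$$\lim_{n\to\infty}\sup_{\lambda\in\Lambda}|f_{n,\lambda}(x_n)-f_\lambda(x)|=0\quad\text{and}\quad\lim_{n\to\infty}\sup_{\lambda\in\Lambda}|f_\lambda(x_n)-f_\lambda(x)|=0,$$
then, for any $\mu_n\to\mu$ weakly in $\mathcal{P}(\mathbb{X})$, we have
$$\lim_{n\to\infty}\sup_{\lambda\in\Lambda}\left|\int_{\mathbb{X}}f_{n,\lambda}(x)\,\mu_n(dx)-\int_{\mathbb{X}}f_\lambda(x)\,\mu(dx)\right|=0.$$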
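Suppose that $\mathcal{T}(dx_1\mid x_0,u)$ is weakly continuous (Feller) in $(x_0,u)$ and that $Q(dy\mid x,u)$ is continuous in total variation in $(x,u)$. Then the transition probability $\eta(\cdot\mid\pi,u)$ of the filter process is weakly continuous in $(\pi,u)$.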
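\begin{proof}
The proof consists of showing that, for every $(\pi_0^n,u_n)\to(\pi_0,u)$ in $\mathcal{P}(\mathbb{X})\times\mathbb{U}$, we have
$$\sup_{\|f\|_{BL}\le1}\left|\int_{\mathcal{P}(\mathbb{X})}f(\pi_1)\,\eta(d\pi_1\mid\pi_0^n,u_n)-\int_{\mathcal{P}(\mathbb{X})}f(\pi_1)\,\eta(d\pi_1\mid\pi_0,u)\right|\to0,$$
where we equip $\mathcal{P}(\mathbb{X})$ with the metric $\rho$ to define the bounded-Lipschitz norm $\|f\|_{BL}$ of any Borel function $f:\mathcal{P}(\mathbb{X})\to\mathbb{R}$. We can equivalently write this as
$$\sup_{\|f\|_{BL}\le1}\left|\int_{\mathbb{Y}}f(\pi_1(\pi_0^n,u_n,y_1))\,P(dy_1\mid\pi_0^n,u_n)-\int_{\mathbb{Y}}f(\pi_1(\pi_0,u,y_1))\,P(dy_1\mid\pi_0,u)\right|\to0.$$
We can then upper bound the above term:
$$\begin{aligned}
&\sup_{\|f\|_{BL}\le1}\left|\int_{\mathbb{Y}}f(\pi_1(\pi_0^n,u_n,y_1))\,P(dy_1\mid\pi_0^n,u_n)-\int_{\mathbb{Y}}f(\pi_1(\pi_0,u,y_1))\,P(dy_1\mid\pi_0,u)\right|\\
&\quad\le\sup_{\|f\|_{BL}\le1}\left|\int_{\mathbb{Y}}f(\pi_1(\pi_0^n,u_n,y_1))\,P(dy_1\mid\pi_0^n,u_n)-\int_{\mathbb{Y}}f(\pi_1(\pi_0^n,u_n,y_1))\,P(dy_1\mid\pi_0,u)\right|\\
&\qquad+\sup_{\|f\|_{BL}\le1}\int_{\mathbb{Y}}\left|f(\pi_1(\pi_0^n,u_n,y_1))-f(\pi_1(\pi_0,u,y_1))\right|P(dy_1\mid\pi_0,u)\\
&\quad\le\|P(\cdot\mid\pi_0^n,u_n)-P(\cdot\mid\pi_0,u)\|_{TV}+\sup_{\|f\|_{BL}\le1}\int_{\mathbb{Y}}\left|f(\pi_1(\pi_0^n,u_n,y_1))-f(\pi_1(\pi_0,u,y_1))\right|P(dy_1\mid\pi_0,u),
\end{aligned}\tag{8}$$
where in (8) we use the fact that $\|f\|_\infty\le\|f\|_{BL}\le1$.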
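To prove that (8) tends to $0$, it is sufficient to prove:
1. $P(dy_1\mid\pi_0,u_0)$ is continuous in total variation in $(\pi_0,u_0)$.
2. If $(\pi_0^n,u_n)\to(\pi_0,u)$, then
$$\int_{\mathbb{Y}}\rho\big(\pi_1(\pi_0^n,u_n,y_1),\pi_1(\pi_0,u,y_1)\big)\,P(dy_1\mid\pi_0,u)\to0.\tag{9}$$

The second condition suffices because the second term of (8) is bounded above by the left-hand side of (9): any $f$ with $\|f\|_{BL}\le1$ is $1$-Lipschitz with respect to $\rho$.

>[!claim]
>$P(dy_1\mid\pi_0,u_0)$ is continuous in total variation.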
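Let $(\pi_0^n,u_n)\to(\pi_0,u)$. Then
$$\sup_{A\in\mathcal{B}(\mathbb{Y})}\left|P(A\mid\pi_0^n,u_n)-P(A\mid\pi_0,u)\right|=\sup_{A\in\mathcal{B}(\mathbb{Y})}\left|\int_{\mathbb{X}}Q(A\mid x_1,u_n)\,\mathcal{T}(dx_1\mid\pi_0^n,u_n)-\int_{\mathbb{X}}Q(A\mid x_1,u)\,\mathcal{T}(dx_1\mid\pi_0,u)\right|,$$
where $\mathcal{T}(dx_1\mid\pi_0^n,u_n):=\int_{\mathbb{X}}\mathcal{T}(dx_1\mid x_0,u_n)\,\pi_0^n(dx_0)$. Note that, by the lemma stated at the start of this section, we can show that $\mathcal{T}(dx_1\mid\pi_0^n,u_n)\to\mathcal{T}(dx_1\mid\pi_0,u)$ weakly.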
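Indeed, let $g\in C_b(\mathbb{X})$, and define $r_n(x_0)=\int_{\mathbb{X}}g(x_1)\,\mathcal{T}(dx_1\mid x_0,u_n)$ and $r(x_0)=\int_{\mathbb{X}}g(x_1)\,\mathcal{T}(dx_1\mid x_0,u)$. Since $\mathcal{T}(dx_1\mid x_0,u)$ has the weak Feller property, we have $r_n(x_0^n)\to r(x_0)$ whenever $x_0^n\to x_0$. Hence, by the same lemma, we have
$$\lim_{n\to\infty}\left|\int_{\mathbb{X}}r_n(x_0)\,\pi_0^n(dx_0)-\int_{\mathbb{X}}r(x_0)\,\pi_0(dx_0)\right|=0,$$
and so $\mathcal{T}(dx_1\mid\pi_0^n,u_n)\to\mathcal{T}(dx_1\mid\pi_0,u)$ weakly. Moreover, the families of functions $\{Q(A\mid\cdot,u_n)\}_{n\ge1,A\in\mathcal{B}(\mathbb{Y})}$ and $\{Q(A\mid\cdot,u)\}_{A\in\mathcal{B}(\mathbb{Y})}$ satisfy the conditions of the lemma, as $Q$ is continuous in total variation. Therefore, the lemma yields
$$\lim_{n\to\infty}\sup_{A\in\mathcal{B}(\mathbb{Y})}\left|\int_{\mathbb{X}}Q(A\mid x_1,u_n)\,\mathcal{T}(dx_1\mid\pi_0^n,u_n)-\int_{\mathbb{X}}Q(A\mid x_1,u)\,\mathcal{T}(dx_1\mid\pi_0,u)\right|=0.$$
Thus, $P(dy_1\mid\pi_0,u)$ is continuous in total variation.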
\end{proof}
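The quantity controlled in the claim, $\sup_A|P(A\mid\pi_0^n,u_n)-P(A\mid\pi_0,u)|$, can be computed directly when all spaces are finite. The sketch below does so for a hypothetical two-state model with a single action. In the finite setting weak and total variation convergence coincide, so this only illustrates the objects in the proof and is not evidence for the general statement. The helper names and the numbers are assumptions.

```python
def predicted_observation_law(pi, u, T, Q):
    """P(y | pi, u) = sum_{x1} Q(y | x1) * sum_{x0} T(x1 | x0, u) * pi(x0)."""
    n, m = len(pi), len(Q[0])
    pred = [sum(T[u][x0][x1] * pi[x0] for x0 in range(n)) for x1 in range(n)]
    return [sum(Q[x1][y] * pred[x1] for x1 in range(n)) for y in range(m)]


def tv_distance(p, q):
    """sup_A |p(A) - q(A)|, which equals half the L1 distance on a finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical model: two states, one action, two observations.
T = [[[0.9, 0.1], [0.2, 0.8]]]
Q = [[0.8, 0.2], [0.3, 0.7]]
pi = [0.5, 0.5]
limit_law = predicted_observation_law(pi, 0, T, Q)

gaps = []
for k in (1, 10, 100, 1000):
    pi_k = [0.5 + 0.4 / k, 0.5 - 0.4 / k]  # beliefs converging to pi
    gaps.append(tv_distance(predicted_observation_law(pi_k, 0, T, Q), limit_law))
```

In this example the gap is linear in the perturbation of the belief, so `gaps` shrinks at rate $1/k$.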
## Comparison of Conditions on Observation Channels
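Suppose that the observation channel $Q(dy\mid x,u)$ is continuous in total variation. Then, for any $(\pi,u)\in\mathcal{P}(\mathbb{X})\times\mathbb{U}$, we have, $\mathcal{T}(\cdot\mid\pi,u)$-a.s., that $Q(dy\mid x,u)\ll P(dy\mid\pi,u)$ and
$$Q(dy\mid x,u)=g(x,u,y)\,P(dy\mid\pi,u)$$
for a measurable function $g$, which satisfies, for any $A\in\mathcal{B}(\mathbb{Y})$ and for any $x_k\to x$,
$$\int_A\left|g(x_k,u,y)-g(x,u,y)\right|P(dy\mid\pi,u)\to0.$$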
\begin{proof}
We begin by fixing any $(\pi,u)\in\mathcal{P}(\mathbb{X})\times\mathbb{U}$.

>[!clm]
>$Q(dy\mid x,u)\ll P(dy\mid\pi,u)$, $\mathcal{T}(\cdot\mid\pi,u)$-a.s.
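Let $\{K_n\}_{n\ge1}$ be compact subsets of $\mathbb{X}$ that capture all but a vanishing fraction of the mass of $\mathcal{T}(\cdot\mid\pi,u)$; such sets exist by tightness. As $Q(dy\mid x,u)$ is continuous in total variation, the image of $K_n\times\{u\}$ under $Q(dy\mid x,u)$ is compact in $\mathcal{P}(\mathbb{Y})$ (the continuous image of a compact set is compact). Hence, there exists a finite family $\{\nu_1,\dots,\nu_l\}\subset\mathcal{P}(\mathbb{Y})$ such that
$$\max_{x\in K_n}\min_{i=1,\dots,l}\|Q(\cdot\mid x,u)-\nu_i\|_{TV}<\frac{1}{3n}.$$
Define the following stochastic kernel:
$$\nu_n(\cdot\mid x,u)=\operatorname*{arg\,min}_{\nu_i}\|Q(\cdot\mid x,u)-\nu_i\|_{TV}.$$
Then, we define $P_n(\cdot\mid\pi,u)=\int_{\mathbb{X}}\nu_n(\cdot\mid x,u)\,\mathcal{T}(dx\mid\pi,u)$. One can prove that $\|P(\cdot\mid\pi,u)-P_n(\cdot\mid\pi,u)\|_{TV}<\frac{1}{n}$. Moreover, since $P_n(\cdot\mid\pi,u)$ is a mixture of finitely many probability measures $\{\nu_1,\dots,\nu_l\}$, we have that $\nu_n(\cdot\mid x,u)\ll P_n(\cdot\mid\pi,u)$ for all $x\in C_n$, where $\mathcal{T}(C_n\mid\pi,u)=1$. Let $C=\bigcap_n C_n$, so that $\mathcal{T}(C\mid\pi,u)=1$.

>[!clm]
>$x\in C\implies Q(dy\mid x,u)\ll P(dy\mid\pi,u)$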
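To prove this claim, we fix any $\epsilon>0$ and choose $n\ge1$ such that $\epsilon>\frac{2}{3n}$. Then, there exists $\delta>0$ such that $\nu_n(A\mid x,u)<\frac{\epsilon}{2}$ whenever $P_n(A\mid\pi,u)<\delta$. This gives us that $Q(A\mid x,u)<\epsilon$ whenever $P(A\mid\pi,u)<\delta+\frac{1}{n}$. Hence $Q(dy\mid x,u)\ll P(dy\mid\pi,u)$.
\end{proof}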
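In a finite model the density $g$ of the lemma is simply the ratio $Q(y\mid x)/P(y\mid\pi,u)$, and the "$\mathcal{T}(\cdot\mid\pi,u)$-a.s." qualifier becomes visible: absolute continuity can fail at states that the predicted law never visits. The sketch below is an illustration under these finite-space assumptions, with an observation matrix `Q[x][y]` assumed independent of the action. The name `observation_density` and the example model are hypothetical.

```python
def observation_density(pi, u, T, Q):
    """Return P(. | pi, u) and g(x, u, .) = dQ(. | x) / dP(. | pi, u).

    g is computed only for states x with T(x | pi, u) > 0; the remaining
    states form a T(. | pi, u)-null set and are skipped.
    """
    n, m = len(pi), len(Q[0])
    pred = [sum(T[u][x0][x1] * pi[x0] for x0 in range(n)) for x1 in range(n)]
    P = [sum(Q[x1][y] * pred[x1] for x1 in range(n)) for y in range(m)]
    g = {}
    for x1 in range(n):
        if pred[x1] == 0.0:
            continue
        # If pred[x1] > 0 then P[y] >= Q[x1][y] * pred[x1], so P[y] == 0
        # forces Q[x1][y] == 0 and the value 0.0 below is a valid density.
        g[x1] = [Q[x1][y] / P[y] if P[y] > 0.0 else 0.0 for y in range(m)]
    return P, g


# Hypothetical model in which state 1 is never reached.
T = [[[1.0, 0.0], [1.0, 0.0]]]
Q = [[0.6, 0.4, 0.0],   # Q(. | x=0)
     [0.0, 0.0, 1.0]]   # Q(. | x=1) charges an observation that P does not
P, g = observation_density([0.5, 0.5], 0, T, Q)
```

Here `Q[1]` is not absolutely continuous with respect to `P`, but state 1 has predicted probability zero, so the conclusion of the lemma still holds almost surely.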