Controlled Markov Chain

Definition (Controlled Markov Chain)

Let $\{(x_{k},u_{k})\}$ be a collection that satisfies the model

$$
x_{k+1}=f(x_{k},u_{k},w_{k}),
$$

where $x_{k}\in\mathbb{X}$ is the state variable, $u_{k}\in\mathbb{U}$ is the action variable, $w_{k}\in\mathbb{W}$ is an i.i.d. noise process, and $f$ is a Measurable Function. We assume that $\mathbb{X},\mathbb{U},\mathbb{W}$ are Borel subsets of Polish spaces; such subsets are also called standard Borel. If $\{(x_{k},u_{k})\}$ also satisfies

$$
P(x_{k+1}\in B\mid x_{[0,k]}=a_{[0,k]},u_{[0,k]}=b_{[0,k]})= P(x_{k+1}\in B\mid x_{k}=a_{k},u_{k}=b_{k})\quad\forall B\in\mathcal{B}(\mathbb{X}),\ k\in\mathbb{Z}_{+},
$$

then we call $\{(x_{k},u_{k})\}$ a controlled Markov chain.

More information

Consider this model again:

$$
x_{k+1}=f(x_{k},u_{k},w_{k})
$$

^statespace

where $x_{k}\in\mathbb{X}$, $u_{k}\in\mathbb{U}$, $w_{k}\in\mathbb{W}$, $f$ is a Measurable Function, and $\mathbb{X},\mathbb{U},\mathbb{W}$ are standard Borel. We assume all Random Variables live in some Probability Space $(\Omega,\mathcal{F},P)$. The collection $\{(x_{k},u_{k})\}$ satisfying the state-space model $x_{k+1}=f(x_{k},u_{k},w_{k})$ also satisfies

$$
\begin{align*}
P(x_{k+1}\in B\mid x_{[0,k]}=a_{[0,k]},u_{[0,k]}=b_{[0,k]})&= P(x_{k+1}\in B\mid x_{k}=a_{k},u_{k}=b_{k})\\
&=:\mathcal{T}(B\mid a_{k},b_{k})
\end{align*}
$$
^property

for all $B\in\mathcal{B}(\mathbb{X})$ and $k\in\mathbb{Z}_{+}$, where $\mathcal{T}(\cdot\mid x,u)$ is a Stochastic Kernel from $\mathbb{X}\times \mathbb{U}$ to $\mathbb{X}$, that is:

- For every $B\in\mathcal{B}(\mathbb{X})$, $\mathcal{T}(B\mid \cdot,\cdot)$ is a measurable function on $\mathbb{X}\times \mathbb{U}$; and
- For every fixed $(a,b)\in\mathbb{X}\times \mathbb{U}$, $\mathcal{T}(\cdot\mid a,b)$ is a Probability Measure on $(\mathbb{X},\mathcal{B}(\mathbb{X}))$.
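When the noise has law $\mu$, the kernel induced by the model is $\mathcal{T}(B\mid a,b)=\mu(\{w : f(a,b,w)\in B\})$. The sketch below illustrates this on a small finite example. The state, action, and noise sets, the noise probabilities, and the clipped dynamics `f` are all hypothetical choices made for illustration, not part of the note.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy model (assumed for illustration): finite state, action, noise sets.
STATES = (0, 1, 2)
ACTIONS = (-1, 0, 1)
NOISE_PMF = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}  # law of w_k


def f(x, u, w):
    """Assumed dynamics: move by u + w, clipped to stay inside STATES."""
    return min(max(x + u + w, STATES[0]), STATES[-1])


def kernel(x, u):
    """Return T(. | x, u) as a dict {next_state: probability}.

    Each noise value w contributes its probability to the next state f(x, u, w).
    """
    dist = defaultdict(Fraction)
    for w, p in NOISE_PMF.items():
        dist[f(x, u, w)] += p
    return dict(dist)


# The full kernel, indexed by (state, action) pairs.
T = {(x, u): kernel(x, u) for x in STATES for u in ACTIONS}

# Second kernel property: for each fixed (x, u), T(. | x, u) is a probability measure.
assert all(sum(dist.values()) == 1 for dist in T.values())
```

In this finite setting the first kernel property (measurability in $(x,u)$) holds trivially, so only the second property needs checking.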

That is, all Stochastic Processes that satisfy the Markov property above admit, almost surely, a Stochastic Realization in the form of the state-space model $x_{k+1}=f(x_{k},u_{k},w_{k})$.
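One way to see why such a realization exists is inverse-CDF sampling. The sketch below assumes $\mathbb{X}=\mathbb{R}$ and noise $w_{k}$ that is i.i.d. uniform on $[0,1]$; both are assumptions made here for illustration, and the note itself does not spell out the construction.

$$
\begin{align*}
F(y\mid x,u) &:= \mathcal{T}\big((-\infty,y]\mid x,u\big),\\
f(x,u,w) &:= \inf\{\, y\in\mathbb{R} : F(y\mid x,u)\ge w \,\},\\
P\big(f(x,u,w_{k})\le y\big) &= P\big(w_{k}\le F(y\mid x,u)\big) = F(y\mid x,u).
\end{align*}
$$

So $f(x,u,w_{k})$ has law $\mathcal{T}(\cdot\mid x,u)$, and $x_{k+1}=f(x_{k},u_{k},w_{k})$ reproduces the transition kernel.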

Remark

For $\{ (x_{k}, u_{k}) \}$ to define a Stochastic Process, in addition to a transition kernel and an initial Measure on $x_{0}$ (i.e. a Prior), we need to specify the dependence of $u_{k}$ on the history of the process. This dependence is called a control Policy. Once the policy is specified, the Ionescu Tulcea Theorem allows one to construct a Stochastic Process $\{ (x_{k},u_{k}) \}_{k\ge 0}$.
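The three ingredients named in the remark (a prior on the initial state, a transition mechanism, and a policy) can be put together in a short simulation. This is a minimal sketch under assumed choices: the uniform prior, the clipped dynamics `f`, the uniform noise, and the history-dependent `policy` are all hypothetical and serve only to show how a sample path is generated.

```python
import random

STATES = range(5)      # hypothetical finite state space {0, ..., 4}
NOISE = (-1, 0, 1)     # hypothetical noise values, sampled uniformly


def f(x, u, w):
    """Assumed dynamics: move by u + w, clipped to stay inside STATES."""
    return min(max(x + u + w, 0), 4)


def policy(x_hist, u_hist):
    """Hypothetical policy that uses the history, not just the current state.

    It steers toward state 2, but rests (u = 0) whenever the previous
    action was non-zero.
    """
    if u_hist and u_hist[-1] != 0:
        return 0
    x = x_hist[-1]
    if x < 2:
        return 1
    if x > 2:
        return -1
    return 0


def simulate(horizon, seed=0):
    """Generate one sample path (x_0..x_horizon, u_0..u_{horizon-1})."""
    rng = random.Random(seed)
    xs = [rng.choice(list(STATES))]   # prior: uniform on STATES
    us = []
    for _ in range(horizon):
        u = policy(xs, us)            # action depends on the history so far
        w = rng.choice(NOISE)         # i.i.d. noise draw
        us.append(u)
        xs.append(f(xs[-1], u, w))    # x_{k+1} = f(x_k, u_k, w_k)
    return xs, us


xs, us = simulate(horizon=10, seed=1)
```

Swapping in a different `policy` function changes the law of the whole path while leaving the prior and the dynamics untouched, which is the sense in which the policy must be fixed before the process is well defined.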
