Controlled Markov Chain

Definition (Controlled Markov Chain)

Let $\{ (x_{k},u_{k}) \}$ be a collection of random variables that satisfies the model $x_{k+1}=f(x_{k},u_{k},w_{k})$, where $x_{k}\in\mathbb{X}$ is the state variable, $u_{k}\in\mathbb{U}$ is the action variable, $w_{k}\in\mathbb{W}$ is an i.i.d. noise process, and $f$ is a Measurable Function. We assume that $\mathbb{X},\mathbb{U},\mathbb{W}$ are Borel subsets of Polish spaces; such sets are also called standard Borel. If $\{ (x_{k},u_{k}) \}$ also satisfies $P(x_{k+1}\in B|x_{[0,k]}=a_{[0,k]},u_{[0,k]}=b_{[0,k]})= P(x_{k+1}\in B|x_{k}=a_{k},u_{k}=b_{k})\quad\forall B\in\mathcal{B}(\mathbb{X}),k\in\mathbb{Z}_{+}$, then we call $\{ (x_{k},u_{k}) \}$ a controlled Markov chain.

More information

Consider this model again: $x_{k+1}=f(x_{k},u_{k},w_{k})$

^statespace

where $x_{k}\in\mathbb{X}$, $u_{k}\in\mathbb{U}$, $w_{k}\in\mathbb{W}$, $f$ is a Measurable Function, and $\mathbb{X},\mathbb{U},\mathbb{W}$ are standard Borel. We assume all Random Variables live in some Probability Space $(\Omega,\mathcal{F},P)$. The collection $\{ (x_{k},u_{k}) \}$ satisfying this model also satisfies

$\begin{align*} P(x_{k+1}\in B|x_{[0,k]}=a_{[0,k]},u_{[0,k]}=b_{[0,k]})&= P(x_{k+1}\in B|x_{k}=a_{k},u_{k}=b_{k})\\ &=:\mathcal{T}(B\mid a_{k},b_{k}) \end{align*}$ ^property

for all $B\in\mathcal{B}(\mathbb{X})$ and $k\in\mathbb{Z}_{+}$, where $\mathcal{T}(\cdot\mid x,u)$ is a Stochastic Kernel from $\mathbb{X}\times \mathbb{U}$ to $\mathbb{X}$, that is:

- For every $B\in\mathcal{B}(\mathbb{X})$, $\mathcal{T}(B\mid \cdot,\cdot)$ is a measurable function on $\mathbb{X}\times \mathbb{U}$; and
- For every fixed $(a,b)\in\mathbb{X}\times \mathbb{U}$, $\mathcal{T}(\cdot\mid a,b)$ is a Probability Measure on $(\mathbb{X},\mathcal{B}(\mathbb{X}))$.

That is, all Stochastic Processes that satisfy the Markov property above admit a Stochastic Realization in the form of the model $x_{k+1}=f(x_{k},u_{k},w_{k})$, almost surely.
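For intuition, the finite-state case of this realization can be sketched with inverse-CDF sampling. The sketch below is only an illustration under stated assumptions: the two-state, two-action kernel `T` and its numbers are hypothetical, and the noise $w_{k}$ is assumed to be uniform on $[0,1)$. It is not the general construction for standard Borel spaces.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical kernel on X = {0, 1}, U = {0, 1} (illustrative numbers only):
# T[(x, u)][y] = P(x_{k+1} = y | x_k = x, u_k = u).
T = {
    (0, 0): [0.9, 0.1],
    (0, 1): [0.2, 0.8],
    (1, 0): [0.5, 0.5],
    (1, 1): [0.0, 1.0],
}

def f(x, u, w):
    """Realize the kernel as x_{k+1} = f(x_k, u_k, w_k), assuming w ~ Uniform[0, 1).

    The next state is the first y whose cumulative probability exceeds w,
    so that P(f(x, u, w) = y) = T[(x, u)][y].
    """
    cdf = list(accumulate(T[(x, u)]))
    # The min(...) guards against floating-point rounding in the last CDF entry.
    return min(bisect_right(cdf, w), len(cdf) - 1)
```

Feeding `f` an i.i.d. uniform sequence, for example `random.random()` at each step, then produces transitions distributed according to `T`.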

Remark

For the pair $\{ (x_{t}, u_{t}) \}$ to define a Stochastic Process, a transition kernel and an initial Measure on $x_{0}$ (i.e. a Prior) are not enough: we also need to specify how $u_{t}$ depends on the history of the process. This dependence is called a control Policy. Once it is specified, the Ionescu Tulcea Theorem allows one to construct a Stochastic Process $\{ (x_{t},u_{t}) \}_{t\ge 0}$.
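The three ingredients named in this remark (a prior on $x_{0}$, a transition mechanism, and a policy) can be seen side by side in a small simulation. This is a minimal sketch under assumptions that are not part of the note: the integer-valued dynamics `f`, the noise values $\{-1,0,1\}$, the prior on $\{-2,\dots,2\}$, and the particular `policy` are all hypothetical choices made for illustration.

```python
import random

def f(x, u, w):
    # Hypothetical dynamics on the integers: x_{k+1} = x_k + u_k + w_k.
    return x + u + w

def policy(x_hist, u_hist):
    # A policy maps the observed history to an action. This one happens to
    # use only the current state, steering it one step toward 0.
    x = x_hist[-1]
    return -1 if x > 0 else (1 if x < 0 else 0)

def simulate(f, policy, sample_x0, sample_w, horizon, rng):
    """Generate (x_0..x_horizon, u_0..u_{horizon-1}) from prior, policy, and dynamics."""
    xs, us = [sample_x0(rng)], []
    for _ in range(horizon):
        us.append(policy(xs, us))                    # action from the history so far
        xs.append(f(xs[-1], us[-1], sample_w(rng)))  # next state from the model
    return xs, us

rng = random.Random(0)
xs, us = simulate(
    f,
    policy,
    sample_x0=lambda r: r.choice([-2, -1, 0, 1, 2]),  # assumed prior on x_0
    sample_w=lambda r: r.choice([-1, 0, 1]),          # assumed i.i.d. noise
    horizon=10,
    rng=rng,
)
```

Without `policy`, the loop cannot produce `us`, and therefore cannot produce `xs` either, which is the point of the remark: the kernel and the prior alone do not determine the joint process.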
