Policy

Definition (Policy)

A policy is a rule by which the controller selects, at each time $t$, a control action $u_t$ as a function of the information available to it. The definitions below distinguish policies by how much of this information they use.

Definition (Deterministic Admissible Policy)

Let $\mathbb{H}_{0}:=\mathbb{X}$ and $\mathbb{H}_{t}:=\mathbb{H}_{t-1}\times \mathbb{K}$ for $t=1,2,\dots$. We let $I_{t}$ denote an element of $\mathbb{H}_{t}$, where $I_{t}=\{ x_{[0,t]},u_{[0,t-1]} \}$. A deterministic admissible control policy $\gamma$ is a sequence of functions $\{\gamma_{t},t\in\mathbb{Z}_{+}\}$ such that $\gamma_{t}:\mathbb{H}_{t}\to \mathbb{U}$ with $u_{t}=\gamma_{t}(I_{t})$.
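As a minimal sketch of history dependence, the following toy example (with hypothetical finite spaces $\mathbb{X}=\mathbb{U}=\{0,1\}$ and a made-up transition map) shows a policy $\gamma_t$ that consults the full history $I_t$, here through the previous action, rather than the current state alone:

```python
# Hypothetical finite spaces: X = {0, 1}, U = {0, 1}.
# A deterministic admissible policy may use the whole history
# I_t = (x_0, u_0, ..., u_{t-1}, x_t), not just x_t.

def gamma_t(states, actions):
    """History-dependent policy: u_t depends on x_t AND the last action."""
    if not actions:          # t = 0: no past action yet
        return 1
    return (states[-1] + actions[-1]) % 2

def step(x, u):
    """Hypothetical controlled transition map x_{t+1} = f(x_t, u_t)."""
    return (x + u) % 2

states, actions = [0], []
for t in range(5):
    u = gamma_t(states, actions)      # u_t = gamma_t(I_t)
    actions.append(u)
    states.append(step(states[-1], u))

print(states, actions)                # trajectory and action sequence
```

Because the policy reads `actions[-1]`, it is admissible but not Markov: two histories ending in the same state $x_t$ can yield different actions.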

We can also state this as follows:

> [!remark|*] Alternate Definition
> Let us write that $u_{t}$ is a realization of the action Random Variable $U_{t}$ under an admissible policy, and let us also emphasize that $H_{t}$ is a Random Variable with realization $I_{t}$. We say that $\gamma_{t}$ is a Measurable Function on $\sigma(H_{t})$ in the sense that for every Borel subset $B\subset \mathbb{U}$ we have $\{ \omega :U_{t}(\omega)\in B \}=U_{t}^{-1}(B)\in\sigma(H_{t})$.

Definition (Randomized Admissible Control Policy)

A randomized admissible control policy is a sequence $\gamma=\{ \gamma_{t},t\ge 0 \}$ such that $\gamma_{t}:\mathbb{H}_{t}\to \mathcal{P}(\mathbb{U})$, with $\mathcal{P}(\mathbb{U})$ being the set of probability measures on $\mathbb{U}$, so that for every realization $I_{t}$, $\gamma_{t}(I_{t})$ is a Probability Measure on $\mathbb{U}$. By Stochastic Realization arguments this is equivalent to writing $u_{t}=\gamma_{t}(I_{t},r_{t})$ for some $[0,1]$-valued i.i.d. Random Variables $r_{t}$.
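The stochastic-realization equivalence can be sketched concretely: on a hypothetical finite action space $\mathbb{U}=\{0,1,2\}$ with a made-up pmf, the randomized policy $\gamma_t(I_t)\in\mathcal{P}(\mathbb{U})$ is realized as a deterministic function of $(I_t, r_t)$ with $r_t\sim\mathrm{Uniform}[0,1]$ via the inverse CDF:

```python
import random

def gamma_dist(I_t):
    """Randomized policy: history -> probability measure on U (here a pmf).
    Hypothetical pmf, constant in I_t for simplicity."""
    return [0.5, 0.3, 0.2]

def gamma_realized(I_t, r_t):
    """Deterministic realization u_t = gamma_t(I_t, r_t) via the inverse CDF."""
    pmf, cum = gamma_dist(I_t), 0.0
    for u, p in enumerate(pmf):
        cum += p
        if r_t < cum:
            return u
    return len(pmf) - 1

random.seed(1)
samples = [gamma_realized(I_t=None, r_t=random.random()) for _ in range(10000)]
freq = [samples.count(u) / len(samples) for u in range(3)]
print(freq)   # empirical frequencies approximate the pmf [0.5, 0.3, 0.2]
```

Drawing a fresh uniform $r_t$ at each stage reproduces exactly the measure $\gamma_t(I_t)$, which is the content of the equivalence claimed above.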

Definition (Markov policy)

A deterministic Markov control policy $\gamma\in\Gamma_{M}$ is a sequence of functions $\{ \gamma_{t} \}$ such that $\gamma_{t}:\mathbb{X}\times \mathbb{Z}_{+}\to \mathbb{U}$ and $u_{t}=\gamma_{t}(x_{t})$ for all $t\in\mathbb{Z}_{+}$.

Definition (Stationary policy)

A deterministic stationary control policy $\gamma\in\Gamma_{S}$, with $\gamma:\mathbb{X}\to \mathbb{U}$, is a sequence of identical functions $\{ \gamma,\gamma,\dots \}$ such that $u_{t}=\gamma(x_{t})$ for all $t\in\mathbb{Z}_{+}$.

Theorem (Markov Policy induces Markov Chain)

A policy $\gamma\in\Gamma_{M}$ applied to a Controlled Markov Chain induces a process $\{ X_{t},t\ge0 \}$ which is Markov.

Corollary

If moreover $\gamma\in\Gamma_{S}$, then $(X_{t})_{t\ge0}$ is a time-homogeneous Markov chain.
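The theorem and corollary can be illustrated by how a stationary policy collapses a controlled kernel into an ordinary (uncontrolled) Markov kernel. The spaces, kernel, and policy below are hypothetical, with $\mathbb{X}=\mathbb{U}=\{0,1\}$:

```python
# T[u][x][x'] = probability of moving x -> x' under action u
# (a made-up controlled transition kernel T(. | x, u))
T = [
    [[0.9, 0.1], [0.2, 0.8]],   # action u = 0
    [[0.4, 0.6], [0.7, 0.3]],   # action u = 1
]

def induced_kernel(gamma):
    """Markov kernel P(x' | x) = T(x' | x, gamma(x)) of the process {X_t}
    induced by a stationary policy gamma."""
    return [T[gamma(x)][x] for x in range(2)]

gamma = lambda x: 1 - x         # hypothetical stationary policy
P = induced_kernel(gamma)
print(P)                        # [[0.4, 0.6], [0.2, 0.8]]
```

Because $\gamma$ does not depend on $t$, the induced kernel $P$ is the same at every stage, which is exactly time-homogeneity; a time-varying Markov policy $\gamma_t$ would instead produce a stage-dependent kernel $P_t$.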

Theorem

The policy classes are nested: $\Gamma_{S}\subset\Gamma_{M}\subset\Gamma_{A}$, and hence for every $a\in\{ N,\beta,\infty \}$,
$$\inf_{\gamma\in\Gamma_{A}}J_{a}(X,\gamma)\le\inf_{\gamma\in\Gamma_{M}}J_{a}(X,\gamma)\le\inf_{\gamma\in\Gamma_{S}}J_{a}(X,\gamma).$$
Moreover, for $a\in\{ N,\beta,\infty \}$ the first inequality is an equality, and for $a\in\{ \beta,\infty \}$ the second is also an equality.
