Definition (Policy)
A policy is
Definition (Deterministic Admissible Policy)
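Let $H_0 := X$ and $H_t = H_{t-1} \times K$ for $t = 1, 2, \dots$. We let $I_t$ denote an element of $H_t$, where $I_t = \{x_{[0,t]}, u_{[0,t-1]}\}$. A deterministic admissible control policy $\gamma$ is a sequence of functions $\{\gamma_t, t \in \mathbb{Z}_+\}$ such that $\gamma_t : H_t \to U$, with $u_t = \gamma_t(I_t)$.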
We can also state this as follows:

> [!remark|*] Alternate Definition
> Let us write that $u_t$ is a realization of the action Random Variable $U_t$ under an admissible policy, and let us emphasize that $H_t$ is a Random Variable with realization $I_t$. Admissibility then means that $U_t = \gamma_t(H_t)$ is a Measurable Function with respect to $\sigma(H_t)$, in the sense that for every Borel subset $B \subset U$ we have $\{\omega : U_t(\omega) \in B\} = U_t^{-1}(B) \in \sigma(H_t)$.
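
As an illustration of the definition, the following is a minimal sketch of a deterministic admissible policy on a two-state toy model. The dynamics `step`, the rule `gamma`, and the helper `rollout` are hypothetical and chosen only for this example. The point is that $\gamma_t$ receives the whole history $I_t = \{x_{[0,t]}, u_{[0,t-1]}\}$, so two histories with the same $t$ and the same current state $x_t$ may produce different actions.

```python
def step(x, u):
    """Hypothetical deterministic dynamics on X = {0, 1}: action 0 flips the state, action 1 keeps it."""
    return x if u == 1 else 1 - x

def gamma(t, xs, us):
    """Hypothetical admissible rule gamma_t(I_t): play 1 only on the first visit to state 1.

    xs = x_[0,t] (length t + 1), us = u_[0,t-1] (length t).
    """
    first_visit = xs[-1] == 1 and xs[:-1].count(1) == 0
    return 1 if first_visit else 0

def rollout(x0, horizon):
    """Build the history I_t step by step and apply u_t = gamma_t(I_t)."""
    xs, us = (x0,), ()
    for t in range(horizon):
        assert len(xs) == t + 1 and len(us) == t  # shape of I_t
        u = gamma(t, xs, us)
        us += (u,)
        xs += (step(xs[-1], u),)
    return xs, us

xs_a, us_a = rollout(0, 2)  # x_1 = 1 is a first visit, so u_1 = 1
xs_b, us_b = rollout(1, 2)  # x_1 = 1 was already visited at t = 0, so u_1 = 0
```

Both rollouts are in state 1 at time $t = 1$, yet the actions differ, which a rule depending on $(t, x_t)$ alone could not reproduce.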
Definition (Randomized Admissible Control Policy)
A randomized admissible control policy is a sequence $\gamma = \{\gamma_t, t \ge 0\}$ such that $\gamma_t : H_t \to \mathcal{P}(U)$, where $\mathcal{P}(U)$ is the set of probability measures on $U$, so that for every realization $I_t$, $\gamma_t(I_t)$ is a Probability Measure on $U$. By Stochastic Realization arguments, this is equivalent to writing $u_t = \gamma_t(I_t, r_t)$ for some $[0,1]$-valued i.i.d. Random Variable $r_t$.
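
For a finite action set, the equivalence between the two forms can be sketched with inverse-CDF sampling. This is only an illustration under assumptions not in the notes: $U = \{0, 1, 2\}$, a made-up rule `gamma` returning a probability vector, and hypothetical helpers `realize` and `act`. Given the same $I_t$ and the same draw $r_t$, the action is a deterministic function of $(I_t, r_t)$.

```python
import random

def gamma(t, xs, us):
    """Hypothetical randomized rule: a probability vector on U = {0, 1, 2} for each history."""
    if xs[-1] == 0:
        return (0.5, 0.25, 0.25)
    return (0.0, 0.5, 0.5)

def realize(probs, r):
    """Invert the CDF of probs: return the first action whose cumulative mass exceeds r."""
    cumulative = 0.0
    for action, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return action
    return len(probs) - 1  # guard against floating-point rounding when r is close to 1

def act(t, xs, us, r):
    """The form u_t = gamma_t(I_t, r_t), with r_t uniform on [0, 1)."""
    return realize(gamma(t, xs, us), r)

rng = random.Random(0)                 # i.i.d. uniform noise r_0, r_1, ...
u0 = act(0, (0,), (), rng.random())    # one sampled action at t = 0
```

If $r_t$ is uniform on $[0,1)$, `act` returns action $u$ with probability `gamma(t, xs, us)[u]`, which is the sense in which the two descriptions agree in this finite case.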
Definition (Markov policy)
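A deterministic Markov control policy $\gamma \in \Gamma_M$ is a sequence of functions $\{\gamma_t\}$ such that $\gamma_t : X \to U$ and $u_t = \gamma_t(x_t)$ for all $t \in \mathbb{Z}_+$. Equivalently, it is a single map $(x, t) \mapsto \gamma_t(x)$ on $X \times \mathbb{Z}_+$: the action may depend on the current state and the time index, but not on the rest of the history.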
Definition (Stationary policy)
A deterministic stationary control policy $\gamma \in \Gamma_S$ is a sequence of identical functions $\{\gamma, \gamma, \dots\}$, with $\gamma : X \to U$, such that $u_t = \gamma(x_t)$ for all $t \in \mathbb{Z}_+$.
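
The three deterministic classes differ only in what information the decision rule may read. The sketch below, with hypothetical helper names and a made-up threshold rule, shows how a stationary policy can be viewed as a Markov policy that ignores $t$, and a Markov policy as an admissible policy that ignores everything in $I_t$ except $x_t$.

```python
def stationary_as_markov(gamma):
    """View gamma: X -> U as a Markov policy (t, x) -> u that ignores t."""
    return lambda t, x: gamma(x)

def markov_as_admissible(gamma_m):
    """View a Markov policy as an admissible one that reads only x_t from I_t = (xs, us)."""
    return lambda t, xs, us: gamma_m(t, xs[-1])

def threshold(x):
    """Hypothetical stationary rule on integer states: act (u = 1) once the state reaches 2."""
    return 1 if x >= 2 else 0

as_markov = stationary_as_markov(threshold)
as_admissible = markov_as_admissible(as_markov)

# The same action results whichever representation is used.
u_direct = threshold(3)
u_lifted = as_admissible(2, (0, 1, 3), (0, 0))
```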
Theorem (Markov Policy induces Markov Chain)
A policy $\gamma \in \Gamma_M$ applied to a Controlled Markov Chain induces a process $\{X_t, t \ge 0\}$ which is Markov.
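Corollary
If the policy $\gamma \in \Gamma_S$, then $(X_t)_{t \ge 0}$ is a time-homogeneous Markov chain: its transition kernel does not depend on $t$.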
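
For a finite model this can be made concrete. Assuming a controlled kernel written as $\mathcal{T}(x' \mid x, u)$ (this symbol and the numbers below are not from the notes), a Markov policy induces the transition matrices $P_t(x, x') = \mathcal{T}(x' \mid x, \gamma_t(x))$. The sketch uses a made-up two-state, two-action kernel and hypothetical decision rules.

```python
# KERNEL[x][u] is the distribution of the next state given state x and action u (toy numbers).
KERNEL = [
    [[0.9, 0.1], [0.2, 0.8]],  # from state 0 under action 0 / action 1
    [[0.5, 0.5], [0.0, 1.0]],  # from state 1 under action 0 / action 1
]

def induced_matrix(kernel, decision_rule):
    """Row x of the induced matrix is the kernel evaluated at u = decision_rule(x)."""
    return [list(kernel[x][decision_rule(x)]) for x in range(len(kernel))]

def markov_rule(t):
    """Hypothetical Markov policy: the rule gamma_t changes with t."""
    return lambda x: (x + t) % 2

def stationary_rule(t):
    """Hypothetical stationary policy: the same rule gamma at every t."""
    return lambda x: x

P_markov = [induced_matrix(KERNEL, markov_rule(t)) for t in range(3)]
P_stationary = [induced_matrix(KERNEL, stationary_rule(t)) for t in range(3)]
```

Here `P_markov[0]` and `P_markov[1]` differ, so the induced chain has time-varying transitions, while every entry of `P_stationary` is the same matrix.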
Theorem
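The policy classes satisfy $\Gamma_S \subset \Gamma_M \subset \Gamma_A$, and therefore, for every $a \in \{N, \beta, \infty\}$,
$$\inf_{\gamma \in \Gamma_A} J_a(X, \gamma) \le \inf_{\gamma \in \Gamma_M} J_a(X, \gamma) \le \inf_{\gamma \in \Gamma_S} J_a(X, \gamma).$$
Moreover, for $a \in \{N, \beta, \infty\}$ the first inequality is an equality, and for $a \in \{\beta, \infty\}$ the second inequality is an equality.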
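
The chain of inequalities can be checked by brute force on a tiny example. The sketch below assumes that $J_N$ is a finite-horizon sum of stage costs with $N = 2$ and no terminal cost, and uses a made-up deterministic two-state model; `step`, `cost`, and the enumeration helpers are hypothetical. It enumerates every deterministic policy in each class and compares the minimal costs from $x_0 = 0$.

```python
from itertools import product

STATES = (0, 1)
ACTIONS = (0, 1)
N = 2  # horizon: decisions at t = 0, 1

def step(x, u):
    """Toy dynamics: state 1 is absorbing; from state 0, action 1 moves to state 1."""
    return 1 if (x == 1 or u == 1) else 0

def cost(x, u):
    """Toy stage cost: 3 in state 1; in state 0, action 0 costs 1 and action 1 costs 0."""
    if x == 1:
        return 3
    return 1 if u == 0 else 0

def total_cost(x0, policy):
    """Assumed J_N(x0, policy): sum of stage costs; policy(t, xs, us) sees the history I_t."""
    xs, us, total = (x0,), (), 0
    for t in range(N):
        u = policy(t, xs, us)
        total += cost(xs[-1], u)
        us += (u,)
        xs += (step(xs[-1], u),)
    return total

def stationary_policies():
    for g in product(ACTIONS, repeat=len(STATES)):
        yield lambda t, xs, us, g=g: g[xs[-1]]

def markov_policies():
    for g in product(product(ACTIONS, repeat=len(STATES)), repeat=N):
        yield lambda t, xs, us, g=g: g[t][xs[-1]]

def admissible_policies():
    # One table entry per possible history I_t = (x_[0,t], u_[0,t-1]), t < N.
    histories = [(xs, us)
                 for t in range(N)
                 for xs in product(STATES, repeat=t + 1)
                 for us in product(ACTIONS, repeat=t)]
    for choice in product(ACTIONS, repeat=len(histories)):
        table = dict(zip(histories, choice))
        yield lambda t, xs, us, table=table: table[(xs, us)]

def best(policies, x0):
    return min(total_cost(x0, p) for p in policies)

inf_A = best(admissible_policies(), 0)
inf_M = best(markov_policies(), 0)
inf_S = best(stationary_policies(), 0)
print(inf_A, inf_M, inf_S)  # prints: 1 1 2
```

In this toy instance the admissible and Markov infima coincide, while the stationary infimum is strictly larger: the best Markov policy plays action 0 at $t = 0$ and action 1 at $t = 1$ in the same state, which no stationary policy can do. This is consistent with the second equality being claimed only for $a \in \{\beta, \infty\}$.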