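Let $H_0 := X$ and $H_t = H_{t-1} \times K$ for $t = 1, 2, \dots$. We let $I_t$ denote an element of $H_t$, where $I_t = \{x_{[0,t]}, u_{[0,t-1]}\}$. A deterministic admissible control policy $\gamma$ is a sequence of functions $\{\gamma_t, t \in \mathbb{Z}_+\}$ such that $\gamma_t : H_t \to U$, with $u_t = \gamma_t(I_t)$.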
We can also state this as follows:

> [!remark|*] Alternate Definition
> Let us write $u_t$ for a realization of the action random variable $U_t$ under an admissible policy, and emphasize that the history is itself a random variable $H_t$ with realization $I_t$. We say that $\gamma_t$ is measurable on $\sigma(H_t)$ in the sense that, for every Borel subset $B \subset U$, we have $\{\omega : U_t(\omega) \in B\} = U_t^{-1}(B) \in \sigma(H_t)$.

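
As a minimal sketch of the deterministic definition, assume finite integer state and action sets and store the history $I_t$ as a pair of tuples (states so far, actions so far). The names `gamma`, `step`, and `rollout`, the toy decision rule, and the toy dynamics are all hypothetical and serve only to show that $u_t$ is computed from the full history rather than from the current state alone.

```python
from typing import Tuple

State = int
Action = int
# I_t = (x_0, ..., x_t, u_0, ..., u_{t-1}), stored as (states, actions).
History = Tuple[Tuple[State, ...], Tuple[Action, ...]]


def gamma(t: int, history: History) -> Action:
    """Hypothetical deterministic policy gamma_t: maps I_t to an action u_t."""
    states, actions = history
    # I_t contains t + 1 states and t actions.
    assert len(states) == t + 1 and len(actions) == t
    # Toy rule (an assumption): parity of everything observed so far.
    return (sum(states) + sum(actions)) % 2


def step(x: State, u: Action) -> State:
    """Hypothetical deterministic dynamics, used only to generate a trajectory."""
    return (x + u + 1) % 3


def rollout(x0: State, horizon: int) -> History:
    """Apply u_t = gamma_t(I_t) for t = 0, ..., horizon - 1 and return I_horizon."""
    states: Tuple[State, ...] = (x0,)
    actions: Tuple[Action, ...] = ()
    for t in range(horizon):
        u = gamma(t, (states, actions))
        actions += (u,)
        states += (step(states[-1], u),)
    return states, actions
```

Calling `rollout(0, 3)` builds the history one step at a time, so each action is a function of all earlier states and actions, which is exactly the information structure $u_t = \gamma_t(I_t)$.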
A randomized admissible control policy is a sequence $\gamma = \{\gamma_t, t \geq 0\}$ such that $\gamma_t : H_t \to P(U)$, where $P(U)$ is the set of probability measures on $U$, so that for every realization $I_t$, $\gamma_t(I_t)$ is a probability measure on $U$. By stochastic realization arguments, this is equivalent to writing $u_t = \gamma_t(I_t, r_t)$ for some i.i.d. sequence of $[0,1]$-valued random variables $r_t$.
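
The equivalence between the two forms can be illustrated for a finite action set, where the realization is simply inverse-CDF sampling. The sketch below is an assumption-laden toy: `ACTIONS`, `gamma_kernel`, and `gamma_realized` are hypothetical names, the probability vectors are made up, and the general Borel-space case needs the measurable realization argument rather than this finite construction.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

ACTIONS = (0, 1, 2)  # hypothetical finite action set U
History = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (states, actions)


def gamma_kernel(history: History) -> List[float]:
    """Hypothetical randomized policy: maps I_t to a probability vector on ACTIONS."""
    states, _actions = history
    # Toy rule (an assumption): favor action 0 when the latest state is 0.
    return [0.5, 0.3, 0.2] if states[-1] == 0 else [0.2, 0.3, 0.5]


def gamma_realized(history: History, r: float) -> int:
    """Equivalent form u_t = gamma_t(I_t, r_t), with r_t uniform on [0, 1]."""
    cdf = list(accumulate(gamma_kernel(history)))
    # Inverse CDF: pick the first action whose cumulative mass exceeds r.
    idx = bisect_right(cdf, r)
    # Clamp to guard against r == 1.0 or rounding in the cumulative sum.
    return ACTIONS[min(idx, len(ACTIONS) - 1)]


def empirical_frequencies(history: History, n: int, seed: int = 0) -> List[float]:
    """Draw i.i.d. uniform r_t and report how often each action is selected."""
    rng = random.Random(seed)
    counts = [0] * len(ACTIONS)
    for _ in range(n):
        counts[gamma_realized(history, rng.random())] += 1
    return [c / n for c in counts]
```

For a fixed history, `gamma_realized` is a deterministic function of `(history, r)`, and all the randomness sits in the independent uniform draws. The empirical frequencies from `empirical_frequencies` should approach the vector returned by `gamma_kernel` as `n` grows.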