Policy

Definition (Policy)

A policy is a rule by which the controller selects, at each time $t$, a control action $u_t$ as a function of the information available to it. The definitions below distinguish policies by how much of this information they use.

Definition (Deterministic Admissible Policy)

Let $\mathbb{H}_{0}:=\mathbb{X}$ and $\mathbb{H}_{t}:=\mathbb{H}_{t-1}\times \mathbb{K}$ for $t=1,2,\dots$. We let $I_{t}$ denote an element of $\mathbb{H}_{t}$, where $I_{t}=\{ x_{[0,t]},u_{[0,t-1]} \}$. A deterministic admissible control policy $\gamma$ is a sequence of functions $\{\gamma_{t},t\in\mathbb{Z}_{+}\}$ such that $\gamma_{t}:\mathbb{H}_{t}\to \mathbb{U}$ with $u_{t}=\gamma_{t}(I_{t})$.
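As a minimal sketch of history dependence, the following toy example (with hypothetical finite spaces $\mathbb{X}=\mathbb{U}=\{0,1\}$ and a made-up transition map) shows a policy $\gamma_t$ that consults the full history $I_t$, here through the previous action, rather than the current state alone:

```python
# Hypothetical finite spaces: X = {0, 1}, U = {0, 1}.
# A deterministic admissible policy may use the whole history
# I_t = (x_0, u_0, ..., u_{t-1}, x_t), not just x_t.

def gamma_t(states, actions):
    """History-dependent policy: u_t depends on x_t AND the last action."""
    if not actions:          # t = 0: no past action yet
        return 1
    return (states[-1] + actions[-1]) % 2

def step(x, u):
    """Hypothetical controlled transition map x_{t+1} = f(x_t, u_t)."""
    return (x + u) % 2

states, actions = [0], []
for t in range(5):
    u = gamma_t(states, actions)      # u_t = gamma_t(I_t)
    actions.append(u)
    states.append(step(states[-1], u))

print(states, actions)                # trajectory and action sequence
```

Because the policy reads `actions[-1]`, it is admissible but not Markov: two histories ending in the same state $x_t$ can yield different actions.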

We can also state this as follows:

> [!remark|*] Alternate Definition
> Let us write that $u_{t}$ is a realization of the action Random Variable $U_{t}$ under an admissible policy, and let us also emphasize that $H_{t}$ is a Random Variable with realization $I_{t}$. We say that $\gamma_{t}$ is a Measurable Function on $\sigma(H_{t})$ in the sense that for every Borel subset $B\subset \mathbb{U}$ we have $\{ \omega :U_{t}(\omega)\in B \}=U_{t}^{-1}(B)\in\sigma(H_{t})$.

Definition (Randomized Admissible Control Policy)

A randomized admissible control policy is a sequence $\gamma=\{ \gamma_{t},t\ge 0 \}$ such that $\gamma_{t}:\mathbb{H}_{t}\to \mathcal{P}(\mathbb{U})$, with $\mathcal{P}(\mathbb{U})$ being the set of probability measures on $\mathbb{U}$, so that for every realization $I_{t}$, $\gamma_{t}(I_{t})$ is a Probability Measure on $\mathbb{U}$. By Stochastic Realization arguments this is equivalent to writing $u_{t}=\gamma_{t}(I_{t},r_{t})$ for some $[0,1]$-valued i.i.d. Random Variables $r_{t}$.
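The stochastic-realization equivalence can be sketched concretely: on a hypothetical finite action space $\mathbb{U}=\{0,1,2\}$ with a made-up pmf, the randomized policy $\gamma_t(I_t)\in\mathcal{P}(\mathbb{U})$ is realized as a deterministic function of $(I_t, r_t)$ with $r_t\sim\mathrm{Uniform}[0,1]$ via the inverse CDF:

```python
import random

def gamma_dist(I_t):
    """Randomized policy: history -> probability measure on U (here a pmf).
    Hypothetical pmf, constant in I_t for simplicity."""
    return [0.5, 0.3, 0.2]

def gamma_realized(I_t, r_t):
    """Deterministic realization u_t = gamma_t(I_t, r_t) via the inverse CDF."""
    pmf, cum = gamma_dist(I_t), 0.0
    for u, p in enumerate(pmf):
        cum += p
        if r_t < cum:
            return u
    return len(pmf) - 1

random.seed(1)
samples = [gamma_realized(I_t=None, r_t=random.random()) for _ in range(10000)]
freq = [samples.count(u) / len(samples) for u in range(3)]
print(freq)   # empirical frequencies approximate the pmf [0.5, 0.3, 0.2]
```

Drawing a fresh uniform $r_t$ at each stage reproduces exactly the measure $\gamma_t(I_t)$, which is the content of the equivalence claimed above.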

Definition (Markov policy)

A deterministic Markov control policy $\gamma\in\Gamma_{M}$ is a sequence of functions $\{ \gamma_{t} \}$ such that $\gamma_{t}:\mathbb{X}\times \mathbb{Z}_{+}\to \mathbb{U}$ and $u_{t}=\gamma_{t}(x_{t})$ for all $t\in\mathbb{Z}_{+}$.

Definition (Stationary policy)

A deterministic stationary control policy $\gamma\in\Gamma_{S}$, with $\gamma:\mathbb{X}\to \mathbb{U}$, is a sequence of identical functions $\{ \gamma,\gamma,\dots \}$ such that $u_{t}=\gamma(x_{t})$ for all $t\in\mathbb{Z}_{+}$.

Theorem (Markov Policy induces Markov Chain)

A policy $\gamma\in\Gamma_{M}$ applied to a Controlled Markov Chain induces a process $\{ X_{t},t\ge0 \}$ which is Markov.

Corollary

If moreover $\gamma\in\Gamma_{S}$, then $(X_{t})_{t\ge0}$ is a time-homogeneous Markov chain.
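The theorem and corollary can be illustrated by how a stationary policy collapses a controlled kernel into an ordinary (uncontrolled) Markov kernel. The spaces, kernel, and policy below are hypothetical, with $\mathbb{X}=\mathbb{U}=\{0,1\}$:

```python
# T[u][x][x'] = probability of moving x -> x' under action u
# (a made-up controlled transition kernel T(. | x, u))
T = [
    [[0.9, 0.1], [0.2, 0.8]],   # action u = 0
    [[0.4, 0.6], [0.7, 0.3]],   # action u = 1
]

def induced_kernel(gamma):
    """Markov kernel P(x' | x) = T(x' | x, gamma(x)) of the process {X_t}
    induced by a stationary policy gamma."""
    return [T[gamma(x)][x] for x in range(2)]

gamma = lambda x: 1 - x         # hypothetical stationary policy
P = induced_kernel(gamma)
print(P)                        # [[0.4, 0.6], [0.2, 0.8]]
```

Because $\gamma$ does not depend on $t$, the induced kernel $P$ is the same at every stage, which is exactly time-homogeneity; a time-varying Markov policy $\gamma_t$ would instead produce a stage-dependent kernel $P_t$.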

Theorem

The policy classes are nested: $\Gamma_{S}\subset\Gamma_{M}\subset\Gamma_{A}$, and hence for every $a\in\{ N,\beta,\infty \}$,
$$\inf_{\gamma\in\Gamma_{A}}J_{a}(X,\gamma)\le\inf_{\gamma\in\Gamma_{M}}J_{a}(X,\gamma)\le\inf_{\gamma\in\Gamma_{S}}J_{a}(X,\gamma).$$
Moreover, for $a\in\{ N,\beta,\infty \}$ the first inequality is an equality, and for $a\in\{ \beta,\infty \}$ the second is also an equality.
