Markov Decision Process

Definition (Markov Decision Process)

A Fully Observed Markov Control Problem, otherwise known as an MDP, is a five-tuple $(\mathbb{X}, \mathbb{U}, \mathbb{K}, \mathcal{T}, c)$ where:

  • $\mathbb{X}$ is the state space, a subset of a Polish space.
  • $\mathbb{U}$ is the action space, a subset of a Polish space.
  • $\mathbb{K} = \{(x,u) : u \in \mathbb{U}(x),\, x \in \mathbb{X}\}$ is the set of feasible state-control pairs.
  • $\mathcal{T}$ is the state transition kernel, i.e. $\mathcal{T}(A \mid x_t, u_t) = P(x_{t+1} \in A \mid x_t, u_t)$.
  • $c : \mathbb{K} \to \mathbb{R}$ is the cost function.
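For intuition, the five-tuple can be instantiated concretely in the finite case, where the kernel $\mathcal{T}(\cdot \mid x, u)$ reduces to a table of probability distributions. The two-state, two-action example below is hypothetical, chosen only to illustrate the structure:

```python
import random

# A minimal finite-state sketch of the five-tuple (X, U, K, T, c).
# All states, actions, probabilities, and costs here are made up for illustration.
X = ["s0", "s1"]                       # state space
U = {"s0": ["a", "b"], "s1": ["a"]}    # feasible actions U(x) for each state x
K = [(x, u) for x in X for u in U[x]]  # feasible state-control pairs

# Transition kernel T(. | x, u): for each feasible pair, a distribution over next states.
T = {
    ("s0", "a"): {"s0": 0.9, "s1": 0.1},
    ("s0", "b"): {"s0": 0.2, "s1": 0.8},
    ("s1", "a"): {"s0": 0.5, "s1": 0.5},
}

# Cost function c : K -> R.
c = {("s0", "a"): 1.0, ("s0", "b"): 0.5, ("s1", "a"): 2.0}

def step(x, u, rng=random):
    """Sample x_{t+1} ~ T(. | x_t, u_t) and return (next_state, stage_cost)."""
    assert (x, u) in K, "control u must be feasible in state x"
    dist = T[(x, u)]
    x_next = rng.choices(list(dist), weights=list(dist.values()))[0]
    return x_next, c[(x, u)]
```

The Markov property is visible in `step`: the distribution of `x_next` depends only on the current pair `(x, u)`, not on any earlier history.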
