Partially Observable Markov Decision Process

Definition (POMDP)

A Partially Observable Markov Decision Process (POMDP) is a seven-tuple $(\mathbb{X}, \mathbb{U}, \mathbb{Y}, \mathbb{K}, \mathcal{T}, Q, c)$ where:

  • $\mathbb{X}$ is the state space, a subset of a Polish space.
  • $\mathbb{U}$ is the action space, a subset of a Polish space.
  • $\mathbb{Y}$ is the observation space, a subset of a Polish space.
  • $\mathbb{K}=\{(x,u) : x\in\mathbb{X},\ u\in\mathbb{U}(x)\}$ is the set of feasible state-action pairs, where $\mathbb{U}(x)\subseteq\mathbb{U}$ is the set of actions admissible at state $x$.
  • $\mathcal{T}$ is the state transition kernel: for each $(x_t,u_t)\in\mathbb{K}$, $\mathcal{T}(\cdot\mid x_t,u_t)$ is a probability measure on $\mathcal{B}(\mathbb{X})$, i.e. $\mathcal{T}(A\mid x_t,u_t)=P(x_{t+1}\in A\mid x_t,u_t)$ for $A\in\mathcal{B}(\mathbb{X})$.
  • $Q$ is the observation channel: for each $x_t\in\mathbb{X}$, $Q(\cdot\mid x_t)$ is a probability measure on $\mathcal{B}(\mathbb{Y})$, i.e. $Q(A\mid x_t)=P(y_t\in A\mid x_t)$ for $A\in\mathcal{B}(\mathbb{Y})$.
  • $c:\mathbb{K}\to\mathbb{R}$ is the cost function.
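As a concrete illustration of the tuple above, a finite POMDP (where all seven components are finite, so kernels reduce to probability vectors) can be sketched as a small data structure. This is a minimal hypothetical sketch: the class name `FinitePOMDP`, the `step` method, and the two-state example are illustrative assumptions, not part of the definition.

```python
import random
from dataclasses import dataclass


@dataclass
class FinitePOMDP:
    """A minimal finite POMDP sketch (hypothetical, for illustration).

    States, actions, and observations are indexed 0..n-1, so the
    transition kernel T and observation channel Q become lookup
    tables of probability vectors.
    """
    T: dict  # T[(x, u)] -> list of P(x' | x, u) over next states
    Q: dict  # Q[x]      -> list of P(y | x) over observations
    c: dict  # c[(x, u)] -> real-valued cost of pair (x, u) in K

    def step(self, x, u, rng=random):
        """Sample one transition: next state, observation, and cost."""
        n = len(self.T[(x, u)])
        x_next = rng.choices(range(n), weights=self.T[(x, u)])[0]
        y = rng.choices(range(len(self.Q[x_next])), weights=self.Q[x_next])[0]
        return x_next, y, self.c[(x, u)]


# Two-state example with a deterministic transition and channel,
# so the sampled trajectory is predictable.
pomdp = FinitePOMDP(
    T={(0, 0): [0.0, 1.0]},   # action 0 in state 0 always moves to state 1
    Q={1: [0.0, 1.0]},        # state 1 always emits observation 1
    c={(0, 0): 2.5},
)
print(pomdp.step(0, 0))  # (1, 1, 2.5)
```

Only the pairs in the feasibility set $\mathbb{K}$ appear as keys of `T` and `c`, mirroring the constraint that $c$ and $\mathcal{T}$ are defined on feasible state-action pairs only.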
