Definition (POMDP)
A Partially Observed Markov Decision Process a.k.a. POMDP, is a seven tuple (X,U,Y,K,T,Q,c) where:
- X is the state space, a subset of a Polish space.
- U is the action space, a subset of a Polish space.
- Y is the observation space, a subset of a Polish space.
- K={(x,u):u∈U(x),x∈X} is the set of state-control pairs that are feasible.
- T:X×U→[0,1]∣X∣ is the state transition kernel i.e. T(A∣xt,ut)=P(xt+1∈A∣xt,ut)where A∈B(X).
- Q:X→[0,1]∣Y∣ is the observation channel i.e. Q(A∣xt)=P(yt∈A∣xt)where A∈B(Y).
- c:K→R is the cost function