Definition (Markov Decision Process)
A fully observed Markov control problem, otherwise known as an MDP, is a five-tuple (X, U, K, T, c) where:
- X is the state space, a subset of a Polish space.
- U is the action space, a subset of a Polish space.
- K = {(x,u) : x ∈ X, u ∈ U(x)} is the set of feasible state-control pairs, where U(x) ⊆ U denotes the controls admissible in state x.
- T is the state transition kernel, i.e. T(A ∣ x_t, u_t) = P(x_{t+1} ∈ A ∣ x_t, u_t) for measurable A ⊆ X.
- c : K → ℝ is the cost function.
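To make the five components concrete, here is a minimal sketch of a finite MDP in Python. The specific states, controls, kernel values, and costs are illustrative assumptions, not from the definition above; in the finite case the transition kernel T(· ∣ x, u) reduces to a probability vector over next states.

```python
import random

# Illustrative finite MDP (all names and numbers are assumptions for this sketch).
X = ["s0", "s1"]          # state space
U = ["stay", "move"]      # control space

# K: feasible state-control pairs; here every control is admissible everywhere.
K = {(x, u) for x in X for u in U}

# T: transition kernel, a distribution over next states for each (x, u) in K.
T = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.1, "s1": 0.9},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

# c: K -> R; an arbitrary cost charging 1 per move and 0 for staying.
def c(x, u):
    assert (x, u) in K, "control must be feasible in state x"
    return 1.0 if u == "move" else 0.0

def step(x, u, rng=random):
    """Sample x_{t+1} ~ T(. | x_t, u_t); return (next_state, incurred_cost)."""
    dist = T[(x, u)]
    r, acc = rng.random(), 0.0
    for nxt, p in dist.items():
        acc += p
        if r < acc:
            return nxt, c(x, u)
    return nxt, c(x, u)  # guard against floating-point round-off
```

Each row of T sums to 1, as a transition kernel must; `step` simulates one stage of the controlled Markov chain.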