
Markov Decision Process


Definition

A Fully Observed Markov Control Problem, otherwise known as an MDP, is a five-tuple $(\mathbb{X}, \mathbb{U}, \mathbb{K}, \mathcal{T}, c)$ where:

- $\mathbb{X}$ is the state space, a subset of a Polish space.
- $\mathbb{U}$ is the action space, a subset of a Polish space.
- $\mathbb{K} = \{ (x,u) : u \in \mathbb{U}(x),\ x \in \mathbb{X} \}$ is the set of feasible state-control pairs.
- $\mathcal{T}$ is the state transition kernel, i.e. $\mathcal{T}(A \mid x_{t}, u_{t}) = P(x_{t+1} \in A \mid x_{t}, u_{t})$.
- $c : \mathbb{K} \to \mathbb{R}$ is the cost function.
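To make the five-tuple concrete, here is a minimal finite-state sketch in Python. The 2-state, 2-action numbers, the cost function, and the fixed policy are all toy assumptions chosen for illustration, not part of the definition above.

```python
import numpy as np

X = [0, 1]                              # state space X
U = {0: [0, 1], 1: [0, 1]}              # feasible actions U(x) for each state
K = [(x, u) for x in X for u in U[x]]   # feasible state-control pairs K

# Transition kernel T(. | x, u): T[x][u] is a distribution over next states,
# so T[x][u][j] = P(x_{t+1} = X[j] | x_t = x, u_t = u).
T = {
    0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
    1: {0: [0.5, 0.5], 1: [0.1, 0.9]},
}

def c(x, u):
    # Cost function c : K -> R (toy choice: cost grows with state and action).
    return float(x + u)

def step(x, u, rng):
    # Sample x_{t+1} from T(. | x_t, u_t).
    return rng.choice(X, p=T[x][u])

# Roll the chain forward under a fixed (trivial) policy u_t = 0.
rng = np.random.default_rng(0)
x, total_cost = 0, 0.0
for t in range(5):
    u = 0
    total_cost += c(x, u)
    x = step(x, u, rng)
```

Note that each row of the kernel must be a probability distribution (non-negative, summing to one), and the cost is only evaluated on pairs in $\mathbb{K}$.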
