Given a policy $\gamma \in \Gamma_A$, the problem of minimizing

$$J_N(X,\gamma) = E_{X}^{\gamma}\!\left[\sum_{k=0}^{N-1} c(X_k, U_k) + c_N(X_N)\right] = E^{\gamma}\!\left[\sum_{k=0}^{N-1} c(X_k, U_k) + c_N(X_N) \,\middle|\, X_0 = X\right]$$

is known as the Finite Horizon Optimal Control problem.
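To make the objective concrete, here is a minimal Python sketch that estimates $J_N(X,\gamma)$ by Monte Carlo rollouts on a small, randomly generated finite MDP. The transition tensor `P`, the costs `c` and `c_N`, and the `policy` function are all illustrative assumptions, not taken from this note.

```python
import numpy as np

# Hypothetical finite MDP (illustrative, not from the note):
# states and actions are integers, P[u, x] is the next-state
# distribution under action u, c is the stage cost, c_N the terminal cost.
rng = np.random.default_rng(0)
n_states, n_actions, N = 3, 2, 10
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
c = rng.random((n_states, n_actions))   # stage cost c(x, u)
c_N = rng.random(n_states)              # terminal cost c_N(x)

def policy(k: int, x: int) -> int:
    """A fixed Markov policy gamma: maps time k and state x to an action."""
    return (x + k) % n_actions

def J_N_monte_carlo(x0: int, n_rollouts: int = 10_000) -> float:
    """Estimate J_N(x0, gamma) = E[sum_k c(X_k, U_k) + c_N(X_N) | X_0 = x0]."""
    total = 0.0
    for _ in range(n_rollouts):
        x, cost = x0, 0.0
        for k in range(N):
            u = policy(k, x)
            cost += c[x, u]                        # accumulate stage cost
            x = rng.choice(n_states, p=P[u, x])    # sample next state
        total += cost + c_N[x]                     # add terminal cost
    return total / n_rollouts

print(J_N_monte_carlo(x0=0))
```

Averaging rollouts directly instantiates the expectation above; computing the *optimal* cost instead uses backward induction, which is where Bellman's optimality principle (next section) comes in.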
Bellman's Optimality Principle
Markov Policy is Good Enough
Belief MDP