NAVIGATION
Home
Research
Bookshelf
Garden
FIND ME ON
GitHub
LinkedIn
🌱
Given a policy γ∈ΓA\gamma\in\Gamma_{A}γ∈ΓA , β∈(0,1)\beta\in(0,1)β∈(0,1),with the objective of minimizing Jβ(X,γ)=limN→∞Exγ[∑k=0N−1βkc(Xk,Uk)]J_{\beta}(X,\gamma) = \lim_{ N \to \infty } E_{x}^{\gamma}\left[ \sum_{k=0}^{N-1}\beta^{k}c(X_{k},U_{k}) \right]Jβ(X,γ)=N→∞limExγ[k=0∑N−1βkc(Xk,Uk)]this is known as the Discounted Optimal Control problem.
Q-Learning