FIND ME ON

GitHub

LinkedIn

Optimality of Q Learning Algorithm

🌱

Theorem
StochasticControl

Theorem

Consider the Q-Learning algorithm and its dynamics. Then: 1. Q-Learning Converges to an Optimal Solution: Under the learning rate assumption, the Q-Learning algorithm converges almost surely to an optimal Qāˆ—Q^{*}. 2. Optimal Stationary Policy is Optimal: A Stationary Policy fāˆ—f^{*} which satisfies min⁔uQāˆ—(x,u)=Qāˆ—(x,fāˆ—(x))\min_{u}Q^{*}(x,u)=Q^{*}(x,f^{*}(x)) is an optimal policy.