Optimality of the Q-Learning Algorithm


Theorem

StochasticControl

Theorem

Consider the Q-Learning algorithm and its dynamics. Then:

1. Q-Learning Converges to an Optimal Solution: Under the learning rate assumption, the Q-Learning iterates converge almost surely to the optimal Q-function $Q^{*}$.
2. A Greedy Stationary Policy is Optimal: Any stationary policy $f^{*}$ that attains the minimum of $Q^{*}$ at every state $x$, that is, $\min_{u}Q^{*}(x,u)=Q^{*}(x,f^{*}(x))$, is an optimal policy.
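Both claims can be checked numerically on a toy problem. The following is a minimal sketch, not part of the original note: the 2-state, 2-action cost-minimizing MDP, the discount factor `BETA = 0.5`, the uniformly random exploration, and the step size $\alpha_n = n^{-0.7}$ (with $n$ the visit count of the state-action pair) are all illustrative assumptions. The step size is chosen on the assumption that the learning rate assumption means the usual conditions $\sum_n \alpha_n = \infty$ and $\sum_n \alpha_n^2 < \infty$. The sketch computes $Q^{*}$ by value iteration using the known model, runs tabular Q-Learning without the model, and compares the two tables and their greedy policies.

```python
import random

# Hypothetical cost-minimizing MDP; every number below is illustrative.
COST = [[1.0, 2.0], [3.0, 0.5]]          # COST[x][u]
P = [[[0.9, 0.1], [0.2, 0.8]],           # P[x][u][y] = Prob(next = y | x, u)
     [[0.7, 0.3], [0.1, 0.9]]]
BETA = 0.5                               # discount factor (assumed)
N_STATES, N_ACTIONS = 2, 2


def value_iteration(tol=1e-12):
    """Compute Q* from the known model by iterating the Bellman operator."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    while True:
        v = [min(row) for row in q]      # V(y) = min_u Q(y, u)
        new_q = [[COST[x][u]
                  + BETA * sum(P[x][u][y] * v[y] for y in range(N_STATES))
                  for u in range(N_ACTIONS)]
                 for x in range(N_STATES)]
        diff = max(abs(new_q[x][u] - q[x][u])
                   for x in range(N_STATES) for u in range(N_ACTIONS))
        q = new_q
        if diff < tol:
            return q


def q_learning(steps=200_000, seed=0):
    """Tabular Q-Learning along one trajectory with uniformly random actions."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    visits = [[0] * N_ACTIONS for _ in range(N_STATES)]
    x = 0
    for _ in range(steps):
        u = rng.randrange(N_ACTIONS)     # explore every action infinitely often
        y = rng.choices(range(N_STATES), weights=P[x][u])[0]
        visits[x][u] += 1
        alpha = visits[x][u] ** -0.7     # sum diverges, sum of squares converges
        target = COST[x][u] + BETA * min(q[y])
        q[x][u] += alpha * (target - q[x][u])
        x = y
    return q


def greedy_policy(q):
    """Stationary policy f(x) = argmin_u Q(x, u)."""
    return [min(range(N_ACTIONS), key=lambda u: row[u]) for row in q]


if __name__ == "__main__":
    q_star = value_iteration()
    q_hat = q_learning()
    max_err = max(abs(q_hat[x][u] - q_star[x][u])
                  for x in range(N_STATES) for u in range(N_ACTIONS))
    print("greedy policy from Q*:       ", greedy_policy(q_star))
    print("greedy policy from learned Q:", greedy_policy(q_hat))
    print("max |Q_hat - Q*|:", max_err)
```

A single finite run only illustrates the almost-sure convergence in claim 1 and cannot prove it. The comparison of greedy policies corresponds to claim 2: once the learned table is close enough to $Q^{*}$ that the minimizing action is the same at every state, the greedy policy read off the learned table coincides with $f^{*}$.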