š±
Consider the Q-Learning algorithm and its dynamics. Then: 1. Q-Learning Converges to an Optimal Solution: Under the learning rate assumption, the Q-Learning algorithm converges almost surely to an optimal . 2. Optimal Stationary Policy is Optimal: A Stationary Policy which satisfies is an optimal policy.