The Hamilton-Jacobi-Bellman equation gives a closed form for the value function's partial derivative with respect to time:

$$-\frac{\partial V}{\partial t}(t, X) = \inf_{u \in U} \left\{ L(t, X, u) + \frac{\partial V}{\partial x}(t, X) \cdot f(X, u, t) \right\}$$

subject to the boundary condition $V(t_1, X) = Q(X)$.
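As a concrete illustration (an assumed toy example, not from the source), consider scalar dynamics $\dot{x} = u$ with running cost $L = u^2$ and terminal cost $Q(x) = x^2$; here the infimum can be computed in closed form:

```latex
% Toy example (assumed for illustration): \dot{x} = u, L = u^2, Q(x) = x^2.
% The infimum over u \in \mathbb{R} of a quadratic in u is attained analytically:
\[
-\frac{\partial V}{\partial t}
  = \inf_{u \in \mathbb{R}} \left\{ u^2 + \frac{\partial V}{\partial x}\, u \right\}
  = -\frac{1}{4} \left( \frac{\partial V}{\partial x} \right)^2,
\qquad
u^* = -\frac{1}{2} \frac{\partial V}{\partial x}.
\]
% The quadratic ansatz V(t, x) = p(t)\, x^2 with V(t_1, x) = x^2 reduces this
% PDE to the scalar Riccati ODE \dot{p} = p^2, \; p(t_1) = 1, whose solution
% is p(t) = 1 / (1 + t_1 - t).
\]
```

Substituting the ansatz back: $\frac{\partial V}{\partial t} = \dot{p}\,x^2$ and $\frac{\partial V}{\partial x} = 2px$, so the HJB equation $-\dot{p}\,x^2 = -p^2 x^2$ holds for all $x$ exactly when $\dot{p} = p^2$.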
Derivation
Consider the dynamics
$$\dot{x}(t) = f(t, x(t), u(t)),$$
and take the first-order Taylor expansion around $X$, where $x(t) = X$:
$$x(t + \Delta t) = x(t) + f(x(t), u(t), t)\,\Delta t + o(\Delta t) = X + f(X, u(t), t)\,\Delta t + o(\Delta t).$$
Assuming the value function is $C^1$, we then take the Taylor expansion of $V$:
$$V(t + \Delta t, x(t + \Delta t)) = V(t, X) + \frac{\partial V}{\partial t}(t, X)\,\Delta t + \frac{\partial V}{\partial x}(t, X) \cdot f(X, u(t), t)\,\Delta t + o(\Delta t),$$
where
$$\frac{\partial V}{\partial x}(t, X) = \nabla V = \begin{pmatrix} \frac{\partial V}{\partial x_1} \\ \vdots \\ \frac{\partial V}{\partial x_n} \end{pmatrix}.$$
Finally, applying the Taylor expansion once more, to the running cost:
$$\int_t^{t + \Delta t} L(\tau, x(\tau), u(\tau))\,d\tau = L(t, X, u(t))\,\Delta t + o(\Delta t).$$
We can now substitute all of these into our result from the Decomposition of the Value Function:
$$V(t, X) = \inf_{u_{[t, t + \Delta t]}} \left\{ \int_t^{t + \Delta t} L(\tau, x(\tau), u(\tau))\,d\tau + V(t + \Delta t, x(t + \Delta t)) \right\}$$
$$= \inf_{u_{[t, t + \Delta t]}} \left\{ L(t, X, u(t))\,\Delta t + V(t, X) + \frac{\partial V}{\partial t}(t, X)\,\Delta t + \frac{\partial V}{\partial x}(t, X) \cdot f(X, u(t), t)\,\Delta t + o(\Delta t) \right\}.$$
Cancelling $V(t, X)$ on both sides (it does not depend on the control) and dividing by $\Delta t$, we have
$$0 = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \inf_{u_{[t, t + \Delta t]}} \left\{ L(t, X, u(t))\,\Delta t + \frac{\partial V}{\partial t}(t, X)\,\Delta t + \frac{\partial V}{\partial x}(t, X) \cdot f(X, u(t), t)\,\Delta t + o(\Delta t) \right\}.$$
Note first that $\frac{\partial V}{\partial t}(t, X)$ does not depend on the infimum, so we can pull it out. Next, notice that by taking the limit the control is evaluated solely at time $t$, so we get:
$$-\frac{\partial V}{\partial t}(t, X) = \inf_{u \in U} \left\{ L(t, X, u) + \frac{\partial V}{\partial x}(t, X) \cdot f(X, u, t) \right\}.$$
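The resulting PDE can be sanity-checked numerically. The sketch below (an assumed toy example, not from the source) uses $\dot{x} = u$, $L = u^2$, $Q(x) = x^2$, $t_1 = 1$, for which $V(t, x) = x^2 / (1 + t_1 - t)$ solves the HJB equation; finite differences approximate the partial derivatives and a grid search approximates the infimum.

```python
import numpy as np

# Assumed toy problem: dynamics x' = u, running cost L = u^2,
# terminal cost Q(x) = x^2, horizon t1 = 1. The candidate value function
# V(t, x) = x^2 / (1 + t1 - t) should make the HJB residual vanish.
t1 = 1.0

def V(t, x):
    return x**2 / (1.0 + t1 - t)

def hjb_residual(t, x, h=1e-6):
    # Central finite differences for dV/dt and dV/dx.
    V_t = (V(t + h, x) - V(t - h, x)) / (2 * h)
    V_x = (V(t, x + h) - V(t, x - h)) / (2 * h)
    # Numerical infimum over a fine grid of controls u
    # (here f(X, u, t) = u, so the bracket is L + V_x * u).
    u = np.linspace(-10.0, 10.0, 20001)
    inf_term = np.min(u**2 + V_x * u)
    # HJB says -V_t equals the infimum, so this should be ~0.
    return -V_t - inf_term

for t, x in [(0.0, 1.0), (0.5, -2.0), (0.9, 0.3)]:
    print(f"HJB residual at (t={t}, x={x}): {hjb_residual(t, x):.2e}")
```

The residuals are at the level of the finite-difference and grid errors, consistent with $V$ solving the equation.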
Theorem (Sufficient condition for optimality)
Suppose that a $C^1$ function $\hat{V}: [t_0, t_1] \times \mathbb{R}^n \to \mathbb{R}$ satisfies the HJB equation
$$-\frac{\partial \hat{V}}{\partial t}(t, X) = \inf_{u \in U} \left\{ L(t, X, u) + \frac{\partial \hat{V}}{\partial x}(t, X) \cdot f(X, u, t) \right\} \quad \forall t \in [t_0, t_1],\ X \in \mathbb{R}^n,$$
with the boundary condition $\hat{V}(t_1, X) = Q(X)$. We further assume that there exists a control input $\hat{u}: [t_0, t_1] \to U$ with corresponding trajectory $\hat{x}: [t_0, t_1] \to \mathbb{R}^n$ satisfying $\hat{x}(t_0) = x_0$ such that
$$L(t, \hat{x}(t), \hat{u}(t)) + \frac{\partial \hat{V}}{\partial x}(t, \hat{x}(t)) \cdot f(\hat{x}(t), \hat{u}(t), t) = \min_{u \in U} \left\{ L(t, \hat{x}(t), u) + \frac{\partial \hat{V}}{\partial x}(t, \hat{x}(t)) \cdot f(\hat{x}(t), u, t) \right\}$$
for all $t \in [t_0, t_1]$. Then $\hat{V}(t_0, x_0)$ is the optimal cost and $\hat{u}$ is the optimal control.
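The theorem can be illustrated numerically. In the sketch below (an assumed toy example, not from the source) with $\dot{x} = u$, $L = u^2$, $Q(x) = x^2$ on $[0, 1]$, the function $\hat{V}(t, x) = x^2 / (1 + t_1 - t)$ satisfies the HJB equation and the pointwise minimizer is $\hat{u}(t) = -\hat{x}(t) / (1 + t_1 - t)$; simulating that control should reproduce the cost $\hat{V}(t_0, x_0)$.

```python
# Assumed toy problem: x' = u, L = u^2, Q(x) = x^2, [t0, t1] = [0, 1].
# V_hat(t, x) = x^2 / (1 + t1 - t) solves the HJB equation, and the
# minimizing control along the trajectory is u_hat = -x / (1 + t1 - t).
t0, t1, x0 = 0.0, 1.0, 1.0

def V_hat(t, x):
    return x**2 / (1.0 + t1 - t)

# Forward-Euler integration of the closed loop, accumulating the cost.
N = 100_000
dt = (t1 - t0) / N
x, cost = x0, 0.0
for k in range(N):
    t = t0 + k * dt
    u = -x / (1.0 + t1 - t)   # the pointwise minimizer u_hat(t)
    cost += u**2 * dt         # running cost L = u^2
    x += u * dt               # Euler step of x' = u
cost += x**2                  # terminal cost Q(x(t1))

print(f"simulated cost: {cost:.6f}")
print(f"V_hat(t0, x0) : {V_hat(t0, x0):.6f}")  # theorem: these should agree
```

Up to discretization error, the simulated cost matches $\hat{V}(t_0, x_0) = 1/2$, as the sufficiency theorem predicts.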