The Hamilton-Jacobi-Bellman (HJB) equation gives a closed-form expression for the partial derivative of the Value Function with respect to time:

$$
-\frac{ \partial V }{ \partial t } (t,\mathbf{X})=\inf_{\mathbf{u}\in\mathcal{U}}\left\{ L(t,\mathbf{X},\mathbf{u})+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},\mathbf{u},t) \right\}
$$

subject to the terminal condition $V(t_{1},\mathbf{X})=Q(\mathbf{X})$.

## Derivation

Consider the dynamics

$$
\dot{x}(t)=f(x(t),u(t),t)
$$

and take the first-order Taylor expansion of the trajectory in time, starting from the state $x(t)=\mathbf{X}$:

$$
\begin{align*}
x(t+\Delta t)&= x(t)+f(x(t),u(t),t)\Delta t+o(\Delta t)\\
&= \mathbf{X}+f(\mathbf{X},u(t),t)\Delta t+o(\Delta t)
\end{align*}
$$

Assuming the Value Function is $C^{1}$, we then take the Taylor expansion of $V$:

$$
\begin{align*}
V(t+\Delta t,x(t+\Delta t))&= V(t,\mathbf{X})+\frac{ \partial V }{ \partial t } (t,\mathbf{X})\Delta t+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},u(t),t)\Delta t+o(\Delta t)
\end{align*}
$$

where

$$
\frac{ \partial V }{ \partial x } (t,\mathbf{X})=\nabla V=\begin{bmatrix}\frac{ \partial V }{ \partial x_{1} } \\ \vdots\\\frac{ \partial V }{ \partial x_{n} } \end{bmatrix}
$$

Finally, applying the Taylor expansion once more to the running cost gives:

$$
\int\limits _{t}^{t+\Delta t}L(\tau,x(\tau),u(\tau)) \, d\tau =L(t,\mathbf{X},u(t))\Delta t+o(\Delta t)
$$

Substituting these expansions into our result from Value Function:

$$
\begin{align*}
V(t,\mathbf{X})&= \inf_{u_{[t,t+\Delta t]}}\left\{ \int\limits _{t}^{t+\Delta t}L(\tau,x(\tau),u(\tau)) \, d\tau +V(t+\Delta t,x(t+\Delta t)) \right\}\\
&= \inf_{u_{[t,t+\Delta t]}}\left\{ L(t,\mathbf{X},u(t))\Delta t+V(t,\mathbf{X})+\frac{ \partial V }{ \partial t }(t,\mathbf{X})\Delta t+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},u(t),t)\Delta t+o(\Delta t) \right\}
\end{align*}
$$

The term $V(t,\mathbf{X})$ does not depend on the control, so it cancels from both sides. Dividing by $\Delta t$ and letting $\Delta t\to 0$, we have

$$
0=\lim_{ \Delta t \to 0 } \frac{1}{\Delta t}\inf_{u_{[t,t+\Delta t]}}\left\{ L(t,\mathbf{X},u(t))\Delta t+\frac{ \partial V }{ \partial t } (t,\mathbf{X})\Delta t+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},u(t),t)\Delta t+o(\Delta t) \right\}
$$

Note first that $\frac{ \partial V }{ \partial t }(t,\mathbf{X})$ does not depend on the control, so we can pull it out of the infimum. Next, notice that in the limit the control is evaluated solely at time $t$, so the infimum over control segments $u_{[t,t+\Delta t]}$ reduces to an infimum over control values $\mathbf{u}\in\mathcal{U}$, and we get:

$$
-\frac{ \partial V }{ \partial t } (t,\mathbf{X})=\inf_{\mathbf{u}\in\mathcal{U}}\left\{ L(t,\mathbf{X},\mathbf{u})+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},\mathbf{u},t) \right\}
$$
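
As an illustration of how the infimum in the HJB equation is used in practice, here is a worked sketch for a scalar linear-quadratic problem. The problem data are assumptions chosen for illustration and do not come from the derivation itself: dynamics $f(x,u,t)=ax+bu$, running cost $L(t,x,u)=qx^{2}+ru^{2}$ with $r>0$, terminal cost $Q(x)=q_{f}x^{2}$, unconstrained controls $\mathcal{U}=\mathbb{R}$, and the ansatz $V(t,x)=p(t)x^{2}$.

$$
\begin{align*}
-\dot{p}(t)x^{2}&= \inf_{u\in\mathbb{R}}\left\{ qx^{2}+ru^{2}+2p(t)x\,(ax+bu) \right\}\\
0&= 2ru+2b\,p(t)x\quad\Longrightarrow\quad u^{*}=-\frac{b\,p(t)}{r}x\\
-\dot{p}(t)x^{2}&= \left( q+2a\,p(t)-\frac{b^{2}p(t)^{2}}{r} \right)x^{2}
\end{align*}
$$

Under these assumptions the HJB equation reduces to the scalar Riccati equation $-\dot{p}=q+2ap-\frac{b^{2}p^{2}}{r}$ with terminal condition $p(t_{1})=q_{f}$, which follows from $V(t_{1},x)=Q(x)$.
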
Suppose that a $C^{1}$ function $\hat{V}:[t_{0},t_{1}]\times \mathbb{R}^{n}\to \mathbb{R}$ satisfies the HJB equation:

$$
-\frac{ \partial \hat{V} }{ \partial t } (t,\mathbf{X})=\inf_{\mathbf{u}\in\mathcal{U}}\left\{ L(t,\mathbf{X},\mathbf{u})+\frac{ \partial \hat{V} }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},\mathbf{u},t) \right\}\quad\forall t\in[t_{0},t_{1}]
$$

for every $\mathbf{X}\in\mathbb{R}^{n}$, together with the terminal condition $\hat{V}(t_{1},\mathbf{X})=Q(\mathbf{X})$.

We further assume that there exists a control input $\hat{u}:[t_{0},t_{1}]\to \mathcal{U}$ with the corresponding trajectory $\hat{x}:[t_{0},t_{1}]\to \mathbb{R}^{n}$ satisfying $\hat{x}(t_{0})=x_{0}$ such that

$$
\begin{align*}
&L(t,\hat{x}(t),\hat{u}(t))+\frac{ \partial \hat{V} }{ \partial x } (t,\hat{x}(t))\cdot f(\hat{x}(t),\hat{u}(t),t)\\
&= \min_{u\in\mathcal{U}}\left\{ L(t,\hat{x}(t),u)+\frac{ \partial \hat{V} }{ \partial x } (t,\hat{x}(t))\cdot f(\hat{x}(t),u,t) \right\}
\end{align*}
$$

$\forall t\in[t_{0},t_{1}]$. Then $\hat{V}(t_{0},x_{0})$ is the optimal cost and $\hat{u}$ is the optimal control.
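
The statement above can be checked numerically on a small example. The following is a minimal sketch, not a general solver, and everything in it is an illustrative assumption: a scalar problem with dynamics $f(x,u,t)=ax+bu$, running cost $L=qx^{2}+ru^{2}$, terminal cost $Q(x)=q_{f}x^{2}$, and the candidate $\hat{V}(t,x)=p(t)x^{2}$. Substituting this candidate into the HJB equation gives $-\dot{p}=q+2ap-b^{2}p^{2}/r$ with $p(t_{1})=q_{f}$, and the minimizing control is $\hat{u}=-(bp/r)x$. The code integrates $p$ backward in time, rolls out the closed loop with forward Euler, and compares the accumulated cost with $\hat{V}(t_{0},x_{0})$. The names `riccati_rhs`, `solve_riccati`, and `simulate`, and all constants, are hypothetical.

```python
# Hypothetical scalar problem (all constants are illustrative assumptions):
#   dynamics x' = a*x + b*u, running cost L = q*x^2 + r*u^2, terminal cost Q(x) = qf*x^2
a, b, q, r, qf = -0.5, 1.0, 1.0, 0.5, 2.0
t0, t1, x0 = 0.0, 2.0, 1.5
N = 2000                      # number of time steps
dt = (t1 - t0) / N


def riccati_rhs(p):
    # HJB with V_hat(t, x) = p(t) x^2 gives p' = -(q + 2 a p - b^2 p^2 / r)
    return -(q + 2.0 * a * p - (b * p) ** 2 / r)


def solve_riccati():
    # Integrate backward from p(t1) = qf with RK4; p[k] approximates p(t0 + k*dt)
    p = [0.0] * (N + 1)
    p[N] = qf
    h = -dt
    for k in range(N, 0, -1):
        k1 = riccati_rhs(p[k])
        k2 = riccati_rhs(p[k] + 0.5 * h * k1)
        k3 = riccati_rhs(p[k] + 0.5 * h * k2)
        k4 = riccati_rhs(p[k] + h * k3)
        p[k - 1] = p[k] + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


def simulate(gain_scale, p):
    # Forward-Euler rollout with u = -gain_scale * (b p / r) x.
    # gain_scale = 1 is the pointwise minimizer of the HJB right-hand side.
    x, cost = x0, 0.0
    for k in range(N):
        u = -gain_scale * (b * p[k] / r) * x
        cost += (q * x * x + r * u * u) * dt
        x += (a * x + b * u) * dt
    return cost + qf * x * x   # add terminal cost Q(x(t1))


if __name__ == "__main__":
    p = solve_riccati()
    print("V_hat(t0, x0)        =", p[0] * x0 * x0)
    print("cost with u_hat      =", simulate(1.0, p))
    print("cost with 0.5 * gain =", simulate(0.5, p))
    print("cost with 1.5 * gain =", simulate(1.5, p))
```

Up to discretization error, the cost achieved by $\hat{u}$ should match $\hat{V}(t_{0},x_{0})$, while the two perturbed feedback gains should cost more. This is consistent with the claim but does not prove it, since only two alternative controls are tried.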