Hamilton-Jacobi-Bellman Equation

Definition (HJB Equation)

The Hamilton-Jacobi-Bellman (HJB) equation expresses the Value Function's partial derivative w.r.t. time as an infimum over the admissible control values:
$$-\frac{ \partial V }{ \partial t } (t,\mathbf{X})=\inf_{\mathbf{u}\in\mathcal{U}}\left\{ L(t,\mathbf{X},\mathbf{u})+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},\mathbf{u},t) \right\}$$
subject to the terminal condition
$$V(t_{1},\mathbf{X})=Q(\mathbf{X})$$
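As a quick worked illustration (a hypothetical scalar problem, not taken from these notes), assume dynamics $\dot{x}=u$ with $\mathcal{U}=\mathbb{R}$, running cost $L(t,x,u)=x^{2}+u^{2}$ and terminal cost $Q(x)=x^{2}$. The candidate $V(t,x)=x^{2}$ has $\frac{ \partial V }{ \partial t }=0$ and $\frac{ \partial V }{ \partial x }=2x$, so the right-hand side of the HJB equation becomes
$$\inf_{u\in\mathbb{R}}\left\{ x^{2}+u^{2}+2xu \right\}=\inf_{u\in\mathbb{R}}(x+u)^{2}=0=-\frac{ \partial V }{ \partial t }(t,x)$$
with the infimum attained at $u=-x$. The terminal condition $V(t_{1},x)=x^{2}=Q(x)$ also holds, so under these assumptions the candidate satisfies the HJB equation.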

Derivation

Consider the dynamics
$$\dot{x}(t)=f(x(t),u(t),t)$$
and take the first-order Taylor Expansion of $x$ about time $t$, where $x(t)=\mathbf{X}$:
$$\begin{align*} x(t+\Delta t)&= x(t)+f(x(t),u(t),t)\Delta t+o(\Delta t)\\ &= \mathbf{X}+f(\mathbf{X},u(t),t)\Delta t+o(\Delta t) \end{align*}$$
Assuming the Value Function is $C^{1}$, we then take the Taylor Expansion of $V$:
$$V(t+\Delta t,x(t+\Delta t))= V(t,\mathbf{X})+\frac{ \partial V }{ \partial t } (t,\mathbf{X})\Delta t+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},u(t),t)\Delta t+o(\Delta t)$$
where
$$\frac{ \partial V }{ \partial x } (t,\mathbf{X})=\nabla_{x} V(t,\mathbf{X})=\begin{bmatrix}\frac{ \partial V }{ \partial x_{1} } \\ \vdots\\ \frac{ \partial V }{ \partial x_{n} } \end{bmatrix}$$
Finally, applying the Taylor Expansion once more, this time to the running cost, we get:
$$\int\limits _{t}^{t+\Delta t}L(\tau,x(\tau),u(\tau)) \, d\tau =L(t,\mathbf{X},u(t))\Delta t+o(\Delta t)$$
Substituting all of these into our result from Decomposition of Value Function:
$$\begin{align*} V(t,\mathbf{X})&= \inf_{u_{[t,t+\Delta t]}}\left\{ \int\limits _{t}^{t+\Delta t}L(\tau,x(\tau),u(\tau)) \, d\tau +V(t+\Delta t,x(t+\Delta t)) \right\}\\ &= \inf_{u_{[t,t+\Delta t]}}\left\{ L(t,\mathbf{X},u(t))\Delta t+V(t,\mathbf{X})+\frac{ \partial V }{ \partial t }(t,\mathbf{X})\Delta t+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},u(t),t)\Delta t+o(\Delta t) \right\} \end{align*}$$
Since $V(t,\mathbf{X})$ does not depend on the control, we can subtract it from both sides, divide by $\Delta t$ and let $\Delta t\to 0$. As a result we have
$$0=\lim_{ \Delta t \to 0 } \frac{1}{\Delta t}\inf_{u_{[t,t+\Delta t]}}\left\{ L(t,\mathbf{X},u(t))\Delta t+\frac{ \partial V }{ \partial t } (t,\mathbf{X})\Delta t+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},u(t),t)\Delta t+o(\Delta t) \right\}$$
Note first that $\frac{ \partial V }{ \partial t }(t,\mathbf{X})$ does not depend on the control, so we can pull it out of the infimum. Next, notice that in the limit the control is evaluated solely at time $t$, so the infimum over control signals on $[t,t+\Delta t]$ reduces to an infimum over control values $\mathbf{u}\in\mathcal{U}$ and we get:
$$-\frac{ \partial V }{ \partial t } (t,\mathbf{X})=\inf_{\mathbf{u}\in\mathcal{U}}\left\{ L(t,\mathbf{X},\mathbf{u})+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},\mathbf{u},t) \right\}$$
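The limit in the derivation can be sanity-checked numerically. The sketch below uses a hypothetical scalar problem that is not part of these notes: dynamics $\dot{x}=u$, running cost $L=x^{2}+u^{2}$ and zero terminal cost at $t_{1}=1$, for which $V(t,x)=\tanh(t_{1}-t)\,x^{2}$ solves the HJB equation. It holds the control constant on $[t,t+\Delta t]$ and searches a finite grid of control values, both simplifying assumptions, and evaluates the one-step residual $\frac{1}{\Delta t}\left(\min_{u}\{L\Delta t+V(t+\Delta t,\mathbf{X}+f\Delta t)\}-V(t,\mathbf{X})\right)$, which should shrink as $\Delta t\to 0$. All names (`T1`, `V`, `one_step_residual`, `u_grid`) are illustrative.

```python
import math

T1 = 1.0  # terminal time t1 (assumed for this example)


def V(t, x):
    """Assumed value function for x' = u, L = x^2 + u^2, Q = 0."""
    return math.tanh(T1 - t) * x * x


def one_step_residual(t, x, dt, u_grid):
    """(1/dt) * (min_u {L*dt + V(t+dt, x + f*dt)} - V(t, x)) with f = u."""
    best = min((x * x + u * u) * dt + V(t + dt, x + u * dt) for u in u_grid)
    return (best - V(t, x)) / dt


# Grid of constant control values on [-3, 3] with spacing 0.001.
u_grid = [-3.0 + 6.0 * k / 6000 for k in range(6001)]

# Residual at a fixed point (t, X) = (0.3, 0.8) for shrinking time steps.
step_sizes = (1e-1, 1e-2, 1e-3)
residuals = [abs(one_step_residual(0.3, 0.8, dt, u_grid)) for dt in step_sizes]

for dt, r in zip(step_sizes, residuals):
    print(f"dt = {dt:g}: |residual| = {r:.2e}")
```

The residual decreasing with the step size is consistent with the $o(\Delta t)$ terms vanishing after division by $\Delta t$; it is a numerical check at one point, not a proof.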

Theorem (Sufficient condition for optimality)

Suppose that a $C^{1}$ function $\hat{V}:[t_{0},t_{1}]\times \mathbb{R}^{n}\to \mathbb{R}$ satisfies the HJB equation:
$$-\frac{ \partial \hat{V} }{ \partial t } (t,\mathbf{X})=\inf_{\mathbf{u}\in\mathcal{U}}\left\{ L(t,\mathbf{X},\mathbf{u})+\frac{ \partial \hat{V} }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},\mathbf{u},t) \right\}\quad\forall t\in[t_{0},t_{1}]$$
where $\mathbf{X}\in\mathbb{R}^{n}$, together with the terminal condition
$$\hat{V}(t_{1},\mathbf{X})=Q(\mathbf{X})$$
We further assume that there exists a control input $\hat{u}:[t_{0},t_{1}]\to \mathcal{U}$ with the corresponding trajectory $\hat{x}:[t_{0},t_{1}]\to \mathbb{R}^{n}$ satisfying $\hat{x}(t_{0})=x_{0}$ such that
$$\begin{align*} &L(t,\hat{x}(t),\hat{u}(t))+\frac{ \partial \hat{V} }{ \partial x } (t,\hat{x}(t))\cdot f(\hat{x}(t),\hat{u}(t),t)\\ &= \min_{u\in\mathcal{U}}\left\{ L(t,\hat{x}(t),u)+\frac{ \partial \hat{V} }{ \partial x } (t,\hat{x}(t))\cdot f(\hat{x}(t),u,t) \right\} \end{align*}$$
$\forall t\in[t_{0},t_{1}]$. Then $\hat{V}(t_{0},x_{0})$ is the optimal cost and $\hat{u}$ is the optimal control.
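To see the theorem at work, here is a minimal sketch on a hypothetical scalar problem (an assumption for illustration, not from these notes): dynamics $\dot{x}=u$, running cost $L=x^{2}+u^{2}$, terminal cost $Q(x)=x^{2}$ on $[t_{0},t_{1}]=[0,1]$. For this problem $\hat{V}(t,x)=x^{2}$ satisfies the HJB equation and the minimizing control is the feedback $\hat{u}=-\hat{x}$, so the theorem predicts an optimal cost of $x_{0}^{2}$. The code integrates the closed loop with forward Euler and compares the cost of $u=-kx$ for a few gains $k$; the function name `closed_loop_cost` and its parameters are illustrative.

```python
def closed_loop_cost(gain, x0, t0=0.0, t1=1.0, steps=20000):
    """Cost of the feedback u = -gain * x, integrated with forward Euler."""
    dt = (t1 - t0) / steps
    x, cost = x0, 0.0
    for _ in range(steps):
        u = -gain * x
        cost += (x * x + u * u) * dt  # running cost L = x^2 + u^2
        x += u * dt                   # dynamics x' = u
    return cost + x * x               # terminal cost Q(x) = x^2


x0 = 2.0
predicted = x0 * x0  # V_hat(t0, x0) = x0^2 for the assumed problem

# gain = 1 is the candidate optimal feedback; the others are comparisons.
costs = {gain: closed_loop_cost(gain, x0) for gain in (0.0, 0.5, 1.0, 2.0)}

for gain, cost in costs.items():
    print(f"gain = {gain}: cost = {cost:.4f} (predicted optimum {predicted:.4f})")
```

Comparing against a handful of linear feedback gains only checks the claim against those alternatives; the theorem itself covers every admissible control.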
