# Hamilton-Jacobi-Bellman Equation

Control

The Hamilton-Jacobi-Bellman (HJB) equation is a partial differential equation that expresses the Value Function's partial derivative with respect to time as an infimum over the control:

$$-\frac{ \partial V }{ \partial t } (t,\mathbf{X})=\inf_{\mathbf{u}\in\mathcal{U}}\left\{ L(t,\mathbf{X},\mathbf{u})+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},\mathbf{u},t) \right\}$$

subject to the terminal condition $V(t_{1},\mathbf{X})=Q(\mathbf{X})$.

## Derivation

Consider the dynamics

$$\dot{x}(t)=f(x(t),u(t),t)$$

and take the first-order Taylor expansion of the trajectory about time $t$:

$$\begin{align*} x(t+\Delta t)&= x(t)+f(x(t),u(t),t)\Delta t+o(\Delta t)\\ &= \mathbf{X}+f(\mathbf{X},u(t),t)\Delta t+o(\Delta t) \end{align*}$$

where $x(t)=\mathbf{X}$. Assuming the Value Function is $C^{1}$, we then take the Taylor expansion of $V$:

$$\begin{align*} V(t+\Delta t,x(t+\Delta t))&= V(t,\mathbf{X})+\frac{ \partial V }{ \partial t } (t,\mathbf{X})\Delta t+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},u(t),t)\Delta t+o(\Delta t) \end{align*}$$

where

$$\frac{ \partial V }{ \partial x } (t,\mathbf{X})=\nabla V=\begin{bmatrix}\frac{ \partial V }{ \partial x_{1} } \\ \vdots\\\frac{ \partial V }{ \partial x_{n} } \end{bmatrix}$$

Finally, applying the Taylor expansion once more to the running cost gives

$$\int\limits _{t}^{t+\Delta t}L(\tau,x(\tau),u(\tau)) \, d\tau =L(t,\mathbf{X},u(t))\Delta t+o(\Delta t)$$

Substituting all of these into the recursive relation for $V$ from Value Function:

$$\begin{align*} V(t,\mathbf{X})&= \inf_{u_{[t,t+\Delta t]}}\left\{ \int\limits _{t}^{t+\Delta t}L(\tau,x(\tau),u(\tau)) \, d\tau +V(t+\Delta t,x(t+\Delta t)) \right\}\\ &= \inf_{u_{[t,t+\Delta t]}}\left\{ L(t,\mathbf{X},u(t))\Delta t+V(t,\mathbf{X})+\frac{ \partial V }{ \partial t }(t,\mathbf{X})\Delta t+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},u(t),t)\Delta t+o(\Delta t) \right\} \end{align*}$$

Since $V(t,\mathbf{X})$ does not depend on the control, it cancels from both sides. Dividing by $\Delta t$ and letting $\Delta t\to 0$, we have

$$0=\lim_{ \Delta t \to 0 } \frac{1}{\Delta t}\inf_{u_{[t,t+\Delta t]}}\left\{ L(t,\mathbf{X},u(t))\Delta t+\frac{ \partial V }{ \partial t } (t,\mathbf{X})\Delta t+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},u(t),t)\Delta t+o(\Delta t) \right\}$$

First, note that $\frac{ \partial V }{ \partial t }(t,\mathbf{X})$ does not depend on the control, so we can pull it out of the infimum. Next, notice that in the limit the control is evaluated solely at time $t$, so the infimum over control functions on $[t,t+\Delta t]$ reduces to an infimum over control values $\mathbf{u}\in\mathcal{U}$, and we get:

$$-\frac{ \partial V }{ \partial t } (t,\mathbf{X})=\inf_{\mathbf{u}\in\mathcal{U}}\left\{ L(t,\mathbf{X},\mathbf{u})+\frac{ \partial V }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},\mathbf{u},t) \right\}$$
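As a numerical sanity check on the equation above, here is a minimal sketch on a scalar example that is not part of the original note and is assumed purely for illustration: dynamics $\dot{x}=u$, running cost $L=x^{2}+u^{2}$, terminal cost $Q=0$ and $t_{1}=1$. For this problem $V(t,x)=\tanh(t_{1}-t)\,x^{2}$ satisfies the HJB equation, so the residual between the two sides should be close to zero. The names `T1`, `V` and `hjb_residual` are hypothetical, the derivatives are approximated by central finite differences, and the infimum is approximated by brute force over a bounded grid of controls.

```python
import math

T1 = 1.0  # terminal time t_1 (assumed value for this example)


def V(t, x):
    """Candidate value function for x' = u, L = x^2 + u^2, Q = 0."""
    return math.tanh(T1 - t) * x * x


def hjb_residual(t, x, h=1e-5, u_max=5.0, n_u=10001):
    """Return -dV/dt - inf_u { L(t, x, u) + dV/dx * f(x, u, t) }."""
    # Central finite differences for the two partial derivatives.
    V_t = (V(t + h, x) - V(t - h, x)) / (2 * h)
    V_x = (V(t, x + h) - V(t, x - h)) / (2 * h)

    # Approximate the infimum on a uniform grid of u in [-u_max, u_max].
    best = math.inf
    for i in range(n_u):
        u = -u_max + 2 * u_max * i / (n_u - 1)
        # Here L = x^2 + u^2 and f = u.
        best = min(best, x * x + u * u + V_x * u)

    return -V_t - best


for t, x in [(0.0, 1.0), (0.3, 0.7), (0.9, -1.5)]:
    print(f"t={t}, x={x}, residual={hjb_residual(t, x):.2e}")
```

The residual is only as small as the grid spacing and the finite-difference step allow. The bounded control grid is a convenience of the sketch: it works here because the minimizing control $u=-\tanh(t_{1}-t)\,x$ lies well inside $[-5,5]$ for the sampled states.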

Suppose that a $C^{1}$ function $\hat{V}:[t_{0},t_{1}]\times \mathbb{R}^{n}\to \mathbb{R}$ satisfies the HJB equation:

$$-\frac{ \partial \hat{V} }{ \partial t } (t,\mathbf{X})=\inf_{\mathbf{u}\in\mathcal{U}}\left\{ L(t,\mathbf{X},\mathbf{u})+\frac{ \partial \hat{V} }{ \partial x } (t,\mathbf{X})\cdot f(\mathbf{X},\mathbf{u},t) \right\}\quad\forall t\in[t_{0},t_{1}]$$

where $\mathbf{X}\in\mathbb{R}^{n}$, with the terminal condition

$$\hat{V}(t_{1},\mathbf{X})=Q(\mathbf{X})$$

We further assume that there exists a control input $\hat{u}:[t_{0},t_{1}]\to \mathcal{U}$, with corresponding trajectory $\hat{x}:[t_{0},t_{1}]\to \mathbb{R}^{n}$ satisfying $\hat{x}(t_{0})=x_{0}$, such that

$$\begin{align*} &L(t,\hat{x}(t),\hat{u}(t))+\frac{ \partial \hat{V} }{ \partial x } (t,\hat{x}(t))\cdot f(\hat{x}(t),\hat{u}(t),t)\\ &= \min_{u\in\mathcal{U}}\left\{ L(t,\hat{x}(t),u)+\frac{ \partial \hat{V} }{ \partial x } (t,\hat{x}(t))\cdot f(\hat{x}(t),u,t) \right\} \end{align*}$$

$\forall t\in[t_{0},t_{1}]$. Then $\hat{V}(t_{0},x_{0})$ is the optimal cost and $\hat{u}$ is the optimal control.
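To illustrate this statement, the sketch below uses a scalar example that is assumed for illustration and does not come from the original note: $\dot{x}=u$, $L=x^{2}+u^{2}$, $Q=0$, $t_{0}=0$, $t_{1}=1$. The candidate $\hat{V}(t,x)=\tanh(t_{1}-t)\,x^{2}$ solves the HJB equation for this problem, and the minimizer of $L+\frac{\partial \hat{V}}{\partial x}f$ is $\hat{u}=-\tanh(t_{1}-t)\,x$. The code rolls out the dynamics with a simple explicit Euler scheme, accumulates the running cost, and compares the result with $\hat{V}(t_{0},x_{0})$. The names `rollout_cost` and `u_hat` are hypothetical.

```python
import math

T0, T1 = 0.0, 1.0  # time horizon [t_0, t_1] (assumed values)


def rollout_cost(policy, x0, n_steps=20000):
    """Explicit-Euler rollout of x' = u, accumulating L = x^2 + u^2."""
    dt = (T1 - T0) / n_steps
    x, cost = x0, 0.0
    for k in range(n_steps):
        t = T0 + k * dt
        u = policy(t, x)
        cost += (x * x + u * u) * dt  # running cost over one step
        x += u * dt                   # dynamics f(x, u, t) = u
    return cost                       # terminal cost Q is zero here


def u_hat(t, x):
    """Feedback control minimizing L + dV/dx * f for V = tanh(T1 - t) x^2."""
    return -math.tanh(T1 - t) * x


x0 = 1.0
v_hat = math.tanh(T1 - T0) * x0 ** 2            # candidate optimal cost
cost_hat = rollout_cost(u_hat, x0)              # cost achieved by u_hat
cost_zero = rollout_cost(lambda t, x: 0.0, x0)  # an arbitrary other control

print("V_hat(t0, x0)        :", v_hat)
print("cost under u_hat     :", cost_hat)
print("cost under u = 0     :", cost_zero)
```

The cost under `u_hat` should agree with `v_hat` up to the Euler discretization error, while the zero control, picked here only as a comparison, costs more. One comparison control does not prove optimality; it only shows the behaviour the statement predicts.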
