Lemma (6.2.1)
Let $X$ be a random vector with $X \in L_2$ and let $R$ be a positive definite matrix. The following holds:
\[
\inf_{g \in M(Y)} E\big[(X - g(Y))^\top R\,(X - g(Y))\big] = E\big[(X - E[X \mid Y])^\top R\,(X - E[X \mid Y])\big],
\]
where $M(Y)$ denotes the set of measurable functions of $Y$, and the infimum is attained by $G(y) = E[X \mid Y = y]$ a.s.
\begin{proof} It suffices to write $g(Y) = E[X \mid Y] + h(Y)$ and show that minimizing the expression forces $h(Y) = 0$ a.s. So:
\begin{align*}
&E\big[(X - E[X \mid Y] - h(Y))^\top R\,(X - E[X \mid Y] - h(Y))\big] \\
&\quad= E\big[(X - E[X \mid Y])^\top R\,(X - E[X \mid Y])\big] + E\big[h(Y)^\top R\,h(Y)\big] - 2\,E\big[(X - E[X \mid Y])^\top R\,h(Y)\big] \\
&\quad= E\big[(X - E[X \mid Y])^\top R\,(X - E[X \mid Y])\big] + E\big[h(Y)^\top R\,h(Y)\big] - 2\,E\Big[E\big[(X - E[X \mid Y])^\top R\,h(Y) \,\big|\, Y\big]\Big] \\
&\quad= E\big[(X - E[X \mid Y])^\top R\,(X - E[X \mid Y])\big] + E\big[h(Y)^\top R\,h(Y)\big] - 2\,E\Big[E\big[(X - E[X \mid Y])^\top \,\big|\, Y\big] R\,h(Y)\Big] \\
&\quad= E\big[(X - E[X \mid Y])^\top R\,(X - E[X \mid Y])\big] + E\big[h(Y)^\top R\,h(Y)\big] \\
&\quad\ge E\big[(X - E[X \mid Y])^\top R\,(X - E[X \mid Y])\big].
\end{align*}
So in order we:
- Multiply out the quadratic.
- Apply iterated expectation.
- Pull the $Y$-measurable factor $R\,h(Y)$ out of the conditional expectation.
- Note that $E[X - E[X \mid Y] \mid Y] = 0$, so the cross term vanishes.
- Achieve the inequality; since $R$ is positive definite, equality holds iff $h(Y) = 0$ a.s.
\end{proof}
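As a numeric sanity check of this lemma, the following sketch restricts to linear estimators $g(y) = Gy$ of a jointly Gaussian pair (where, anticipating Lemma 6.2.2 below, the optimal estimator is itself linear) and evaluates the cost exactly through covariance algebra, using $E[(X - GY)^\top R (X - GY)] = \operatorname{tr}\!\big(R(\Sigma_{XX} - G\Sigma_{YX} - \Sigma_{XY}G^\top + G\Sigma_{YY}G^\top)\big)$. The dimensions, seed, and weight $R$ are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2

# Build a positive definite joint covariance for (X, Y), both in R^n.
L = rng.standard_normal((2 * n, 2 * n))
K = L @ L.T + 0.5 * np.eye(2 * n)
Sxx, Sxy = K[:n, :n], K[:n, n:]
Syx, Syy = K[n:, :n], K[n:, n:]

R = np.diag([2.0, 1.0])  # positive definite weight

def loss(G):
    """Exact E[(X - G Y)^T R (X - G Y)] via the joint covariance."""
    return np.trace(R @ (Sxx - G @ Syx - Sxy @ G.T + G @ Syy @ G.T))

# Optimal gain: E[X | Y = y] = Sxy Syy^{-1} y.
G_opt = Sxy @ np.linalg.inv(Syy)
J_opt = loss(G_opt)

# Any perturbation h(y) = (G - G_opt) y can only increase the cost.
for _ in range(5):
    G = G_opt + rng.standard_normal((n, n))
    assert loss(G) >= J_opt - 1e-9
```

The gap $\operatorname{loss}(G) - \operatorname{loss}(G_{\mathrm{opt}}) = \operatorname{tr}(R\,\Delta\,\Sigma_{YY}\,\Delta^\top) \ge 0$ with $\Delta = G - G_{\mathrm{opt}}$, mirroring the $E[h^\top(Y)Rh(Y)]$ term in the proof.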
Now we consider the following system:
\[
x_{t+1} = A x_t + B u_t + w_t, \qquad y_t = C x_t + v_t, \qquad w_t \sim N(0, W), \quad v_t \sim N(0, V),
\]
where $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, $w \in \mathbb{R}^n$, $y \in \mathbb{R}^p$, $v \in \mathbb{R}^p$. The goal is to find the optimal cost
\[
\inf_{\gamma \in \Gamma} J(\gamma, \mu_0),
\]
where the cost is the quadratic cost of the state and the action,
\[
J(\mu_0, \gamma) = E^{\gamma}_{\mu_0}\left[\sum_{t=0}^{N-1} \big(x_t^\top Q x_t + u_t^\top R u_t\big) + x_N^\top Q_N x_N\right],
\]
with $R$ positive definite, $Q, Q_N$ positive semidefinite, and $\mu_0$ the prior on the initial state, assumed to be zero-mean Gaussian.
Control-Free Setup
We consider the control-free setup here.
Lemma (6.2.2)
Let $X, Y$ be jointly Gaussian zero-mean vectors. Then,
- $E[X \mid Y = y]$ is linear in $y$:
\[
E[X \mid Y = y] = \Sigma_{XY} \Sigma_{YY}^{-1} y; \quad\text{and}
\]
- We have that
\[
E\big[(X - E[X \mid Y])(X - E[X \mid Y])^\top\big] = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^\top =: D.
\]
In particular, $E[(X - E[X \mid Y])(X - E[X \mid Y])^\top \mid Y = y]$ does not depend on the realization $y$ of $Y$ and is equal to $D$.
\begin{proof} $(X, Y)$ is jointly Gaussian (write $X \in \mathbb{R}^n$, $Y \in \mathbb{R}^m$) and admits densities, with $p(x \mid y) = \frac{p(x, y)}{p(y)}$. With
\[
K_{XY} := E\left[\begin{bmatrix} X \\ Y \end{bmatrix}\begin{bmatrix} X^\top & Y^\top \end{bmatrix}\right] = \begin{bmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{bmatrix}, \qquad K_{XY}^{-1} = \begin{bmatrix} \Psi_{XX} & \Psi_{XY} \\ \Psi_{YX} & \Psi_{YY} \end{bmatrix},
\]
we have
\[
p(x, y) = \frac{1}{(2\pi)^{\frac{n+m}{2}} |K_{XY}|^{1/2}}\, e^{-\frac{1}{2} \begin{bmatrix} x \\ y \end{bmatrix}^\top K_{XY}^{-1} \begin{bmatrix} x \\ y \end{bmatrix}}.
\]
Then
\begin{align*}
p(x \mid y) &= \frac{1}{(2\pi)^{\frac{n+m}{2}} |K_{XY}|^{1/2}}\, e^{-\frac{1}{2} \begin{bmatrix} x \\ y \end{bmatrix}^\top K_{XY}^{-1} \begin{bmatrix} x \\ y \end{bmatrix}} \cdot \left(\frac{1}{(2\pi)^{\frac{m}{2}} |\Sigma_{YY}|^{1/2}}\, e^{-\frac{1}{2} y^\top \Sigma_{YY}^{-1} y}\right)^{-1} \\
&= C\, e^{\frac{1}{2} y^\top \Sigma_{YY}^{-1} y}\, e^{-\frac{1}{2} \begin{bmatrix} x \\ y \end{bmatrix}^\top \begin{bmatrix} \Psi_{XX} & \Psi_{XY} \\ \Psi_{YX} & \Psi_{YY} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}} \\
&= C\, e^{\frac{1}{2} y^\top \Sigma_{YY}^{-1} y}\, e^{-\frac{1}{2}\left(x^\top \Psi_{XX} x + 2 x^\top \Psi_{XY} y + y^\top \Psi_{YY} y\right)} \\
&= C\, e^{-\frac{1}{2}\left(x^\top \Psi_{XX} x + 2 x^\top \Psi_{XY} y + y^\top \Psi_{YY} y - y^\top \Sigma_{YY}^{-1} y\right)}.
\end{align*}
Now, looking at the expression in the exponent, we can apply a completion-of-squares argument:
\begin{align*}
x^\top \Psi_{XX} x + 2 x^\top \Psi_{XY} y + y^\top \Psi_{YY} y - y^\top \Sigma_{YY}^{-1} y &= \big(x + \Psi_{XX}^{-1} \Psi_{XY} y\big)^\top \Psi_{XX} \big(x + \Psi_{XX}^{-1} \Psi_{XY} y\big) + Q(y) \\
&=: (x - H y)^\top D^{-1} (x - H y) + Q(y),
\end{align*}
where $H = -\Psi_{XX}^{-1} \Psi_{XY}$, $D := \Psi_{XX}^{-1}$, and $Q(y)$ collects the terms depending only on $y$. Then, we observe the following:
\[
\begin{bmatrix} \Psi_{XX} & \Psi_{XY} \\ \Psi_{YX} & \Psi_{YY} \end{bmatrix} \cdot \begin{bmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix},
\]
which gives us that $\Psi_{XX} \Sigma_{XY} + \Psi_{XY} \Sigma_{YY} = 0$, therefore
\[
\Sigma_{XY} = -\Psi_{XX}^{-1} \Psi_{XY} \Sigma_{YY} \implies \Sigma_{XY} \Sigma_{YY}^{-1} = -\Psi_{XX}^{-1} \Psi_{XY} = H,
\]
allowing us to re-express $H$ and leaving us with the resultant conditional density
\[
p(x \mid y) = C\, e^{-\frac{1}{2} Q(y)}\, e^{-\frac{1}{2}\big(x - \Sigma_{XY} \Sigma_{YY}^{-1} y\big)^\top \Psi_{XX} \big(x - \Sigma_{XY} \Sigma_{YY}^{-1} y\big)},
\]
a Gaussian density in $x$ with mean $\Sigma_{XY} \Sigma_{YY}^{-1} y$, giving us the first condition.
Finally, since $\int p(x \mid y)\, dx = 1$ we necessarily have that
\[
C\, e^{-\frac{1}{2} Q(y)} = \frac{1}{(2\pi)^{\frac{n}{2}} |D|^{1/2}},
\]
which is in fact independent of $y$. The conditional covariance is therefore $D = \Psi_{XX}^{-1}$, which by the block-inverse (Schur complement) identity equals $\Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^\top$. Since $D$ does not depend on $y$, we finally have
\[
E\big[(X - E[X \mid Y])(X - E[X \mid Y])^\top \mid Y = y\big] = E\big[(X - E[X \mid Y])(X - E[X \mid Y])^\top\big] = D.
\]
\end{proof}
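The two matrix facts the proof leans on, the gain identity $\Sigma_{XY}\Sigma_{YY}^{-1} = -\Psi_{XX}^{-1}\Psi_{XY}$ and the Schur-complement identity $D = \Psi_{XX}^{-1}$, can be checked numerically on an arbitrary positive definite joint covariance (dimensions and seed below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2

# A positive definite joint covariance K_{XY} with blocks S__.
L = rng.standard_normal((n + m, n + m))
K = L @ L.T + 0.5 * np.eye(n + m)
Sxx, Sxy = K[:n, :n], K[:n, n:]
Syx, Syy = K[n:, :n], K[n:, n:]

# Blocks Psi__ of the inverse K_{XY}^{-1}.
Psi = np.linalg.inv(K)
Pxx, Pxy = Psi[:n, :n], Psi[:n, n:]

# Gain identity used to re-express H:  Sxy Syy^{-1} = -Pxx^{-1} Pxy.
H = Sxy @ np.linalg.inv(Syy)
assert np.allclose(H, -np.linalg.inv(Pxx) @ Pxy)

# Schur-complement identity behind the normalization step:  D = Pxx^{-1}.
D = Sxx - Sxy @ np.linalg.inv(Syy) @ Syx
assert np.allclose(D, np.linalg.inv(Pxx))
```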
To derive the Kalman filter the following two lemmas are required:

Lemma (6.2.3)
If $E[X] = 0$ and $Z_1, Z_2$ are orthogonal zero-mean Gaussian vectors (with $E[Z_1 Z_2^\top] = 0$), then
\[
E[X \mid Z_1 = z_1, Z_2 = z_2] = E[X \mid Z_1 = z_1] + E[X \mid Z_2 = z_2].
\]
\begin{proof}
\begin{align*}
E[X \mid (Z_1, Z_2) = (z_1, z_2)] &= E\left[X \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix}^\top\right] \left(E\left[\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix}^\top\right]\right)^{-1} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \\
&= \begin{bmatrix} E[X Z_1^\top] & E[X Z_2^\top] \end{bmatrix} \begin{bmatrix} E[Z_1 Z_1^\top] & E[Z_1 Z_2^\top] \\ E[Z_2 Z_1^\top] & E[Z_2 Z_2^\top] \end{bmatrix}^{-1} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \\
&= E[X Z_1^\top]\big(E[Z_1 Z_1^\top]\big)^{-1} z_1 + E[X Z_2^\top]\big(E[Z_2 Z_2^\top]\big)^{-1} z_2 \\
&= E[X \mid Z_1 = z_1] + E[X \mid Z_2 = z_2],
\end{align*}
where the first equality is Lemma 6.2.2 applied to $(X, (Z_1, Z_2))$, the third is due to orthogonality (the off-diagonal blocks vanish, so the inverse is block diagonal), and the final one is due to Lemma 6.2.2 applied to $(X, Z_1)$ and $(X, Z_2)$. \end{proof}
and
Lemma (6.2.4)
$E\big[(X - E[X \mid Y])(X - E[X \mid Y])^\top\big]$ is given by $D$ from above.
\begin{proof} First note that
\[
E\big[X (E[X \mid Y])^\top\big] = E\big[(X - E[X \mid Y] + E[X \mid Y])(E[X \mid Y])^\top\big] = E\big[E[X \mid Y](E[X \mid Y])^\top\big],
\]
since $X - E[X \mid Y]$ is orthogonal to $E[X \mid Y]$ (which we know by another iterated-expectation argument). Then
\begin{align*}
E\big[(X - E[X \mid Y])(X - E[X \mid Y])^\top\big] &= E[X X^\top] - 2\, E\big[X (E[X \mid Y])^\top\big] + E\big[E[X \mid Y] (E[X \mid Y])^\top\big] \\
&= E[X X^\top] - E\big[E[X \mid Y] (E[X \mid Y])^\top\big] \\
&= \Sigma_{XX} - E\big[\Sigma_{XY} \Sigma_{YY}^{-1} Y Y^\top (\Sigma_{YY}^{-1})^\top \Sigma_{XY}^\top\big] \\
&= \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YY} \Sigma_{YY}^{-1} \Sigma_{XY}^\top \\
&= \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^\top = D.
\end{align*}
\end{proof}
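The additivity of Lemma 6.2.3 can also be seen computationally: when $Z_1 \perp Z_2$, the joint covariance of $(Z_1, Z_2)$ is block diagonal, so the joint gain of Lemma 6.2.2 splits into the two marginal gains. A minimal sketch with an illustrative construction $X = A_1 Z_1 + A_2 Z_2$ (all matrices and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p1, p2 = 2, 3, 2

# Z1, Z2: independent (hence orthogonal) zero-mean Gaussians with covariances S1, S2.
L1 = rng.standard_normal((p1, p1)); S1 = L1 @ L1.T + np.eye(p1)
L2 = rng.standard_normal((p2, p2)); S2 = L2 @ L2.T + np.eye(p2)

# X = A1 Z1 + A2 Z2 gives cross-covariances E[X Z1^T] = A1 S1, E[X Z2^T] = A2 S2.
A1 = rng.standard_normal((n, p1))
A2 = rng.standard_normal((n, p2))
Sx1, Sx2 = A1 @ S1, A2 @ S2

# Joint covariance of Z = (Z1, Z2) is block diagonal by orthogonality.
Szz = np.block([[S1, np.zeros((p1, p2))],
                [np.zeros((p2, p1)), S2]])
Sxz = np.hstack([Sx1, Sx2])

z1, z2 = rng.standard_normal(p1), rng.standard_normal(p2)

# Joint conditional mean (Lemma 6.2.2) vs. sum of marginal conditional means (Lemma 6.2.3).
joint = Sxz @ np.linalg.solve(Szz, np.concatenate([z1, z2]))
split = Sx1 @ np.linalg.solve(S1, z1) + Sx2 @ np.linalg.solve(S2, z2)
assert np.allclose(joint, split)
```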
Now, consider the following system without control:
\[
x_{t+1} = A x_t + w_t, \qquad y_t = C x_t + v_t, \qquad w_t \overset{\text{iid}}{\sim} N(0, W), \quad v_t \overset{\text{iid}}{\sim} N(0, V),
\]
where $x \in \mathbb{R}^n$, $w \in \mathbb{R}^n$, $y \in \mathbb{R}^p$, $v \in \mathbb{R}^p$. Define the mean process $m_t$ and covariance process $\Sigma_{t \mid t-1}$ as
\[
m_t = E[x_t \mid y_{0:t-1}], \qquad \Sigma_{t \mid t-1} = E\big[(x_t - E[x_t \mid y_{0:t-1}])(x_t - E[x_t \mid y_{0:t-1}])^\top \mid y_{0:t-1}\big],
\]
and since the estimation error covariance does not depend on the realization $y_{0:t-1}$ (Lemma 6.2.2), we can rewrite it as $\Sigma_{t \mid t-1} = E[(x_t - E[x_t \mid y_{0:t-1}])(x_t - E[x_t \mid y_{0:t-1}])^\top]$.

Theorem (6.2.1)
The following holds:
\begin{align*}
m_{t+1} &= A m_t + A \Sigma_{t \mid t-1} C^\top \big(C \Sigma_{t \mid t-1} C^\top + V\big)^{-1} (y_t - C m_t), \\
\Sigma_{t+1 \mid t} &= A \Sigma_{t \mid t-1} A^\top + W - \big(A \Sigma_{t \mid t-1} C^\top\big)\big(C \Sigma_{t \mid t-1} C^\top + V\big)^{-1} \big(C \Sigma_{t \mid t-1} A^\top\big),
\end{align*}
with $m_0 = E[x_0]$, $\Sigma_{0 \mid -1} = E[x_0 x_0^\top]$.
\begin{proof}
\begin{align*}
m_{t+1} &= E[A x_t + w_t \mid y_{0:t}] = E[A x_t \mid y_{0:t}] \qquad \text{(since $w_t$ is zero mean and independent of $y_{0:t}$)} \\
&= E[A m_t + A (x_t - m_t) \mid y_{0:t}] \\
&= A m_t + E\big[A (x_t - m_t) \mid y_{0:t-1},\ y_t - E[y_t \mid y_{0:t-1}]\big] \\
&= A m_t + E\big[A (x_t - m_t) \mid y_{0:t-1}\big] + E\big[A (x_t - m_t) \mid y_t - E[y_t \mid y_{0:t-1}]\big] \qquad \text{(by Lemma 6.2.3)} \\
&= A m_t + E\big[A (x_t - m_t) \mid y_t - E[y_t \mid y_{0:t-1}]\big] \\
&= A m_t + E\big[A (x_t - m_t) \mid C x_t + v_t - E[C x_t + v_t \mid y_{0:t-1}]\big] \\
&= A m_t + E\big[A (x_t - m_t) \mid C (x_t - m_t) + v_t\big].
\end{align*}
Let $X = A (x_t - m_t)$ and $Y = y_t - E[y_t \mid y_{0:t-1}] = y_t - C m_t = C (x_t - m_t) + v_t$. Then, by Lemma 6.2.2 we have
\[
E[X \mid Y] = \Sigma_{XY} \Sigma_{YY}^{-1} Y,
\]
and thus,
\begin{align*}
m_{t+1} &= A m_t + A\, E\big[(x_t - m_t)(x_t - m_t)^\top\big] C^\top \Big(E\big[(C (x_t - m_t) + v_t)(C (x_t - m_t) + v_t)^\top\big]\Big)^{-1} (y_t - C m_t) \\
&= A m_t + A \Sigma_{t \mid t-1} C^\top \big(C\, E[(x_t - m_t)(x_t - m_t)^\top]\, C^\top + E[v_t v_t^\top]\big)^{-1} (y_t - C m_t) \\
&= A m_t + A \Sigma_{t \mid t-1} C^\top \big(C \Sigma_{t \mid t-1} C^\top + V\big)^{-1} (y_t - C m_t).
\end{align*}
Likewise,
\[
x_{t+1} - m_{t+1} = A (x_t - m_t) + w_t - A \Sigma_{t \mid t-1} C^\top \big(C \Sigma_{t \mid t-1} C^\top + V\big)^{-1} (y_t - C m_t),
\]
and computing $E[(x_{t+1} - m_{t+1})(x_{t+1} - m_{t+1})^\top]$ gives
\[
\Sigma_{t+1 \mid t} = A \Sigma_{t \mid t-1} A^\top + W - \big(A \Sigma_{t \mid t-1} C^\top\big)\big(C \Sigma_{t \mid t-1} C^\top + V\big)^{-1} \big(C \Sigma_{t \mid t-1} A^\top\big).
\]
\end{proof}
Define now
\[
\tilde m_t = E[x_t \mid y_{0:t}] = m_t + E[x_t - m_t \mid y_{0:t}].
\]
Following the analysis above we obtain
\[
\tilde m_t = m_t + E\big[x_t - m_t \mid y_{0:t-1}\big] + E\big[x_t - m_t \mid y_t - E[y_t \mid y_{0:t-1}]\big] = m_t + E\big[x_t - m_t \mid y_t - C m_t\big],
\]
since $E[x_t - m_t \mid y_{0:t-1}] = 0$. Note that we also have $m_t = A \tilde m_{t-1}$. The following then results:
Theorem (6.2.2)
The recursion for $\tilde m_t$ satisfies
\[
\tilde m_t = A \tilde m_{t-1} + \Sigma_{t \mid t-1} C^\top \big(C \Sigma_{t \mid t-1} C^\top + V\big)^{-1} \big(y_t - C A \tilde m_{t-1}\big),
\]
with $\tilde m_0 = E[x_0 \mid y_0]$.
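The recursions of Theorem 6.2.1 translate directly into code. Below is a minimal sketch of the predictor $m_t = E[x_t \mid y_{0:t-1}]$ run on a hypothetical two-state system; the matrices $A, C, W, V$, the horizon, and the seed are illustrative choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative stable 2-state system with a scalar observation.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
W = 0.1 * np.eye(2)    # process noise covariance
V = np.array([[0.5]])  # observation noise covariance

def kalman_predictor(ys, A, C, W, V, m0, S0):
    """Run the recursions of Theorem 6.2.1; returns the predictions
    m_{t+1} = E[x_{t+1} | y_{0:t}] and the final error covariance."""
    m, S = m0, S0
    ms = []
    for y in ys:
        innov_cov = C @ S @ C.T + V
        gain = A @ S @ C.T @ np.linalg.inv(innov_cov)
        m = A @ m + gain @ (y - C @ m)          # mean update
        S = A @ S @ A.T + W - gain @ (C @ S @ A.T)  # covariance update
        ms.append(m)
    return np.array(ms), S

# Simulate the control-free system x_{t+1} = A x_t + w_t, y_t = C x_t + v_t.
T = 500
x = np.zeros(2)
xs, ys = [], []
for _ in range(T):
    y = C @ x + rng.multivariate_normal(np.zeros(1), V)
    xs.append(x)
    ys.append(y)
    x = A @ x + rng.multivariate_normal(np.zeros(2), W)
xs = np.array(xs)

ms, S = kalman_predictor(ys, A, C, W, V, m0=np.zeros(2), S0=np.eye(2))

# The prediction-error covariance stays symmetric positive semidefinite.
assert np.allclose(S, S.T)
assert np.all(np.linalg.eigvalsh(S) >= -1e-10)
```

Since `ms[t]` predicts `xs[t + 1]`, the prediction error can be compared against the trivial zero predictor; the Riccati iteration for $\Sigma_{t+1 \mid t}$ runs independently of the observations, which is why it can also be precomputed offline.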