Kalman Filter


Let $X$ be a random variable with $X\in L^{2}$ and let $R$ be a positive definite matrix. The following holds:
$$\inf_{g\in \mathbb{M}(\mathbb{Y})}\mathbb{E}\left[ (X-g(Y))^{\top}R(X-g(Y)) \right]=\mathbb{E}\left[ (X-\mathbb{E}[X\mid Y])^{\top}R(X-\mathbb{E}[X\mid Y]) \right],$$
where $\mathbb{M}(\mathbb{Y})$ denotes the set of measurable functions from $\mathbb{Y}$ to $\mathbb{R}^{n}$, and the infimum is attained by $G(y)=\mathbb{E}[X\mid Y=y]$ a.s.

\begin{proof}
Not a fan of how the source explains this, but it suffices to write $g(Y)=\mathbb{E}[X\mid Y]+h(Y)$ for a measurable $h$ and show that minimizing the expression forces $h(Y)=0$ a.s. So:
$$\begin{align*}
&\mathbb{E}[(X-\mathbb{E}[X\mid Y]-h(Y))^{\top}R(X-\mathbb{E}[X\mid Y]-h(Y))]\\
&= \mathbb{E}[(X-\mathbb{E}[X\mid Y])^{\top}R(X-\mathbb{E}[X\mid Y])]+\mathbb{E}[h(Y)^{\top}Rh(Y)]-2\mathbb{E}[(X-\mathbb{E}[X\mid Y])^{\top}Rh(Y)]\\
&= \mathbb{E}[(X-\mathbb{E}[X\mid Y])^{\top}R(X-\mathbb{E}[X\mid Y])]+\mathbb{E}[h(Y)^{\top}Rh(Y)]-2\mathbb{E}\left[\mathbb{E}[(X-\mathbb{E}[X\mid Y])^{\top}Rh(Y)\mid Y]\right]\\
&= \mathbb{E}[(X-\mathbb{E}[X\mid Y])^{\top}R(X-\mathbb{E}[X\mid Y])]+\mathbb{E}[h(Y)^{\top}Rh(Y)]-2\mathbb{E}\left[\mathbb{E}[(X-\mathbb{E}[X\mid Y])^{\top}\mid Y]\,Rh(Y)\right]\\
&= \mathbb{E}[(X-\mathbb{E}[X\mid Y])^{\top}R(X-\mathbb{E}[X\mid Y])]+\mathbb{E}[h(Y)^{\top}Rh(Y)]\\
&\ge \mathbb{E}[(X-\mathbb{E}[X\mid Y])^{\top}R(X-\mathbb{E}[X\mid Y])]
\end{align*}$$
So in order we:
1. multiply out;
2. apply the tower property of conditional expectation;
3. pull the $Y$-measurable factor $Rh(Y)$ out of the inner conditional expectation;
4. note that $\mathbb{E}[X-\mathbb{E}[X\mid Y]\mid Y]=0$, so the cross term vanishes;
5. obtain the inequality, with equality iff $h(Y)=0$ a.s., since $R$ is positive definite.
\end{proof}
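As a quick numerical sanity check (my own sketch, not from the source): below, $X$ is constructed to depend linearly on $Y$ plus independent noise, so $\mathbb{E}[X\mid Y]$ is known by construction, and the weighted cost of the conditional mean is compared against perturbed estimators $g(Y)=\mathbb{E}[X\mid Y]+h(Y)$. All names and dimensions are illustrative.

```python
# Monte Carlo check: among estimators g(Y), the conditional mean minimizes
# E[(X - g(Y))^T R (X - g(Y))]. Here X = M Y + noise, so E[X|Y] = M Y.
import numpy as np

rng = np.random.default_rng(0)
n, p, N = 3, 2, 200_000

M = rng.normal(size=(n, p))             # X depends linearly on Y (assumed setup)
A = rng.normal(size=(n, n))
R = A @ A.T + n * np.eye(n)             # a positive definite weight matrix

Y = rng.normal(size=(N, p))
noise = rng.normal(size=(N, n))         # zero-mean, independent of Y
X = Y @ M.T + noise                     # E[X | Y] = M Y by construction

def cost(G):
    """Empirical E[(X - G)^T R (X - G)] for an estimate array G of shape (N, n)."""
    E = X - G
    return np.mean(np.einsum("ti,ij,tj->t", E, R, E))

best = cost(Y @ M.T)                    # cost at g(Y) = E[X|Y]
for _ in range(5):                      # any linear perturbation h(Y) does worse
    H = rng.normal(size=(n, p))
    assert cost(Y @ M.T + 0.3 * (Y @ H.T)) > best
print("cost at the conditional mean:", best)  # ā‰ˆ E[noise^T R noise] = trace(R)
```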


Now we consider the following system:
$$\begin{align*} x_{t+1}&= Ax_{t}+Bu_{t}+w_{t}, & w_{t}&\sim \mathcal{N}(0,W),\\ y_{t}&= Cx_{t}+v_{t}, & v_{t}&\sim \mathcal{N}(0,V), \end{align*}$$
where $x\in \mathbb{R}^{n}$, $u\in \mathbb{R}^{m}$, $w\in \mathbb{R}^{n}$, $y\in \mathbb{R}^{p}$, $v\in \mathbb{R}^{p}$. The goal is to find the optimal cost
$$\inf_{\gamma \in\Gamma}J(\gamma,\mu_{0}),$$
where the cost is the quadratic cost of the state and the action,
$$J(\mu_{0},\gamma)=\mathbb{E}_{\mu_{0}}^{\gamma}\left[ \sum_{t=0}^{N-1}\left(x_{t}^{\top}Qx_{t}+u_{t}^{\top}Ru_{t}\right)+x_{N}^{\top}Q_{N}x_{N} \right],$$
with $R$ positive definite, $Q,Q_{N}$ positive semidefinite, and $\mu_{0}$ the prior on the state, which is assumed to be zero-mean Gaussian.
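To make the objects concrete, here is a short simulation sketch; the matrices, horizon, and the linear state-feedback policy $u_{t}=Kx_{t}$ are all illustrative assumptions, not from the source (and a state-feedback policy sidesteps exactly the output-feedback difficulty that the Kalman filter addresses):

```python
# Sketch: simulate x_{t+1} = A x_t + B u_t + w_t and estimate the quadratic
# cost J by Monte Carlo for an arbitrary (not optimal) policy u_t = K x_t.
import numpy as np

rng = np.random.default_rng(1)
n, m, horizon, n_mc = 2, 1, 20, 2000

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
W = 0.01 * np.eye(n)                      # process-noise covariance
Q, QN, R = np.eye(n), np.eye(n), 0.1 * np.eye(m)
K = np.array([[-1.0, -2.0]])              # illustrative feedback gain

J = 0.0
for _ in range(n_mc):
    x = rng.multivariate_normal(np.zeros(n), 0.1 * np.eye(n))  # x_0 ~ mu_0
    for t in range(horizon):
        u = K @ x
        J += x @ Q @ x + u @ R @ u        # running cost x^T Q x + u^T R u
        x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
    J += x @ QN @ x                       # terminal cost x_N^T Q_N x_N
print("estimated J:", J / n_mc)
```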


Control-Free Setup

We consider the control-free setup here, i.e. we set $u_{t}\equiv 0$ and focus on the estimation problem.

A Gaussian measure on $\mathbb{R}^{n}$ with mean $\mu$ and covariance matrix $\Sigma_{XX}$ has the following density:
$$p(x)= \frac{1}{(2\pi)^{n/2}|\Sigma_{XX}|^{1/2}}e^{- \frac{1}{2}(x-\mu)^{\top}\Sigma_{XX}^{-1}(x-\mu)}.$$

Let $X,Y$ be zero-mean jointly Gaussian vectors. Then:
1. $\mathbb{E}[X\mid Y=y]$ is linear in $y$: $\mathbb{E}[X\mid Y=y]=\Sigma_{XY}\Sigma_{YY}^{-1}y$; and
2. $\mathbb{E}[(X-\mathbb{E}[X\mid Y])(X-\mathbb{E}[X\mid Y])^{\top}]=\Sigma_{XX}-\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{XY}^{\top}=:D$.

In particular, $\mathbb{E}[(X-\mathbb{E}[X\mid Y])(X-\mathbb{E}[X\mid Y])^{\top}\mid Y=y]$ does not depend on the realization $y$ of $Y$ and is equal to $D$.

\begin{proof}
$(X,Y)$ are jointly Gaussian and admit densities, with $p(x\mid y)= \frac{p(x,y)}{p(y)}$. With
$$K_{XY}:=\mathbb{E}\left[\begin{bmatrix}X\\Y\end{bmatrix}\begin{bmatrix}X^{\top}&Y^{\top}\end{bmatrix}\right]$$
we have that
$$K_{XY}=\begin{bmatrix}\Sigma_{XX} & \Sigma_{XY}\\\Sigma_{YX} & \Sigma_{YY}\end{bmatrix},\quad K_{XY}^{-1}=\begin{bmatrix}\Psi_{XX} & \Psi_{XY}\\\Psi_{YX} & \Psi_{YY}\end{bmatrix}.$$
Then,
$$p(x,y)=\frac{1}{(2\pi)^{\frac{n+m}{2}}|K_{XY}|^{\frac{1}{2}}}e^{-\frac{1}{2}\begin{bmatrix}x^{\top}&y^{\top}\end{bmatrix}K_{XY}^{-1}\begin{bmatrix}x\\y\end{bmatrix}}$$
and thus
$$\begin{align*} p(x\mid y)&= \frac{1}{(2\pi)^{\frac{n+m}{2}}|K_{XY}|^{\frac{1}{2}}}e^{- \frac{1}{2} \begin{bmatrix}x^{\top}&y^{\top}\end{bmatrix}K_{XY}^{-1}\begin{bmatrix}x\\y\end{bmatrix} }\cdot \left( \frac{1}{(2\pi)^{\frac{m}{2}}|\Sigma_{YY}|^{\frac{1}{2}}}e^{-\frac{1}{2}y^{\top}\Sigma_{YY}^{-1}y} \right)^{-1} \\ &= C\, \frac{e^{-\frac{1}{2} \begin{bmatrix}x^{\top}&y^{\top}\end{bmatrix}\begin{bmatrix}\Psi_{XX} & \Psi_{XY}\\\Psi_{YX} & \Psi_{YY}\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} }}{e^{-\frac{1}{2}y^{\top}\Sigma_{YY}^{-1}y}}\\ &= C\, \frac{e^{-\frac{1}{2}\left( x^{\top}\Psi_{XX}x+2x^{\top}\Psi_{XY}y+y^{\top}\Psi_{YY}y \right) }}{e^{-\frac{1}{2}y^{\top}\Sigma_{YY}^{-1}y}}\\ &= C\, e^{-\frac{1}{2}\left( x^{\top}\Psi_{XX}x+2x^{\top}\Psi_{XY}y+y^{\top}\Psi_{YY}y -y^{\top}\Sigma_{YY}^{-1}y\right) } \end{align*}$$
where $C$ collects the normalizing constants. Now, completing the square in $x$ in the exponent:
$$\begin{align*} &x^{\top}\Psi_{XX}x+2x^{\top}\Psi_{XY}y+y^{\top}\Psi_{YY}y -y^{\top}\Sigma_{YY}^{-1}y\\ &= (x+\Psi_{XX}^{-1}\Psi_{XY}y)^{\top}\Psi_{XX}(x+\Psi_{XX}^{-1}\Psi_{XY}y)+Q(y)\\ &= (x-Hy)^{\top}D^{-1}(x-Hy)+Q(y) \end{align*}$$
with $H:=-\Psi_{XX}^{-1}\Psi_{XY}$, $D:=\Psi_{XX}^{-1}$, and $Q(y)$ collecting the terms that depend only on $y$. Then, we observe the following:
$$\begin{bmatrix}\Psi_{XX} & \Psi_{XY}\\\Psi_{YX} & \Psi_{YY}\end{bmatrix}\cdot\begin{bmatrix}\Sigma_{XX} & \Sigma_{XY}\\\Sigma_{YX} & \Sigma_{YY}\end{bmatrix}=\begin{bmatrix}I&0\\0&I\end{bmatrix},$$
which gives us that $\Psi_{XX}\Sigma_{XY}+\Psi_{XY}\Sigma_{YY}=0$, therefore
$$\begin{align*} \implies\Sigma_{XY}&=-\Psi_{XX}^{-1}\Psi_{XY}\Sigma_{YY}\\ \implies\Sigma_{XY}\Sigma_{YY}^{-1}&=-\Psi_{XX}^{-1}\Psi_{XY}=H, \end{align*}$$
allowing us to re-express $H$ and leaving us with the resultant conditional density
$$p(x\mid y)=Ce^{-\frac{1}{2}Q(y)}e^{-\frac{1}{2}(x-\Sigma_{XY}\Sigma_{YY}^{-1}y)^{\top}\Psi_{XX}(x-\Sigma_{XY}\Sigma_{YY}^{-1}y)},$$
a Gaussian in $x$ with mean $\Sigma_{XY}\Sigma_{YY}^{-1}y$, giving the first claim.

Finally, since $\int p(x\mid y) \, dx=1$, we necessarily have that
$$Ce^{-\frac{1}{2}Q(y)}=\frac{1}{(2\pi)^{\frac{n}{2}}|D|^{\frac{1}{2}}},$$
which is in fact independent of $y$. The conditional covariance is therefore $D=\Psi_{XX}^{-1}=\Sigma_{XX}-\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{XY}^{\top}$ (the Schur-complement identity for the block inverse), and since it does not depend on $y$,
$$\mathbb{E}[(X-\mathbb{E}[X\mid Y])(X-\mathbb{E}[X\mid Y])^{\top}\mid Y=y]=\mathbb{E}[(X-\mathbb{E}[X\mid Y])(X-\mathbb{E}[X\mid Y])^{\top}]=D.$$

\end{proof}
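Both identities are easy to check by sampling (a sketch with an arbitrary randomly generated joint covariance; dimensions and tolerances are my own choices):

```python
# Check: E[X|Y] = S_xy S_yy^{-1} Y, and the residual covariance equals
# D = S_xx - S_xy S_yy^{-1} S_xy^T, with the residual orthogonal to Y.
import numpy as np

rng = np.random.default_rng(2)
n, p, N = 2, 2, 500_000

L = rng.normal(size=(n + p, n + p))
K = L @ L.T + (n + p) * np.eye(n + p)    # random joint covariance of (X, Y)
Sxx, Sxy, Syy = K[:n, :n], K[:n, n:], K[n:, n:]

Z = rng.multivariate_normal(np.zeros(n + p), K, size=N)
X, Y = Z[:, :n], Z[:, n:]

H = Sxy @ np.linalg.inv(Syy)             # linear conditional-mean map
E = X - Y @ H.T                          # residual X - E[X|Y]
D = Sxx - H @ Sxy.T                      # claimed error covariance

print(np.allclose(E.T @ E / N, D, atol=0.1))   # empirical covariance ā‰ˆ D
print(np.allclose(E.T @ Y / N, 0, atol=0.1))   # residual is orthogonal to Y
```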

To derive the Kalman filter, the following two lemmas are required:

>[!lem|6.2.3]
>If $\mathbb{E}[X]=0$ and $Z_{1},Z_{2}$ are orthogonal, zero-mean Gaussian vectors (with $\mathbb{E}[Z_{1}Z_{2}^{\top}]=0$) that are jointly Gaussian with $X$, then $$\mathbb{E}[X\mid Z_{1}=z_{1},Z_{2}=z_{2}]=\mathbb{E}[X\mid Z_{1}=z_{1}]+\mathbb{E}[X\mid Z_{2}=z_{2}].$$

\begin{proof}
$$\begin{align*} &\mathbb{E}[X\mid(Z_{1},Z_{2})=(z_{1},z_{2})]\\ &= \mathbb{E}\left[X\begin{bmatrix}Z_{1}\\Z_{2}\end{bmatrix}^{\top }\right]\mathbb{E}\left[ \begin{bmatrix}Z_{1}\\Z_{2}\end{bmatrix}\begin{bmatrix}Z_{1}\\Z_{2} \end{bmatrix}^{\top} \right] ^{-1}\begin{bmatrix}z_{1}\\z_{2}\end{bmatrix}\\ &= \begin{bmatrix}\mathbb{E}[XZ_{1}^{\top}] & \mathbb{E}[XZ_{2}^{\top}]\end{bmatrix}\begin{bmatrix}\mathbb{E}[Z_{1}Z_{1}^{\top}] & \mathbb{E}[Z_{1}Z_{2}^{\top}]\\\mathbb{E}[Z_{2}Z_{1}^{\top}] & \mathbb{E}[Z_{2}Z_{2}^{\top}]\end{bmatrix}^{-1}\begin{bmatrix}z_{1}\\z_{2}\end{bmatrix}\\ &= \mathbb{E}[XZ_{1}^{\top}](\mathbb{E}[Z_{1}Z_{1}^{\top}])^{-1}z_{1}+\mathbb{E}[XZ_{2}^{\top}](\mathbb{E}[Z_{2}Z_{2}^{\top}])^{-1}z_{2}\\ &= \mathbb{E}[X\mid Z_{1}=z_{1}]+\mathbb{E}[X\mid Z_{2}=z_{2}] \end{align*}$$
where the first equality is the linearity of the conditional expectation for zero-mean jointly Gaussian vectors, the third is due to orthogonality (the off-diagonal blocks vanish, so the inverse is block diagonal), and the final one applies the same linearity to each $Z_{i}$ separately.
\end{proof}

and

>[!lem]
>$\mathbb{E}[(X-\mathbb{E}[X\mid Y])(X-\mathbb{E}[X\mid Y])^{\top}]$ is given by $D$ from above.

\begin{proof}
First note that
$$\mathbb{E}[X(\mathbb{E}[X\mid Y])^{\top}]=\mathbb{E}[(X-\mathbb{E}[X\mid Y]+\mathbb{E}[X\mid Y])(\mathbb{E}[X\mid Y])^{\top}]=\mathbb{E}[\mathbb{E}[X\mid Y](\mathbb{E}[X\mid Y])^{\top}],$$
since $X-\mathbb{E}[X\mid Y]$ is orthogonal to $\mathbb{E}[X\mid Y]$ (which we know by another conditional-expectation argument: condition on $Y$ and apply the tower property). Then
$$\begin{align*} \mathbb{E}[(X-\mathbb{E}[X\mid Y])(X-\mathbb{E}[X\mid Y])^{\top}]&= \mathbb{E}[XX^{\top}]-2\mathbb{E}[X(\mathbb{E}[X\mid Y])^{\top}]+\mathbb{E}[\mathbb{E}[X\mid Y]\mathbb{E}[X\mid Y]^{\top}]\\ &= \mathbb{E}[XX^{\top}]-\mathbb{E}[\mathbb{E}[X\mid Y]\mathbb{E}[X\mid Y]^{\top}]\\ &= \Sigma_{XX}-\Sigma_{XY}\Sigma_{YY}^{-1}\mathbb{E}[YY^{\top}](\Sigma_{YY}^{-1})^{\top}\Sigma_{XY}^{\top}\\ &= \Sigma_{XX}-\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{XY}^{\top}=D. \end{align*}$$
\end{proof}
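Lemma 6.2.3 can also be checked with plain covariance algebra, no sampling needed: under orthogonality the joint covariance of $(Z_{1},Z_{2})$ is block diagonal, so the joint conditioning map splits. A sketch (all matrices randomly generated, dimensions assumed):

```python
# Exact check of Lemma 6.2.3: if E[Z1 Z2^T] = 0, then
# E[X | Z1, Z2] = E[X | Z1] + E[X | Z2] as linear maps of (z1, z2).
import numpy as np

rng = np.random.default_rng(3)
n, p1, p2 = 3, 2, 2

def rand_cov(k):
    L = rng.normal(size=(k, k))
    return L @ L.T + k * np.eye(k)       # random positive definite matrix

S11, S22 = rand_cov(p1), rand_cov(p2)    # Cov(Z1), Cov(Z2)
Sx1 = rng.normal(size=(n, p1))           # E[X Z1^T]
Sx2 = rng.normal(size=(n, p2))           # E[X Z2^T]

# Orthogonality makes the joint covariance of (Z1, Z2) block diagonal.
Szz = np.block([[S11, np.zeros((p1, p2))],
                [np.zeros((p2, p1)), S22]])
Sxz = np.hstack([Sx1, Sx2])

joint = Sxz @ np.linalg.inv(Szz)         # (z1, z2) -> E[X | Z1, Z2]
split = np.hstack([Sx1 @ np.linalg.inv(S11), Sx2 @ np.linalg.inv(S22)])
print(np.allclose(joint, split))         # True
```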

Now, consider the following system without control:
$$\begin{align*} x_{t+1}&= Ax_{t}+w_{t}, & w_{t}&\overset{iid}{\sim} \mathcal{N}(0,W),\\ y_{t}&= Cx_{t}+v_{t}, & v_{t}&\overset{iid}{\sim} \mathcal{N}(0,V), \end{align*}$$
where $x\in \mathbb{R}^{n}$, $w\in \mathbb{R}^{n}$, $y\in \mathbb{R}^{p}$, $v\in \mathbb{R}^{p}$. Define the mean process $m_{t}$ and covariance process $\Sigma_{t|t-1}$ as
$$\begin{align*} m_{t}&= \mathbb{E}[x_{t}\mid y_{0:t-1}],\\ \Sigma_{t|t-1}&= \mathbb{E}[(x_{t}-\mathbb{E}[x_{t}\mid y_{0:t-1}])(x_{t}-\mathbb{E}[x_{t}\mid y_{0:t-1}])^{\top}\mid y_{0:t-1}], \end{align*}$$
and since the estimation error covariance does not depend on the realization $y_{0:t-1}$ we can rewrite it as
$$\Sigma_{t|t-1}=\mathbb{E}[(x_{t}-\mathbb{E}[x_{t}\mid y_{0:t-1}])(x_{t}-\mathbb{E}[x_{t}\mid y_{0:t-1}])^{\top}].$$

>[!thm|6.2.1]
>The following holds:
>$$\begin{align*} m_{t+1}&= Am_{t}+A\Sigma_{t|t-1}C^{\top}(C\Sigma_{t|t-1}C^{\top}+V)^{-1}(y_{t}-Cm_{t})\\ \Sigma_{t+1|t}&= A\Sigma_{t|t-1}A^{\top}+W-(A\Sigma_{t|t-1}C^{\top})(C\Sigma_{t|t-1}C^{\top}+V)^{-1}(C\Sigma_{t|t-1}A^{\top}) \end{align*}$$
>with $m_{0}=\mathbb{E}[x_{0}]$, $\Sigma_{0|-1}=\mathbb{E}[x_{0}x_{0}^{\top}]$.

\begin{proof}
$$\begin{align*} m_{t+1}&= \mathbb{E}[Ax_{t}+w_{t}\mid y_{0:t}]\\ &= \mathbb{E}[Ax_{t}\mid y_{0:t}]&\text{($w_{t}$ is zero-mean, independent of $y_{0:t}$)}\\ &= \mathbb{E}[Am_{t}+A(x_{t}-m_{t})\mid y_{0:t}]\\ &= Am_{t}+\mathbb{E}[A(x_{t}-m_{t})\mid y_{0:t-1},y_{t}-\mathbb{E}[y_{t}\mid y_{0:t-1}]]\\ &= Am_{t}+\mathbb{E}[A(x_{t}-m_{t})\mid y_{0:t-1}]\\ &\quad\quad\quad\quad+\mathbb{E}[A(x_{t}-m_{t})\mid y_{t}-\mathbb{E}[y_{t}\mid y_{0:t-1}]]&\text{by lemma 6.2.3}\\ &= Am_{t}+\mathbb{E}[A(x_{t}-m_{t})\mid y_{t}-\mathbb{E}[y_{t}\mid y_{0:t-1}]]&\text{(}\mathbb{E}[A(x_{t}-m_{t})\mid y_{0:t-1}]=0\text{)}\\ &= Am_{t}+\mathbb{E}[A(x_{t}-m_{t})\mid Cx_{t}+v_{t}-\mathbb{E}[Cx_{t}+v_{t}\mid y_{0:t-1}]]\\ &= Am_{t}+\mathbb{E}[A(x_{t}-m_{t})\mid C(x_{t}-m_{t})+v_{t}] \end{align*}$$
Let $X=A(x_{t}-m_{t})$ and $Y=y_{t}-\mathbb{E}[y_{t}\mid y_{0:t-1}]=y_{t}-Cm_{t}=C(x_{t}-m_{t})+v_{t}$. Then, by the Gaussian conditioning lemma above we have
$$\mathbb{E}[X\mid Y]=\Sigma_{XY}\Sigma_{YY}^{-1}Y,$$
and thus,
$$\begin{align*} m_{t+1}&= Am_{t}\\ &\quad +A\mathbb{E}[(x_{t}-m_{t})(x_{t}-m_{t})^{\top}]C^{\top}(\mathbb{E}[(C(x_{t}-m_{t})+v_{t})(C(x_{t}-m_{t})+v_{t})^{\top}])^{-1}(y_{t}-Cm_{t})\\ &= Am_{t}+A\Sigma_{t|t-1}C^{\top}(C\mathbb{E}[(x_{t}-m_{t})(x_{t}-m_{t})^{\top}]C^{\top}+\mathbb{E}[v_{t}v_{t}^{\top}])^{-1}(y_{t}-Cm_{t})\\ &= Am_{t}+A\Sigma_{t|t-1}C^{\top}(C\Sigma_{t|t-1}C^{\top}+V)^{-1}(y_{t}-Cm_{t}) \end{align*}$$
Likewise,
$$\begin{align*} x_{t+1}-m_{t+1}&= A(x_{t}-m_{t})+w_{t}-A\Sigma_{t|t-1}C^{\top}(C\Sigma_{t|t-1}C^{\top}+V)^{-1}(y_{t}-Cm_{t})\\ &= \dots\\ \Sigma_{t+1|t}&= A\Sigma_{t|t-1}A^{\top}+W-(A\Sigma_{t|t-1}C^{\top})(C\Sigma_{t|t-1}C^{\top}+V)^{-1}(C\Sigma_{t|t-1}A^{\top}) \end{align*}$$
\end{proof}
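Theorem 6.2.1 translates directly into a one-step predictor recursion. A minimal sketch (the system matrices and the simulated run are illustrative assumptions):

```python
# Kalman one-step predictor, following Theorem 6.2.1:
#   m_{t+1}   = A m_t + A S C^T (C S C^T + V)^{-1} (y_t - C m_t)
#   S_{t+1|t} = A S A^T + W - (A S C^T)(C S C^T + V)^{-1}(C S A^T)
import numpy as np

def predictor_step(m, S, y, A, C, W, V):
    G = A @ S @ C.T @ np.linalg.inv(C @ S @ C.T + V)   # predictor gain
    m_next = A @ m + G @ (y - C @ m)
    S_next = A @ S @ A.T + W - G @ (C @ S @ A.T)
    return m_next, S_next

# Illustrative run on a simulated control-free system.
rng = np.random.default_rng(4)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
W, V = 0.01 * np.eye(2), np.array([[0.04]])

x = rng.multivariate_normal(np.zeros(2), np.eye(2))    # x_0 ~ N(0, I)
m, S = np.zeros(2), np.eye(2)                          # m_0, Sigma_{0|-1}
for t in range(50):
    y = C @ x + rng.multivariate_normal(np.zeros(1), V)
    m, S = predictor_step(m, S, y, A, C, W, V)         # predicts x_{t+1}
    x = A @ x + rng.multivariate_normal(np.zeros(2), W)
print("final one-step prediction error:", x - m)
```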

Define now
$$\tilde{m}_{t}=\mathbb{E}[x_{t}\mid y_{0:t}]=m_{t}+\mathbb{E}[x_{t}-m_{t}\mid y_{0:t}].$$
Following the analysis above we obtain
$$\tilde{m}_{t}=m_{t}+\mathbb{E}[x_{t}-m_{t}\mid y_{0:t-1}]+\mathbb{E}\left[x_{t}-m_{t}\mid y_{t}-\mathbb{E}[y_{t}\mid y_{0:t-1}]\right].$$
Note that we also have $m_{t}=A\tilde{m}_{t-1}$. The following then results:

The recursion for $\tilde{m}_{t}$ satisfies
$$\tilde{m}_{t}=A\tilde{m}_{t-1}+\Sigma_{t|t-1}C^{\top}(C\Sigma_{t|t-1}C^{\top}+V)^{-1}(y_{t}-CA\tilde{m}_{t-1}),$$
with $\tilde{m}_{0}=\mathbb{E}[x_{0}\mid y_{0}]$.
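In code, the filtered recursion only moves where the gain multiplies $A$; a sketch reusing the covariance recursion from Theorem 6.2.1 (the initialization $\tilde{m}_{0}=\mathbb{E}[x_{0}\mid y_{0}]$ would be computed separately from the prior):

```python
# Filtered mean:
#   m~_t = A m~_{t-1} + S C^T (C S C^T + V)^{-1} (y_t - C A m~_{t-1}),
# where S = Sigma_{t|t-1} follows the same recursion as in Theorem 6.2.1.
import numpy as np

def filter_step(m_tilde, S, y, A, C, W, V):
    m_pred = A @ m_tilde                               # m_t = A m~_{t-1}
    K = S @ C.T @ np.linalg.inv(C @ S @ C.T + V)       # filter gain
    m_tilde_next = m_pred + K @ (y - C @ m_pred)
    S_next = A @ S @ A.T + W - A @ K @ (C @ S @ A.T)   # Sigma_{t+1|t}
    return m_tilde_next, S_next
```

Note that applying $A$ to the filtered update recovers the predictor: $A\tilde{m}_{t}=m_{t+1}$, consistent with $m_{t}=A\tilde{m}_{t-1}$ above.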
