Defining our thing as a CMC

So from my last meeting, we set out to define our problem as follows. Let:

- $\mathcal{M}$ be the set of maps (typically a Euclidean space or a subset of one)
- $\mathbb{X}$ be the pose space
- $\mathbb{U}$ be the action space
- $\mathbb{Y}$ be the observation space
- $\mathcal{T}$ be our stochastic kernel
- $\mathbb{S}:=\mathbb{X}\times \mathcal{M}$ be the state space

We define a process $\{ (s_{k},u_{k}) \}_{k\ge 0}$ (where $s_{k}=(x_{k},\mathbf{m}_{k})$) with the following dynamics:

$$
\begin{align*}
&\begin{cases}
x_{k+1}=f(x_{k},u_{k},w_{k}) \\
\mathbf{m}_{k+1}=\mathbf{m}_{k}
\end{cases}\\
&y_{k}=g(x_{k},\mathbf{m}_{k},v_{k})
\end{align*}
$$

where $x_{k}\in\mathbb{X}$ is the pose at time $k$, $\mathbf{m}_{k}\in\mathcal{M}$ is the map at time $k$, $y_{k}\in\mathbb{Y}$ is the observation at time $k$, and $x_{0}\sim\mu$, $\mathbf{m}_{0}\sim \nu$ (i.e. we start with a prior that describes our predicted initial pose and our first prediction of the composition of the map). The controller only has causal access to $\{ y_{t} \}$.
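To make the dynamics concrete, here is a minimal simulation sketch. Everything specific in it is an assumption for illustration and not part of the problem statement: the poses live on a ring of `n` cells, the map is a tuple of binary cells, `f` shifts the pose by the control plus a slip disturbance `w`, and `g` reads the map cell under the robot and flips it when `v == 1`.

```python
import random


def f(x, u, w, n):
    """Hypothetical pose update x_{k+1} = f(x_k, u_k, w_k) on a ring of n cells."""
    return (x + u + w) % n


def g(x, m, v):
    """Hypothetical observation y_k = g(x_k, m_k, v_k): map cell at x, flipped if v == 1."""
    return m[x] ^ v


def simulate(m0, x0, controls, p_slip=0.1, p_flip=0.2, seed=0):
    """Roll out the process and return a list of (x_k, m_k, y_k, u_k) tuples."""
    rng = random.Random(seed)
    n = len(m0)
    x, m = x0, tuple(m0)
    trajectory = []
    for u in controls:
        # Observation noise v_k: flip the reading with probability p_flip.
        v = 1 if rng.random() < p_flip else 0
        y = g(x, m, v)
        trajectory.append((x, m, y, u))
        # Process noise w_k: slip one cell left or right with probability p_slip.
        w = rng.choice([-1, 1]) if rng.random() < p_slip else 0
        x = f(x, u, w, n)
        # m_{k+1} = m_k: the map is static, so m is never modified.
    return trajectory
```

Setting `p_slip=0` and `p_flip=0` recovers a noise-free rollout, which is a convenient sanity check before adding noise back in.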

Given these state dynamics, we can define the following:

- Transition kernel: $P(x_{t+1}\in \cdot\mid x_{t}=x,u_{t}=u)=:\mathcal{T}(\cdot\mid x,u)$
- Observation channel: $P(y_{t}\in\cdot\mid x_{t}=x,\mathbf{m}_{t}=\mathbf{m})=:Q(\cdot\mid x,\mathbf{m})$ (the observation depends on the map as well as the pose, since $y_{k}=g(x_{k},\mathbf{m}_{k},v_{k})$)
- Map identity: $P(\mathbf{m}_{k+1}\in\cdot\mid \mathbf{m}_{k}=\mathbf{m})=:\delta_{\mathbf{m}}(\cdot)$, the Dirac measure at $\mathbf{m}$, i.e. $\delta_{\mathbf{m}}(A)=\begin{cases} 1 & \mathbf{m}\in A \\ 0 & \mathbf{m}\notin A \end{cases}$

# Filter Process

We then define the filter process $\pi_{t}$ using Bayes' rule and the law of total probability. For $s\in\mathbb{S}$ with $s=(x,\mathbf{m})$, $x \in\mathbb{X}$, $\mathbf{m}\in\mathcal{M}$:

$$
\begin{align*}
\pi_{t}(s):&=P(s_{t}=s\mid y_{[0,t]}, u_{[0,t-1]})\\[1ex]
&=\frac{P(s_{t}=s,y_{t},u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]})}{P(y_{t},u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]})}\\[1ex]
&=\frac{P(s_{t}=s,y_{t},u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]})}{\int_{\mathbb{S}}P(s_{t}=s',y_{t},u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]}) \, ds' }\\[1ex]
&=\frac{\int_{\mathbb{S}}P(s_{t}=s,s_{t-1},y_{t},u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]}) \, ds_{t-1} }{\int_{\mathbb{S}}\int_{\mathbb{S}}P(s_{t}=s',s_{t-1},y_{t},u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]}) \, ds' \, ds_{t-1} }\\[1ex]
&=\frac{\int_{\mathbb{S}} P(y_{t}\mid s_{t}=s)P(s_{t}=s,s_{t-1},u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]}) \, ds_{t-1} }{\int_{\mathbb{S}}\int_{\mathbb{S}}P(y_{t}\mid s_{t}=s')P(s_{t}=s',s_{t-1},u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]}) \, ds' \, ds_{t-1} }\\[1ex]
&=\frac{\int_{\mathbb{S}} P(y_{t}\mid s_{t}=s)P(s_{t}=s\mid s_{t-1},u_{t-1})P(s_{t-1},u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]}) \, ds_{t-1}}{\int_{\mathbb{S}}\int_{\mathbb{S}}P(y_{t}\mid s_{t}=s')P(s_{t}=s'\mid s_{t-1},u_{t-1})P(s_{t-1},u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]}) \, ds' \, ds_{t-1} }\\[1ex]
&=\frac{\int_{\mathbb{S}} P(y_{t}\mid s_{t}=s)P(s_{t}=s\mid s_{t-1},u_{t-1})P(u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]})\pi_{t-1}(s_{t-1}) \, ds_{t-1} }{\int_{\mathbb{S}}\int_{\mathbb{S}}P(y_{t}\mid s_{t}=s')P(s_{t}=s'\mid s_{t-1},u_{t-1})P(u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]})\pi_{t-1}(s_{t-1}) \, ds' \, ds_{t-1} }\\[1ex]
&=\frac{\int_{\mathbb{S}} P(y_{t}\mid s_{t}=s)P(s_{t}=s\mid s_{t-1},u_{t-1})\pi_{t-1}(s_{t-1}) \, ds_{t-1} }{\int_{\mathbb{S}}\int_{\mathbb{S}}P(y_{t}\mid s_{t}=s')P(s_{t}=s'\mid s_{t-1},u_{t-1})\pi_{t-1}(s_{t-1}) \, ds' \, ds_{t-1} }\\[1ex]
&=\frac{\int_{\mathbb{X}\times \mathcal{M}} P(y_{t}\mid x_{t}=x,\mathbf{m}_{t}=\mathbf{m})P(x_{t}=x\mid x_{t-1},u_{t-1})P(\mathbf{m}_{t}=\mathbf{m}\mid \mathbf{m}_{t-1})\pi_{t-1}(x_{t-1},\mathbf{m}_{t-1}) \, d(x_{t-1},\mathbf{m}_{t-1}) }{\int_{\mathbb{X}\times \mathcal{M}}\int_{\mathbb{X}\times \mathcal{M}} P(y_{t}\mid x_{t}=x',\mathbf{m}_{t}=\mathbf{m}')P(x_{t}=x'\mid x_{t-1},u_{t-1})P(\mathbf{m}_{t}=\mathbf{m}'\mid \mathbf{m}_{t-1})\pi_{t-1}(x_{t-1},\mathbf{m}_{t-1}) \, d(x',\mathbf{m}') \, d(x_{t-1},\mathbf{m}_{t-1}) }\\[1ex]
&=\frac{\int_{\mathbb{X}\times \mathcal{M}} Q(y_{t}\mid x,\mathbf{m})\mathcal{T}(x\mid x_{t-1},u_{t-1})\delta_{\mathbf{m}_{t-1}}(\mathbf{m})\pi_{t-1}(x_{t-1},\mathbf{m}_{t-1}) \, d(x_{t-1},\mathbf{m}_{t-1}) }{\int_{\mathbb{X}\times \mathcal{M}}\int_{\mathbb{X}\times \mathcal{M}} Q(y_{t}\mid x',\mathbf{m}')\mathcal{T}(x'\mid x_{t-1},u_{t-1})\delta_{\mathbf{m}_{t-1}}(\mathbf{m}')\pi_{t-1}(x_{t-1},\mathbf{m}_{t-1}) \, d(x',\mathbf{m}') \, d(x_{t-1},\mathbf{m}_{t-1}) }\\[1ex]
&=:F(\pi_{t-1},y_{t},u_{t-1})(x,\mathbf{m}) = F_{\pi_{t-1},y_{t},u_{t-1}}(x,\mathbf{m}) =F_{\pi_{t-1},y_{t},u_{t-1}}(s)
\end{align*}
$$

Here $s'=(x',\mathbf{m}')$ is the dummy variable of the normalizing integral, and the factor $P(u_{t-1}\mid y_{[0,t-1]}, u_{[0,t-2]})$ cancels because it depends on neither $s_{t-1}$ nor $s'$.
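On finite pose, map, and observation spaces the integrals become sums, and the update $F$ can be sketched in a few lines. This is only an illustration under assumptions that are not in the derivation: beliefs are stored as dictionaries keyed by `(x, m)`, and `toy_T` and `toy_Q` are hypothetical stand-ins for $\mathcal{T}$ and $Q$ on a two-cell ring.

```python
def filter_update(pi, y, u, T, Q, poses):
    """One step of the filter F(pi, y, u) on finite spaces.

    pi    : dict mapping (x, m) -> probability (the previous belief)
    T     : T(x_next, x_prev, u) -> transition probability
    Q     : Q(y, x, m) -> observation probability
    poses : iterable of all poses
    """
    unnormalized = {}
    for (x_prev, m_prev), p in pi.items():
        for x in poses:
            # The Dirac factor delta_{m_prev}(m) forces m == m_prev,
            # so only the pose component is summed over.
            w = Q(y, x, m_prev) * T(x, x_prev, u) * p
            if w > 0.0:
                key = (x, m_prev)
                unnormalized[key] = unnormalized.get(key, 0.0) + w
    z = sum(unnormalized.values())  # the denominator of the filter
    if z == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return {s: w / z for s, w in unnormalized.items()}


def toy_T(x_next, x_prev, u):
    """Hypothetical deterministic motion on a ring with two cells."""
    return 1.0 if x_next == (x_prev + u) % 2 else 0.0


def toy_Q(y, x, m):
    """Hypothetical sensor: reads the map cell at x correctly with probability 0.8."""
    return 0.8 if y == m[x] else 0.2


maps = [(0, 1), (1, 0)]
prior = {(x, m): 0.25 for x in (0, 1) for m in maps}
posterior = filter_update(prior, y=1, u=1, T=toy_T, Q=toy_Q, poses=(0, 1))
```

Because the map component never moves, the update only reweights the hypotheses `(x, m)` that are consistent with the observation; the normalizer `z` plays the role of the double integral in the denominator.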

# Controlled Markov Chain

Let $D\in\mathcal{B}(\mathcal{P}(\mathbb{S}))$. We write $Y_{t+1}$ with a capital letter to emphasize its randomness. The sums below are written as if $\mathbb{Y}$ and $\mathbb{S}$ were countable (replace them with integrals otherwise), and the state is $s=(x,\mathbf{m})\in\mathbb{S}$, so that $\pi_{t}(s)$ matches the filter defined on $\mathbb{S}$:

$$
\begin{align*}
&P(\pi_{t+1}\in D\mid \pi_{[0,t]},u_{[0,t]})=P(F_{\pi_{t},Y_{t+1},u_{t}}\in D\mid \pi_{[0,t]},u_{[0,t]})\\
&= \sum_{y\in\mathbb{Y}}P(F_{\pi_{t},y_{t+1},u_{t}}\in D,y_{t+1}=y\mid \pi_{[0,t]},u_{[0,t]})\\
&= \sum_{y\in\mathbb{Y}}P(F_{\pi_{t},y_{t+1},u_{t}}\in D\mid y_{t+1}=y, \pi_{[0,t]},u_{[0,t]})P(y_{t+1}=y\mid \pi_{[0,t]},u_{[0,t]})\\
&= \sum_{y\in\mathbb{Y}} \mathbb{1}_{\{ F_{\pi_{t},y,u_{t}}\in D \}}P(y_{t+1}=y\mid \pi_{t},u_{t})\\
&= \sum_{y\in\mathbb{Y}} \mathbb{1}_{\{ F_{\pi_{t},y,u_{t}}\in D \}} \left( \sum_{s'\in\mathbb{S}}\sum_{s \in\mathbb{S}} P(y_{t+1}=y,s_{t+1}=s',s_{t}=s\mid \pi_{t},u_{t}) \right)\\
&= \sum_{y\in\mathbb{Y}} \mathbb{1}_{\{ F_{\pi_{t},y,u_{t}}\in D \}} \left( \sum_{s'\in\mathbb{S}}\sum_{s \in\mathbb{S}} P(y_{t+1}=y\mid s_{t+1}=s')P(s_{t+1}=s',s_{t}=s\mid \pi_{t},u_{t}) \right)\\
&= \sum_{y\in\mathbb{Y}} \mathbb{1}_{\{ F_{\pi_{t},y,u_{t}}\in D \}} \left( \sum_{s'\in\mathbb{S}}\sum_{s \in\mathbb{S}} P(y_{t+1}=y\mid s_{t+1}=s')P(s_{t+1}=s'\mid s_{t}=s, u_{t})P(s_{t}=s\mid \pi_{t},u_{t}) \right)\\
&= \sum_{y\in\mathbb{Y}} \mathbb{1}_{\{ F_{\pi_{t},y,u_{t}}\in D \}} \left( \sum_{s'\in\mathbb{S}}\sum_{s \in\mathbb{S}} P(y_{t+1}=y\mid s_{t+1}=s')P(s_{t+1}=s'\mid s_{t}=s, u_{t})\pi_{t}(s) \right)\\
&= P(\pi_{t+1}\in D\mid \pi_{t},u_{t})
\end{align*}
$$

The final expression depends on the history only through $(\pi_{t},u_{t})$, so $\{(\pi_{t},u_{t})\}$ is a controlled Markov chain.
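The sum over $y$ can be read as a recipe: for each possible next observation, compute how likely it is under the current belief and which belief it would lead to. Below is a rough sketch of that recipe on finite spaces. The dictionary representation of beliefs and the toy kernels `toy_T` and `toy_Q` are assumptions made for the example, not part of the derivation.

```python
def belief_transition(pi, u, T, Q, poses, observations):
    """Return a list of (P(y | pi, u), next_belief) pairs, one per reachable y.

    Together the pairs describe the law of pi_{t+1} given (pi_t, u_t):
    the next belief equals F(pi, y, u) with probability P(y | pi, u).
    """
    outcomes = []
    for y in observations:
        unnormalized = {}
        for (x_prev, m), p in pi.items():
            for x in poses:
                # Q(y | x, m) * T(x | x_prev, u) * pi(x_prev, m); the map stays fixed.
                w = Q(y, x, m) * T(x, x_prev, u) * p
                if w > 0.0:
                    unnormalized[(x, m)] = unnormalized.get((x, m), 0.0) + w
        p_y = sum(unnormalized.values())  # P(y_{t+1} = y | pi_t, u_t)
        if p_y > 0.0:
            next_belief = {s: w / p_y for s, w in unnormalized.items()}
            outcomes.append((p_y, next_belief))
    return outcomes


def toy_T(x_next, x_prev, u):
    """Hypothetical deterministic motion on a ring with two cells."""
    return 1.0 if x_next == (x_prev + u) % 2 else 0.0


def toy_Q(y, x, m):
    """Hypothetical sensor: reads the map cell at x correctly with probability 0.8."""
    return 0.8 if y == m[x] else 0.2


maps = [(0, 1), (1, 0)]
belief = {(x, m): 0.25 for x in (0, 1) for m in maps}
outcomes = belief_transition(belief, 1, toy_T, toy_Q, poses=(0, 1), observations=(0, 1))
```

Note that the function only takes the current belief and control as inputs and never looks at earlier beliefs, which is exactly the Markov property the derivation establishes.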

# Policy

Let $\bar{\gamma}=(\gamma^{c},\gamma^{e})$ be our policy, where $\gamma^{c}$ represents the controller policy and $\gamma^{e}$ represents the estimator policy. The goal is to minimize a combined expected cost functional

$$
J_{\beta}(x_{0},\mathbf{m}_{0},\bar{\gamma})=E^{\bar{\gamma}}\left[ \sum^{N-1}_{k=0}\beta^{k}c(x_{k},\mathbf{m}_{k},u_{k})+\lambda\rho(\mathbf{m}_{k},\hat{\mathbf{m}}_{k}) \right]
$$

We would like to simultaneously optimize for control effort (i.e. find an optimal $\gamma^{c*}$) and minimize our estimation error of the map (i.e. find an optimal $\gamma^{e*}$).
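As a rough illustration of how $J_{\beta}$ could be estimated, the sketch below scores one sampled trajectory and averages over several. The concrete choices are assumptions and not part of the problem definition: `c` is taken to be the control magnitude, `rho` the Hamming distance between the true map and the estimate, and the discount $\beta^{k}$ is applied to the control term only, reading the formula literally as written.

```python
def control_cost(x, m, u):
    """Hypothetical stage cost c(x, m, u): control effort only."""
    return abs(u)


def hamming(m, m_hat):
    """Hypothetical map error rho(m, m_hat): number of cells that differ."""
    return sum(a != b for a, b in zip(m, m_hat))


def trajectory_cost(trajectory, beta, lam, c=control_cost, rho=hamming):
    """Cost of one trajectory given as a list of (x_k, m_k, u_k, m_hat_k) tuples."""
    total = 0.0
    for k, (x, m, u, m_hat) in enumerate(trajectory):
        # beta^k * c(x_k, m_k, u_k) + lambda * rho(m_k, m_hat_k)
        total += beta ** k * c(x, m, u) + lam * rho(m, m_hat)
    return total


def expected_cost(trajectories, beta, lam):
    """Monte Carlo estimate of J_beta: the average cost over sampled trajectories."""
    costs = [trajectory_cost(t, beta, lam) for t in trajectories]
    return sum(costs) / len(costs)


example = [
    (0, (0, 1), 1, (0, 0)),   # k = 0: one unit of control, one wrong map cell
    (1, (0, 1), -1, (0, 1)),  # k = 1: one unit of control, map estimate correct
]
cost = trajectory_cost(example, beta=0.5, lam=2.0)
```

The weight `lam` trades off the two objectives: a larger value makes map errors more expensive relative to control effort.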