So from my last meeting we set out to define our problem as follows. Let:

- $M$ be the set of maps (typically some Euclidean space or a subset of Euclidean space),
- $X$ be the pose space,
- $U$ be the action space,
- $Y$ be the observation space,
- $T$ be our stochastic transition kernel,
- $S := X \times M$ be the state space.

We define some process $\{(s_k, u_k)\}_{k \ge 0}$ (where $s_k = (x_k, m_k)$) to have the following dynamics:

$$
\begin{cases}
x_{k+1} = f(x_k, u_k, w_k) \\
m_{k+1} = m_k \\
y_k = g(x_k, m_k, v_k)
\end{cases}
$$

where $x_k \in X$ represents the pose at time $k$, $m_k \in M$ represents the map at time $k$, $y_k \in Y$ represents the observation at time $k$, and $x_0 \sim \mu$, $m_0 \sim \nu$ (i.e. we start with some prior that describes our predicted initial pose and our first prediction of the composition of the map). The controller only has causal access to $\{y_t\}$.
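To make the setup concrete, here is a minimal Python sketch of the generative model above for a toy instance: the pose lives in the plane, the map is a fixed set of landmarks, and observations are noisy ranges to those landmarks. The specific forms of $f$, $g$, the priors, and the noise distributions are illustrative assumptions, not part of the formal definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: X = R^2 (planar pose), M = R^{L x 2} (L fixed landmarks),
# U = R^2 (velocity commands), Y = R^L (ranges to landmarks).
L = 5
m0 = rng.uniform(-10, 10, size=(L, 2))  # m_0 ~ nu: prior draw of the map
x0 = rng.normal(0.0, 1.0, size=2)       # x_0 ~ mu: prior draw of the pose

def f(x, u, w):
    """Pose dynamics x_{k+1} = f(x_k, u_k, w_k): integrator plus process noise."""
    return x + u + w

def g(x, m, v):
    """Observation y_k = g(x_k, m_k, v_k): noisy range to every landmark."""
    return np.linalg.norm(m - x, axis=1) + v

def observe(x, m):
    v = rng.normal(0.0, 0.2, size=L)    # measurement noise v_k
    return g(x, m, v)

def step(x, m, u):
    w = rng.normal(0.0, 0.1, size=2)    # process noise w_k
    return f(x, u, w), m                # map identity: m_{k+1} = m_k

# Roll out a short trajectory; the controller would only ever see the y_k's.
x, m = x0, m0
for k in range(3):
    y = observe(x, m)                   # y_k = g(x_k, m_k, v_k)
    u = rng.uniform(-1, 1, size=2)      # placeholder action u_k
    x, m = step(x, m, u)
```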
We know that given our state dynamics we can define the following:

- Transition kernel: $P(x_{t+1} \in \cdot \mid x_t = x, u_t = u) =: T(\cdot \mid x, u)$
- Observation channel: $P(y_t \in \cdot \mid x_t = x, m_t = m) =: Q(\cdot \mid x, m)$
- Map identity: $P(m_{k+1} \in \cdot \mid m_k = m) =: \delta_m(\cdot)$, the Dirac measure at $m$, i.e. $\delta_m(A) = 1$ if $m \in A$ and $0$ otherwise.

# Filter Process

We then define the filter process $\pi_t$ via the law of total probability. For $s \in S$ with $s = (x, m)$, $x \in X$, $m \in M$:
$$
\begin{aligned}
\pi_t(s) &:= P(s_t = s \mid y_{[0,t]}, u_{[0,t-1]}) \\
&= \frac{P(s_t = s,\, y_t,\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})}{P(y_t,\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})} \\
&= \frac{P(s_t = s,\, y_t,\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})}{\int_S P(s_t = s,\, y_t,\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds} \\
&= \frac{\int_S P(s_t = s,\, s_{t-1},\, y_t,\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds_{t-1}}{\int_S \int_S P(s_t = s,\, s_{t-1},\, y_t,\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds\, ds_{t-1}} \\
&= \frac{\int_S P(y_t \mid s_t = s)\, P(s_t = s,\, s_{t-1},\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds_{t-1}}{\int_S \int_S P(y_t \mid s_t = s)\, P(s_t = s,\, s_{t-1},\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds\, ds_{t-1}} \\
&= \frac{\int_S P(y_t \mid s_t = s)\, P(s_t = s \mid s_{t-1}, u_{t-1})\, P(s_{t-1},\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds_{t-1}}{\int_S \int_S P(y_t \mid s_t = s)\, P(s_t = s \mid s_{t-1}, u_{t-1})\, P(s_{t-1},\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds\, ds_{t-1}} \\
&= \frac{\int_S P(y_t \mid s_t = s)\, P(s_t = s \mid s_{t-1}, u_{t-1})\, P(u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, \pi_{t-1}(s_{t-1})\, ds_{t-1}}{\int_S \int_S P(y_t \mid s_t = s)\, P(s_t = s \mid s_{t-1}, u_{t-1})\, P(u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, \pi_{t-1}(s_{t-1})\, ds\, ds_{t-1}} \\
&= \frac{\int_S P(y_t \mid s_t = s)\, P(s_t = s \mid s_{t-1}, u_{t-1})\, \pi_{t-1}(s_{t-1})\, ds_{t-1}}{\int_S \int_S P(y_t \mid s_t = s)\, P(s_t = s \mid s_{t-1}, u_{t-1})\, \pi_{t-1}(s_{t-1})\, ds\, ds_{t-1}} \\
&= \frac{\int_{X \times M} P(y_t \mid x_t = x, m_t = m)\, P(x_t = x \mid x_{t-1}, u_{t-1})\, P(m_t = m \mid m_{t-1})\, \pi_{t-1}(x_{t-1}, m_{t-1})\, d(x_{t-1} \times m_{t-1})}{\int_{X \times M} \int_{X \times M} P(y_t \mid x_t = x, m_t = m)\, P(x_t = x \mid x_{t-1}, u_{t-1})\, P(m_t = m \mid m_{t-1})\, \pi_{t-1}(x_{t-1}, m_{t-1})\, d(x \times m)\, d(x_{t-1} \times m_{t-1})} \\
&= \frac{\int_{X \times M} Q(y_t \mid x, m)\, T(x \mid x_{t-1}, u_{t-1})\, \delta_{m_{t-1}}(m)\, \pi_{t-1}(x_{t-1}, m_{t-1})\, d(x_{t-1} \times m_{t-1})}{\int_{X \times M} \int_{X \times M} Q(y_t \mid x, m)\, T(x \mid x_{t-1}, u_{t-1})\, \delta_{m_{t-1}}(m)\, \pi_{t-1}(x_{t-1}, m_{t-1})\, d(x \times m)\, d(x_{t-1} \times m_{t-1})} \\
&=: F(\pi_{t-1}, y_t, u_{t-1})(x, m) = F_{\pi_{t-1}, y_t, u_{t-1}}(x, m) = F_{\pi_{t-1}, y_t, u_{t-1}}(s)
\end{aligned}
$$
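As a sanity check on the recursion, here is a hedged particle-style sketch of one application of $F(\pi_{t-1}, y_t, u_{t-1})$: particles carry $(x, m)$ hypotheses, poses are propagated through $T$, maps are copied unchanged (the $\delta_{m_{t-1}}$ term), the $Q$ likelihood reweights, and normalizing the weights plays the role of the denominator. The Gaussian forms of $T$ and $Q$ are placeholders matching the toy model above, not anything fixed by the formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def transition_sample(x_prev, u):
    """Sample x_t ~ T(. | x_{t-1}, u_{t-1}); illustrative Gaussian kernel."""
    return x_prev + u + rng.normal(0.0, 0.1, size=x_prev.shape)

def obs_likelihood(y, x, m):
    """Q(y_t | x, m): assumed Gaussian range likelihood to each landmark."""
    pred = np.linalg.norm(m - x, axis=1)
    return np.exp(-0.5 * np.sum((y - pred) ** 2) / 0.2 ** 2)

def filter_update(particles, weights, y, u):
    """One application of F(pi_{t-1}, y_t, u_{t-1}) on a particle representation.

    particles: list of (x, m) pairs; weights: nonnegative array summing to 1.
    The map component is carried over unchanged (the delta_{m_{t-1}} term).
    """
    new_particles, new_weights = [], []
    for (x_prev, m_prev), w in zip(particles, weights):
        x = transition_sample(x_prev, u)                 # T(x | x_{t-1}, u_{t-1})
        m = m_prev                                       # map is static
        new_particles.append((x, m))
        new_weights.append(w * obs_likelihood(y, x, m))  # Q(y_t | x, m)
    new_weights = np.asarray(new_weights)
    new_weights /= new_weights.sum()                     # normalization = denominator
    return new_particles, new_weights
```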
# Controlled Markov Chain
Let $D \in \mathcal{B}(\mathcal{P}(S))$. We denote $Y_{t+1}$ with a capital letter to emphasize its randomness:

$$
\begin{aligned}
P(\pi_{t+1} \in D \mid \pi_{[0,t]}, u_{[0,t]})
&= P(F_{\pi_t, Y_{t+1}, u_t} \in D \mid \pi_{[0,t]}, u_{[0,t]}) \\
&= \sum_{y \in Y} P(F_{\pi_t, Y_{t+1}, u_t} \in D,\; Y_{t+1} = y \mid \pi_{[0,t]}, u_{[0,t]}) \\
&= \sum_{y \in Y} P(F_{\pi_t, Y_{t+1}, u_t} \in D \mid Y_{t+1} = y,\, \pi_{[0,t]}, u_{[0,t]})\; P(Y_{t+1} = y \mid \pi_{[0,t]}, u_{[0,t]}) \\
&= \sum_{y \in Y} 1_{\{F_{\pi_t, y, u_t} \in D\}}\; P(Y_{t+1} = y \mid \pi_t, u_t) \\
&= \sum_{y \in Y} 1_{\{F_{\pi_t, y, u_t} \in D\}} \Big( \sum_{x' \in X} \sum_{x \in X} P(Y_{t+1} = y,\, x_{t+1} = x',\, x_t = x \mid \pi_t, u_t) \Big) \\
&= \sum_{y \in Y} 1_{\{F_{\pi_t, y, u_t} \in D\}} \Big( \sum_{x' \in X} \sum_{x \in X} P(Y_{t+1} = y \mid x_{t+1} = x')\, P(x_{t+1} = x',\, x_t = x \mid \pi_t, u_t) \Big) \\
&= \sum_{y \in Y} 1_{\{F_{\pi_t, y, u_t} \in D\}} \Big( \sum_{x' \in X} \sum_{x \in X} P(Y_{t+1} = y \mid x_{t+1} = x')\, P(x_{t+1} = x' \mid x_t = x, u_t)\, P(x_t = x \mid \pi_t, u_t) \Big) \\
&= \sum_{y \in Y} 1_{\{F_{\pi_t, y, u_t} \in D\}} \Big( \sum_{x' \in X} \sum_{x \in X} P(Y_{t+1} = y \mid x_{t+1} = x')\, P(x_{t+1} = x' \mid x_t = x, u_t)\, \pi_t(x) \Big) \\
&= P(\pi_{t+1} \in D \mid \pi_t, u_t),
\end{aligned}
$$

so the filter process $\{\pi_t\}$ is itself a Markov chain controlled by $u_t$.
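For finite $X$ and $Y$ the sum above can be evaluated directly. The sketch below assumes stochastic matrices $T[u]$ and $Q$ (my own placeholder names) and returns the support of $\pi_{t+1}$ given $(\pi_t, u_t)$ as pairs $\big(P(Y_{t+1} = y \mid \pi_t, u_t),\, F_{\pi_t, y, u_t}\big)$; summing the probabilities of the beliefs that land in $D$ then gives $P(\pi_{t+1} \in D \mid \pi_t, u_t)$.

```python
import numpy as np

def next_belief(pi, u, y, T, Q):
    """F(pi, y, u) for finite X, Y: predict with T[u], then reweight by Q[:, y]."""
    pred = T[u].T @ pi                 # P(x_{t+1} = x' | pi_t, u_t)
    post = Q[:, y] * pred
    return post / post.sum()

def belief_kernel(pi, u, T, Q):
    """Distribution of pi_{t+1} given (pi_t, u_t) as a list of (probability, belief).

    T[u][x, x'] = P(x_{t+1} = x' | x_t = x, u_t = u),
    Q[x', y]    = P(y_{t+1} = y | x_{t+1} = x').
    """
    pred = T[u].T @ pi                 # marginal over x_{t+1}
    support = []
    for y in range(Q.shape[1]):
        p_y = float(Q[:, y] @ pred)    # P(Y_{t+1} = y | pi_t, u_t)
        if p_y > 0:
            support.append((p_y, next_belief(pi, u, y, T, Q)))
    return support
```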
# Policy
Let $\bar{\gamma} = (\gamma^c, \gamma^e)$ be our policy. The goal is to minimize some combined expected cost functional

$$
J_\beta(x_0, m_0, \bar{\gamma}) = E^{\bar{\gamma}}\!\left[ \sum_{k=0}^{N-1} \Big( \beta^k c(x_k, m_k, u_k) + \lambda\, \rho(m_k, \hat{m}_k) \Big) \right],
$$

where $\gamma^c$ represents the controller policy and $\gamma^e$ represents the estimator policy. We would like to simultaneously optimize for control effort (i.e. find some optimal $\gamma^{c,*}$) and minimize the estimation error of the map (i.e. find some optimal $\gamma^{e,*}$).
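For a fixed pair $(\gamma^c, \gamma^e)$, $J_\beta$ can be estimated by Monte Carlo rollouts of the model. The sketch below assumes caller-supplied stage cost `c`, map-error metric `rho`, priors, and dynamics (e.g. the toy model above); every name here is a placeholder for illustration, not a fixed design.

```python
import numpy as np

def evaluate_cost(gamma_c, gamma_e, sample_prior, step, observe,
                  c, rho, N=20, beta=0.95, lam=1.0, n_rollouts=200):
    """Monte Carlo estimate of J_beta(x_0, m_0, gamma_bar).

    gamma_c maps the observation history to an action u_k; gamma_e maps it to a
    map estimate m_hat_k, so both are causal in {y_t} as required.
    """
    total = 0.0
    for _ in range(n_rollouts):
        x, m = sample_prior()          # (x_0, m_0) ~ mu x nu
        history = []
        for k in range(N):
            y = observe(x, m)
            history.append(y)
            u = gamma_c(history)       # controller policy
            m_hat = gamma_e(history)   # estimator policy
            total += beta ** k * c(x, m, u) + lam * rho(m, m_hat)
            x, m = step(x, m, u)
    return total / n_rollouts
```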