So from my last meeting we set out to define our problem as follows. Let:

- $M$ be the set of maps (typically some Euclidean space or a subset of Euclidean space),
- $X$ be the pose space,
- $U$ be the action space,
- $Y$ be the observation space,
- $T$ be our stochastic transition kernel,
- $S := X \times M$ be the state space.

We define some process $\{(s_k, u_k)\}_{k \ge 0}$ (where $s_k = (x_k, m_k)$) to have the following dynamics:

$$
\begin{cases}
x_{k+1} = f(x_k, u_k, w_k) \\
m_{k+1} = m_k \\
y_k = g(x_k, m_k, v_k)
\end{cases}
$$

where $x_k \in X$ represents the pose at time $k$, $m_k \in M$ represents the map at time $k$, $y_k \in Y$ represents the observation at time $k$, and $x_0 \sim \mu$, $m_0 \sim \nu$ (i.e. we start with some prior that describes our predicted initial pose and our first prediction of the composition of the map). The controller only has causal access to $\{y_t\}$.
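To make the setup concrete, here is a minimal Python sketch of the generative model above for a toy instance: the pose lives in the plane, the map is a fixed set of landmarks, and observations are noisy ranges to those landmarks. The specific forms of $f$, $g$, the priors, and the noise distributions are illustrative assumptions, not part of the formal definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: X = R^2 (planar pose), M = R^{L x 2} (L fixed landmarks),
# U = R^2 (velocity commands), Y = R^L (ranges to landmarks).
L = 5
m0 = rng.uniform(-10, 10, size=(L, 2))  # m_0 ~ nu: prior draw of the map
x0 = rng.normal(0.0, 1.0, size=2)       # x_0 ~ mu: prior draw of the pose

def f(x, u, w):
    """Pose dynamics x_{k+1} = f(x_k, u_k, w_k): integrator plus process noise."""
    return x + u + w

def g(x, m, v):
    """Observation y_k = g(x_k, m_k, v_k): noisy range to every landmark."""
    return np.linalg.norm(m - x, axis=1) + v

def observe(x, m):
    v = rng.normal(0.0, 0.2, size=L)    # measurement noise v_k
    return g(x, m, v)

def step(x, m, u):
    w = rng.normal(0.0, 0.1, size=2)    # process noise w_k
    return f(x, u, w), m                # map identity: m_{k+1} = m_k

# Roll out a short trajectory; the controller would only ever see the y_k's.
x, m = x0, m0
for k in range(3):
    y = observe(x, m)                   # y_k = g(x_k, m_k, v_k)
    u = rng.uniform(-1, 1, size=2)      # placeholder action u_k
    x, m = step(x, m, u)
```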
We know that given our state dynamics we can define the following:

- Transition kernel: $P(x_{t+1} \in \cdot \mid x_t = x, u_t = u) =: T(\cdot \mid x, u)$
- Observation channel: $P(y_t \in \cdot \mid x_t = x, m_t = m) =: Q(\cdot \mid x, m)$
- Map identity: $P(m_{k+1} \in \cdot \mid m_k = m) =: \delta_m(\cdot)$, the Dirac measure at $m$, i.e. $\delta_m(A) = 1$ if $m \in A$ and $0$ otherwise.

# Filter Process

We then define the filter process $\pi_t$ via the law of total probability. For $s \in S$ with $s = (x, m)$, $x \in X$, $m \in M$:
$$
\begin{aligned}
\pi_t(s) &:= P(s_t = s \mid y_{[0,t]}, u_{[0,t-1]}) \\
&= \frac{P(s_t = s,\, y_t,\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})}{P(y_t,\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})} \\
&= \frac{P(s_t = s,\, y_t,\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})}{\int_S P(s_t = s,\, y_t,\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds} \\
&= \frac{\int_S P(s_t = s,\, s_{t-1},\, y_t,\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds_{t-1}}{\int_S \int_S P(s_t = s,\, s_{t-1},\, y_t,\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds\, ds_{t-1}} \\
&= \frac{\int_S P(y_t \mid s_t = s)\, P(s_t = s,\, s_{t-1},\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds_{t-1}}{\int_S \int_S P(y_t \mid s_t = s)\, P(s_t = s,\, s_{t-1},\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds\, ds_{t-1}} \\
&= \frac{\int_S P(y_t \mid s_t = s)\, P(s_t = s \mid s_{t-1}, u_{t-1})\, P(s_{t-1},\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds_{t-1}}{\int_S \int_S P(y_t \mid s_t = s)\, P(s_t = s \mid s_{t-1}, u_{t-1})\, P(s_{t-1},\, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, ds\, ds_{t-1}} \\
&= \frac{\int_S P(y_t \mid s_t = s)\, P(s_t = s \mid s_{t-1}, u_{t-1})\, P(u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, \pi_{t-1}(s_{t-1})\, ds_{t-1}}{\int_S \int_S P(y_t \mid s_t = s)\, P(s_t = s \mid s_{t-1}, u_{t-1})\, P(u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\, \pi_{t-1}(s_{t-1})\, ds\, ds_{t-1}} \\
&= \frac{\int_S P(y_t \mid s_t = s)\, P(s_t = s \mid s_{t-1}, u_{t-1})\, \pi_{t-1}(s_{t-1})\, ds_{t-1}}{\int_S \int_S P(y_t \mid s_t = s)\, P(s_t = s \mid s_{t-1}, u_{t-1})\, \pi_{t-1}(s_{t-1})\, ds\, ds_{t-1}} \\
&= \frac{\int_{X \times M} P(y_t \mid x_t = x, m_t = m)\, P(x_t = x \mid x_{t-1}, u_{t-1})\, P(m_t = m \mid m_{t-1})\, \pi_{t-1}(x_{t-1}, m_{t-1})\, d(x_{t-1} \times m_{t-1})}{\int_{X \times M} \int_{X \times M} P(y_t \mid x_t = x, m_t = m)\, P(x_t = x \mid x_{t-1}, u_{t-1})\, P(m_t = m \mid m_{t-1})\, \pi_{t-1}(x_{t-1}, m_{t-1})\, d(x \times m)\, d(x_{t-1} \times m_{t-1})} \\
&= \frac{\int_{X \times M} Q(y_t \mid x, m)\, T(x \mid x_{t-1}, u_{t-1})\, \delta_{m_{t-1}}(m)\, \pi_{t-1}(x_{t-1}, m_{t-1})\, d(x_{t-1} \times m_{t-1})}{\int_{X \times M} \int_{X \times M} Q(y_t \mid x, m)\, T(x \mid x_{t-1}, u_{t-1})\, \delta_{m_{t-1}}(m)\, \pi_{t-1}(x_{t-1}, m_{t-1})\, d(x \times m)\, d(x_{t-1} \times m_{t-1})} \\
&=: F(\pi_{t-1}, y_t, u_{t-1})(x, m) = F_{\pi_{t-1}, y_t, u_{t-1}}(x, m) = F_{\pi_{t-1}, y_t, u_{t-1}}(s)
\end{aligned}
$$
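As a sanity check on the recursion, here is a hedged particle-style sketch of one application of $F(\pi_{t-1}, y_t, u_{t-1})$: particles carry $(x, m)$ hypotheses, poses are propagated through $T$, maps are copied unchanged (the $\delta_{m_{t-1}}$ term), the $Q$ likelihood reweights, and normalizing the weights plays the role of the denominator. The Gaussian forms of $T$ and $Q$ are placeholders matching the toy model above, not anything fixed by the formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def transition_sample(x_prev, u):
    """Sample x_t ~ T(. | x_{t-1}, u_{t-1}); illustrative Gaussian kernel."""
    return x_prev + u + rng.normal(0.0, 0.1, size=x_prev.shape)

def obs_likelihood(y, x, m):
    """Q(y_t | x, m): assumed Gaussian range likelihood to each landmark."""
    pred = np.linalg.norm(m - x, axis=1)
    return np.exp(-0.5 * np.sum((y - pred) ** 2) / 0.2 ** 2)

def filter_update(particles, weights, y, u):
    """One application of F(pi_{t-1}, y_t, u_{t-1}) on a particle representation.

    particles: list of (x, m) pairs; weights: nonnegative array summing to 1.
    The map component is carried over unchanged (the delta_{m_{t-1}} term).
    """
    new_particles, new_weights = [], []
    for (x_prev, m_prev), w in zip(particles, weights):
        x = transition_sample(x_prev, u)                 # T(x | x_{t-1}, u_{t-1})
        m = m_prev                                       # map is static
        new_particles.append((x, m))
        new_weights.append(w * obs_likelihood(y, x, m))  # Q(y_t | x, m)
    new_weights = np.asarray(new_weights)
    new_weights /= new_weights.sum()                     # normalization = denominator
    return new_particles, new_weights
```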
# Controlled Markov Chain
Let $D \in \mathcal{B}(\mathcal{P}(S))$. We denote $Y_{t+1}$ with a capital letter to emphasize its randomness:

$$
\begin{aligned}
P(\pi_{t+1} \in D \mid \pi_{[0,t]}, u_{[0,t]})
&= P(F_{\pi_t, Y_{t+1}, u_t} \in D \mid \pi_{[0,t]}, u_{[0,t]}) \\
&= \sum_{y \in Y} P(F_{\pi_t, Y_{t+1}, u_t} \in D,\; Y_{t+1} = y \mid \pi_{[0,t]}, u_{[0,t]}) \\
&= \sum_{y \in Y} P(F_{\pi_t, Y_{t+1}, u_t} \in D \mid Y_{t+1} = y,\, \pi_{[0,t]}, u_{[0,t]})\; P(Y_{t+1} = y \mid \pi_{[0,t]}, u_{[0,t]}) \\
&= \sum_{y \in Y} 1_{\{F_{\pi_t, y, u_t} \in D\}}\; P(Y_{t+1} = y \mid \pi_t, u_t) \\
&= \sum_{y \in Y} 1_{\{F_{\pi_t, y, u_t} \in D\}} \Big( \sum_{x' \in X} \sum_{x \in X} P(Y_{t+1} = y,\, x_{t+1} = x',\, x_t = x \mid \pi_t, u_t) \Big) \\
&= \sum_{y \in Y} 1_{\{F_{\pi_t, y, u_t} \in D\}} \Big( \sum_{x' \in X} \sum_{x \in X} P(Y_{t+1} = y \mid x_{t+1} = x')\, P(x_{t+1} = x',\, x_t = x \mid \pi_t, u_t) \Big) \\
&= \sum_{y \in Y} 1_{\{F_{\pi_t, y, u_t} \in D\}} \Big( \sum_{x' \in X} \sum_{x \in X} P(Y_{t+1} = y \mid x_{t+1} = x')\, P(x_{t+1} = x' \mid x_t = x, u_t)\, P(x_t = x \mid \pi_t, u_t) \Big) \\
&= \sum_{y \in Y} 1_{\{F_{\pi_t, y, u_t} \in D\}} \Big( \sum_{x' \in X} \sum_{x \in X} P(Y_{t+1} = y \mid x_{t+1} = x')\, P(x_{t+1} = x' \mid x_t = x, u_t)\, \pi_t(x) \Big) \\
&= P(\pi_{t+1} \in D \mid \pi_t, u_t),
\end{aligned}
$$

so the filter process $\{\pi_t\}$ is itself a Markov chain controlled by $u_t$.
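For finite $X$ and $Y$ the sum above can be evaluated directly. The sketch below assumes stochastic matrices $T[u]$ and $Q$ (my own placeholder names) and returns the support of $\pi_{t+1}$ given $(\pi_t, u_t)$ as pairs $\big(P(Y_{t+1} = y \mid \pi_t, u_t),\, F_{\pi_t, y, u_t}\big)$; summing the probabilities of the beliefs that land in $D$ then gives $P(\pi_{t+1} \in D \mid \pi_t, u_t)$.

```python
import numpy as np

def next_belief(pi, u, y, T, Q):
    """F(pi, y, u) for finite X, Y: predict with T[u], then reweight by Q[:, y]."""
    pred = T[u].T @ pi                 # P(x_{t+1} = x' | pi_t, u_t)
    post = Q[:, y] * pred
    return post / post.sum()

def belief_kernel(pi, u, T, Q):
    """Distribution of pi_{t+1} given (pi_t, u_t) as a list of (probability, belief).

    T[u][x, x'] = P(x_{t+1} = x' | x_t = x, u_t = u),
    Q[x', y]    = P(y_{t+1} = y | x_{t+1} = x').
    """
    pred = T[u].T @ pi                 # marginal over x_{t+1}
    support = []
    for y in range(Q.shape[1]):
        p_y = float(Q[:, y] @ pred)    # P(Y_{t+1} = y | pi_t, u_t)
        if p_y > 0:
            support.append((p_y, next_belief(pi, u, y, T, Q)))
    return support
```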
# Policy
Let $\bar{\gamma} = (\gamma^c, \gamma^e)$ be our policy. The goal is to minimize some combined expected cost functional

$$
J_\beta(x_0, m_0, \bar{\gamma}) = E^{\bar{\gamma}}\!\left[ \sum_{k=0}^{N-1} \Big( \beta^k c(x_k, m_k, u_k) + \lambda\, \rho(m_k, \hat{m}_k) \Big) \right],
$$

where $\gamma^c$ represents the controller policy and $\gamma^e$ represents the estimator policy. We would like to simultaneously optimize for control effort (i.e. find some optimal $\gamma^{c,*}$) and minimize the estimation error of the map (i.e. find some optimal $\gamma^{e,*}$).
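For a fixed pair $(\gamma^c, \gamma^e)$, $J_\beta$ can be estimated by Monte Carlo rollouts of the model. The sketch below assumes caller-supplied stage cost `c`, map-error metric `rho`, priors, and dynamics (e.g. the toy model above); every name here is a placeholder for illustration, not a fixed design.

```python
import numpy as np

def evaluate_cost(gamma_c, gamma_e, sample_prior, step, observe,
                  c, rho, N=20, beta=0.95, lam=1.0, n_rollouts=200):
    """Monte Carlo estimate of J_beta(x_0, m_0, gamma_bar).

    gamma_c maps the observation history to an action u_k; gamma_e maps it to a
    map estimate m_hat_k, so both are causal in {y_t} as required.
    """
    total = 0.0
    for _ in range(n_rollouts):
        x, m = sample_prior()          # (x_0, m_0) ~ mu x nu
        history = []
        for k in range(N):
            y = observe(x, m)
            history.append(y)
            u = gamma_c(history)       # controller policy
            m_hat = gamma_e(history)   # estimator policy
            total += beta ** k * c(x, m, u) + lam * rho(m, m_hat)
            x, m = step(x, m, u)
    return total / n_rollouts
```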