$c:\mathbb{S}\times\mathbb{U}\to\mathbb{R}$ is the cost function.
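We can reduce this to a belief MDP $(\mathcal{P}(\mathbb{S}),\mathbb{U},\eta,\tilde{c})$. Here $\mathcal{P}(\mathbb{S})$ is the set of probability measures on $\mathbb{S}$, $\eta$ is the transition kernel of the filter process $\{\pi_t\}$ constructed below, and $\tilde{c}(\pi,u)=\int_{\mathbb{S}}c(s,u)\,\pi(ds)$.

To construct the filter process, apply Bayes' rule and marginalize over the previous state:

$$
\begin{aligned}
\pi_t(s_t) &:= P(s_t \mid y_{[0,t]}, u_{[0,t-1]}) \\
&= \frac{P(s_t, y_t, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})}{P(y_t, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})} \\
&= \frac{P(s_t, y_t, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})}{\int_{\mathbb{S}} P(s_t, y_t, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\,ds_t} \\
&= \frac{\int_{\mathbb{S}} P(s_t, s_{t-1}, y_t, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\,ds_{t-1}}{\int_{\mathbb{S}}\int_{\mathbb{S}} P(s_t, s_{t-1}, y_t, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\,ds_t\,ds_{t-1}} \\
&= \frac{\int_{\mathbb{S}} P(y_t \mid s_t)\,P(s_t, s_{t-1}, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\,ds_{t-1}}{\int_{\mathbb{S}}\int_{\mathbb{S}} P(y_t \mid s_t)\,P(s_t, s_{t-1}, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\,ds_t\,ds_{t-1}} \\
&= \frac{\int_{\mathbb{S}} P(y_t \mid s_t)\,P(s_t \mid s_{t-1}, u_{t-1})\,P(s_{t-1}, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\,ds_{t-1}}{\int_{\mathbb{S}}\int_{\mathbb{S}} P(y_t \mid s_t)\,P(s_t \mid s_{t-1}, u_{t-1})\,P(s_{t-1}, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\,ds_t\,ds_{t-1}} \\
&= \frac{\int_{\mathbb{S}} P(y_t \mid s_t)\,P(s_t \mid s_{t-1}, u_{t-1})\,P(u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\,\pi_{t-1}(s_{t-1})\,ds_{t-1}}{\int_{\mathbb{S}}\int_{\mathbb{S}} P(y_t \mid s_t)\,P(s_t \mid s_{t-1}, u_{t-1})\,P(u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\,\pi_{t-1}(s_{t-1})\,ds_t\,ds_{t-1}} \\
&= \frac{\int_{\mathbb{S}} P(y_t \mid s_t)\,P(s_t \mid s_{t-1}, u_{t-1})\,\pi_{t-1}(s_{t-1})\,ds_{t-1}}{\int_{\mathbb{S}}\int_{\mathbb{S}} P(y_t \mid s_t)\,P(s_t \mid s_{t-1}, u_{t-1})\,\pi_{t-1}(s_{t-1})\,ds_t\,ds_{t-1}} \\
&= \frac{\int_{\mathbb{X}\times\mathbb{M}} P(y_t \mid s_t)\,P(x_t \mid x_{t-1}, u_{t-1})\,P(m_t \mid m_{t-1})\,\pi_{t-1}(x_{t-1}, m_{t-1})\,d(x_{t-1}, m_{t-1})}{\int_{\mathbb{X}\times\mathbb{M}}\int_{\mathbb{X}\times\mathbb{M}} P(y_t \mid s_t)\,P(x_t \mid x_{t-1}, u_{t-1})\,P(m_t \mid m_{t-1})\,\pi_{t-1}(s_{t-1})\,ds_t\,ds_{t-1}}
\end{aligned}
$$

The fourth equality holds because $y_t$ depends only on the current state $s_t$, and the fifth applies the state transition kernel. The sixth factors $P(s_{t-1}, u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]}) = P(u_{t-1} \mid y_{[0,t-1]}, u_{[0,t-2]})\,\pi_{t-1}(s_{t-1})$, since $u_{t-1}$ is chosen using only the observation and control history. That factor depends on neither $s_t$ nor $s_{t-1}$, so it cancels between numerator and denominator in the seventh equality. The last equality writes $s_t=(x_t,m_t)$ with $\mathbb{S}=\mathbb{X}\times\mathbb{M}$ and uses the factorization $P(s_t \mid s_{t-1}, u_{t-1}) = P(x_t \mid x_{t-1}, u_{t-1})\,P(m_t \mid m_{t-1})$.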
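As a sanity check of the recursion, here is a minimal Python sketch for the special case where $\mathbb{X}$, $\mathbb{M}$ and $\mathbb{U}$ are finite, so every integral becomes a sum. The dict-based representation and the names `filter_update`, `product_kernel`, `px`, `pm` and `obs` are hypothetical, and the numbers in the example kernels are purely illustrative. The control `u` is taken as given, matching the cancellation of $P(u_{t-1}\mid y_{[0,t-1]},u_{[0,t-2]})$ above.

```python
from typing import Callable, Dict, Hashable

Dist = Dict[Hashable, float]  # finite probability mass function


def filter_update(prior: Dist, u: Hashable, y: Hashable,
                  trans: Callable[[Hashable, Hashable], Dist],
                  obs: Callable[[Hashable], Dist]) -> Dist:
    """Compute pi_t from pi_{t-1}, u_{t-1} and y_t on a finite state space.

    trans(s_prev, u) returns P(. | s_prev, u); obs(s) returns P(. | s).
    """
    predicted: Dist = {}
    # Sum over s_{t-1}: P(s_t | s_{t-1}, u_{t-1}) * pi_{t-1}(s_{t-1})
    for s_prev, p_prev in prior.items():
        for s, p in trans(s_prev, u).items():
            predicted[s] = predicted.get(s, 0.0) + p * p_prev
    # Multiply by the likelihood P(y_t | s_t) to get the numerator
    numer = {s: obs(s).get(y, 0.0) * p for s, p in predicted.items()}
    # Denominator: the same expression, additionally summed over s_t
    z = sum(numer.values())
    if z == 0.0:
        raise ValueError("y has zero probability given the prior and control")
    return {s: v / z for s, v in numer.items()}


def product_kernel(px: Callable[[Hashable, Hashable], Dist],
                   pm: Callable[[Hashable], Dist]) -> Callable[[Hashable, Hashable], Dist]:
    """P(s_t | s_{t-1}, u) = P(x_t | x_{t-1}, u) * P(m_t | m_{t-1}) on S = X x M."""
    def trans(s_prev: Hashable, u: Hashable) -> Dist:
        x_prev, m_prev = s_prev
        return {(x, m): qx * qm
                for x, qx in px(x_prev, u).items()
                for m, qm in pm(m_prev).items()}
    return trans


# Hypothetical two-state example; all numbers are illustrative only.
def px(x: int, u: int) -> Dist:
    # u = 0 keeps x with prob 0.9; u = 1 flips x with prob 0.9
    keep = 0.9 if u == 0 else 0.1
    return {x: keep, 1 - x: 1 - keep}


def pm(m: int) -> Dist:
    # Uncontrolled Markov chain that stays put with prob 0.8
    return {m: 0.8, 1 - m: 0.2}


def obs(s) -> Dist:
    # Noisy reading of x XOR m, correct with prob 0.85
    x, m = s
    b = x ^ m
    return {b: 0.85, 1 - b: 0.15}


trans = product_kernel(px, pm)
belief: Dist = {(x, m): 0.25 for x in (0, 1) for m in (0, 1)}
for u, y in [(0, 1), (1, 0), (0, 0)]:
    belief = filter_update(belief, u, y, trans, obs)
```

Building the joint kernel from `px` and `pm` mirrors the last equality of the derivation; on larger spaces one would instead sum over $x_{t-1}$ and $m_{t-1}$ separately to exploit the factorization.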