
Reinforcement Learning for Zero-Delay Coding


Let $\{ X_{t} \}_{t\ge 0}$ be a stationary Markov chain taking values in a finite set $\mathbb{X}$. We define its transition kernel as $P(x_{t+1}\mid x_{t})$, and we assume access to its prior $\pi_{0}$.
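For concreteness, here is a minimal Python sketch of such a source, assuming a hypothetical three-symbol alphabet and an arbitrary row-stochastic kernel `P` (neither is specified in the setup above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-symbol source alphabet X = {0, 1, 2} (illustrative values only).
P = np.array([[0.8, 0.1, 0.1],      # row-stochastic transition kernel P(x_{t+1} | x_t)
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
pi0 = np.array([1/3, 1/3, 1/3])     # prior pi_0 over the initial state

def sample_source(T):
    """Draw a length-T trajectory X_0, ..., X_{T-1} from the Markov chain."""
    x = rng.choice(len(pi0), p=pi0)
    traj = [x]
    for _ in range(T - 1):
        x = rng.choice(len(P), p=P[x])
        traj.append(x)
    return np.array(traj)

X = sample_source(10)
```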

The channel is a discrete memoryless channel with input and output alphabets $\mathcal{M}$ and $\mathcal{M}'$ respectively, and transition matrix given by

$$T(q'_{t}\mid q_{t}), \qquad q_{t}\in\mathcal{M},\; q'_{t}\in\mathcal{M}'.$$

Finally, we denote the reconstruction sequence by $\{ \hat{X}_{t} \}_{t\ge 0}$, taking values in a finite set $\hat{\mathbb{X}}$.
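Continuing the sketch above, the channel can be simulated directly from its transition matrix; the binary symmetric channel used here (alphabets $\mathcal{M}=\mathcal{M}'=\{0,1\}$, crossover probability 0.1) is purely an assumed example:

```python
# Hypothetical DMC: M = M' = {0, 1}, binary symmetric channel with
# crossover probability 0.1 (illustrative choice, not part of the setup above).
T_mat = np.array([[0.9, 0.1],
                  [0.1, 0.9]])   # T(q'_t | q_t), rows indexed by the input q_t

def channel(q):
    """Pass one channel input symbol q_t through the DMC, returning q'_t."""
    return rng.choice(T_mat.shape[1], p=T_mat[q])
```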

We denote the encoder policy by the sequence $\gamma^{e}=\{ \gamma_{t}^{e} \}_{t\ge 0}$ and the decoder policy by $\gamma^{d}=\{ \gamma_{t}^{d} \}_{t\ge 0}$.

At time $t$ we let the encoder have access to all past channel inputs and outputs, and to all past and present source symbols, i.e.

$$\begin{align*} \gamma_{t}^{e}:\mathcal{M}^{t}\times(\mathcal{M}')^{t}\times \mathbb{X}^{t+1}&\to \mathcal{M}\\ (q_{[0,t-1]},q'_{[0,t-1]},X_{[0,t]})&\mapsto q_{t}. \end{align*}$$

Similarly, we allow the decoder access to all past and present channel outputs in order to generate the reconstruction symbol, so that

$$\begin{align*} \gamma_{t}^{d}:(\mathcal{M}')^{t+1}&\to \hat{\mathbb{X}}\\ q'_{[0,t]}&\mapsto\hat{X}_{t}. \end{align*}$$

The goal is to minimize the average distortion. In the infinite-horizon case, this is given by

$$J(\pi_{0},\gamma):=\limsup_{ T \to \infty }E_{\pi_{0}}^{\gamma^{e},\gamma^{d}}\left[ \frac{1}{T}\sum_{t=0}^{T-1}d(X_{t},\hat{X}_{t})\right],$$

where $d:\mathbb{X}\times\hat{\mathbb{X}}\to[0,\infty]$ is a distortion measure.
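Continuing the same sketch, the snippet below shows how the finite-horizon average inside the $\limsup$ can be estimated by simulation for a fixed policy pair. The encoder and decoder here are deliberately naive, hypothetical placeholders (they ignore most of the available history), and Hamming distortion is an assumed choice of $d$; the point is only to illustrate how $J(\pi_{0},\gamma)$ is approximated empirically, not any optimal zero-delay scheme:

```python
def encoder(q_hist, qp_hist, x_hist):
    """Naive illustrative encoder: send the current source symbol reduced modulo |M|.
    A real zero-delay encoder may use the full history (q_hist, qp_hist, x_hist)."""
    return x_hist[-1] % T_mat.shape[0]

def decoder(qp_hist):
    """Naive illustrative decoder: output the most recent channel output as-is."""
    return qp_hist[-1]

def hamming(x, xhat):
    """Assumed distortion measure d(x, x_hat)."""
    return float(x != xhat)

def average_distortion(T=10_000):
    """Monte Carlo estimate of (1/T) * sum_t d(X_t, X_hat_t) for this policy pair."""
    X = sample_source(T)
    q_hist, qp_hist, total = [], [], 0.0
    for t in range(T):
        q = encoder(q_hist, qp_hist, X[:t + 1])   # q_t from the encoder
        qp = channel(q)                           # q'_t from the DMC
        q_hist.append(q)
        qp_hist.append(qp)
        xhat = decoder(qp_hist)                   # zero-delay reconstruction X_hat_t
        total += hamming(X[t], xhat)
    return total / T

print(average_distortion())
```

Letting $T$ grow approximates the $\limsup$ in the definition above for this particular (clearly suboptimal) policy pair.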
