
Reinforcement Learning for Zero-Delay Coding


Let $\{ X_{t} \}_{t\ge 0}$ be a stationary Markov chain taking values in a finite set $\mathbb{X}$. We define its transition kernel as $P(x_{t+1}\mid x_{t})$, and we assume access to its prior $\pi_{0}$.
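For concreteness, here is a minimal Python sketch of such a source, assuming a hypothetical three-symbol alphabet and an arbitrary row-stochastic kernel `P` (neither is specified in the setup above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-symbol source alphabet X = {0, 1, 2} (illustrative values only).
P = np.array([[0.8, 0.1, 0.1],      # row-stochastic transition kernel P(x_{t+1} | x_t)
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
pi0 = np.array([1/3, 1/3, 1/3])     # prior pi_0 over the initial state

def sample_source(T):
    """Draw a length-T trajectory X_0, ..., X_{T-1} from the Markov chain."""
    x = rng.choice(len(pi0), p=pi0)
    traj = [x]
    for _ in range(T - 1):
        x = rng.choice(len(P), p=P[x])
        traj.append(x)
    return np.array(traj)

X = sample_source(10)
```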

The channel is a discrete memoryless channel with input and output alphabets $\mathcal{M}$ and $\mathcal{M}'$ respectively, and transition matrix given by

$$T(q'_{t}\mid q_{t}), \qquad q_{t}\in\mathcal{M},\; q'_{t}\in\mathcal{M}'.$$

Finally, we denote the reconstruction sequence by $\{ \hat{X}_{t} \}_{t\ge 0}$, taking values in a finite set $\hat{\mathbb{X}}$.
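Continuing the sketch above, the channel can be simulated directly from its transition matrix; the binary symmetric channel used here (alphabets $\mathcal{M}=\mathcal{M}'=\{0,1\}$, crossover probability 0.1) is purely an assumed example:

```python
# Hypothetical DMC: M = M' = {0, 1}, binary symmetric channel with
# crossover probability 0.1 (illustrative choice, not part of the setup above).
T_mat = np.array([[0.9, 0.1],
                  [0.1, 0.9]])   # T(q'_t | q_t), rows indexed by the input q_t

def channel(q):
    """Pass one channel input symbol q_t through the DMC, returning q'_t."""
    return rng.choice(T_mat.shape[1], p=T_mat[q])
```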

We denote the encoder policy by the sequence $\gamma^{e}=\{ \gamma_{t}^{e} \}_{t\ge 0}$ and the decoder policy by $\gamma^{d}=\{ \gamma_{t}^{d} \}_{t\ge 0}$.

At time $t$ we let the encoder have access to all past channel inputs and outputs, and to all past and present source symbols, i.e.

$$\begin{align*} \gamma_{t}^{e}:\mathcal{M}^{t}\times(\mathcal{M}')^{t}\times \mathbb{X}^{t+1}&\to \mathcal{M}\\ (q_{[0,t-1]},q'_{[0,t-1]},X_{[0,t]})&\mapsto q_{t}. \end{align*}$$

Similarly, we allow the decoder access to all past and present channel outputs in order to generate the reconstruction symbol, so that

$$\begin{align*} \gamma_{t}^{d}:(\mathcal{M}')^{t+1}&\to \hat{\mathbb{X}}\\ q'_{[0,t]}&\mapsto\hat{X}_{t}. \end{align*}$$

The goal is to minimize the average distortion. In the infinite-horizon case, this is given by

$$J(\pi_{0},\gamma):=\limsup_{ T \to \infty }E_{\pi_{0}}^{\gamma^{e},\gamma^{d}}\left[ \frac{1}{T}\sum_{t=0}^{T-1}d(X_{t},\hat{X}_{t})\right],$$

where $d:\mathbb{X}\times\hat{\mathbb{X}}\to[0,\infty]$ is a distortion measure.
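Continuing the same sketch, the snippet below shows how the finite-horizon average inside the $\limsup$ can be estimated by simulation for a fixed policy pair. The encoder and decoder here are deliberately naive, hypothetical placeholders (they ignore most of the available history), and Hamming distortion is an assumed choice of $d$; the point is only to illustrate how $J(\pi_{0},\gamma)$ is approximated empirically, not any optimal zero-delay scheme:

```python
def encoder(q_hist, qp_hist, x_hist):
    """Naive illustrative encoder: send the current source symbol reduced modulo |M|.
    A real zero-delay encoder may use the full history (q_hist, qp_hist, x_hist)."""
    return x_hist[-1] % T_mat.shape[0]

def decoder(qp_hist):
    """Naive illustrative decoder: output the most recent channel output as-is."""
    return qp_hist[-1]

def hamming(x, xhat):
    """Assumed distortion measure d(x, x_hat)."""
    return float(x != xhat)

def average_distortion(T=10_000):
    """Monte Carlo estimate of (1/T) * sum_t d(X_t, X_hat_t) for this policy pair."""
    X = sample_source(T)
    q_hist, qp_hist, total = [], [], 0.0
    for t in range(T):
        q = encoder(q_hist, qp_hist, X[:t + 1])   # q_t from the encoder
        qp = channel(q)                           # q'_t from the DMC
        q_hist.append(q)
        qp_hist.append(qp)
        xhat = decoder(qp_hist)                   # zero-delay reconstruction X_hat_t
        total += hamming(X[t], xhat)
    return total / T

print(average_distortion())
```

Letting $T$ grow approximates the $\limsup$ in the definition above for this particular (clearly suboptimal) policy pair.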
