A statistical model: $p_{Y\mid\theta}(y\mid\theta)$, where $Y$ is the observation (the data we have access to) and $\theta$ is the parameter of interest (the unknown quantity we want to learn about). We are interested in finding an estimator $\hat{\theta}$ for $\theta$:

$$\hat{\theta} : \text{Range of } Y \to \text{Range of } \theta$$

# Bayesian Statistics

1. Bayesians treat $\theta$ as a random variable, and so the problem is to choose a prior distribution for $\theta$: $p_\theta(\theta)$.
2. The prior distribution $p_\theta(\theta)$ and the likelihood $p_{Y\mid\theta}(y\mid\theta)$ together determine the joint distribution of $\theta$ and $Y$, i.e., $p_\theta(\theta)\, p_{Y\mid\theta}(y\mid\theta) = p_{\theta,Y}(\theta, y)$.
3. Inference becomes the task of computing the posterior distribution of the parameter given the observed data $Y$:
   $$p(\theta \mid y) = \frac{p_\theta(\theta)\, p_{Y\mid\theta}(y\mid\theta)}{\int_z p_\theta(z)\, p_{Y\mid\theta}(y\mid z)\, dz}$$
   (Note: the denominator doesn't depend on $\theta$!)
4. Estimation: now we can look to compute the posterior mean, or take a random draw from the posterior (see the grid-approximation sketch after this list).
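To make step 3 concrete, here is a minimal grid-approximation sketch, assuming Python with NumPy and SciPy. The model ($Y_i \mid \theta \sim N(\theta, 1)$ with a $N(0, b^2)$ prior, matching the example below) and all numeric settings are illustrative choices, not from the notes. It evaluates the numerator pointwise and approximates the denominator integral by a Riemann sum:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative model: Y_i | theta ~ N(theta, 1), prior theta ~ N(0, b^2)
b = 2.0
y = rng.normal(1.5, 1.0, size=20)  # data generated with true theta = 1.5

# Evaluate prior and likelihood on a grid of theta values
theta = np.linspace(-5.0, 5.0, 2001)
d_theta = theta[1] - theta[0]
log_prior = stats.norm.logpdf(theta, loc=0.0, scale=b)
log_lik = stats.norm.logpdf(y[:, None], loc=theta[None, :]).sum(axis=0)

# Numerator of Bayes' rule, up to a constant (subtract max for stability)
log_unnorm = log_prior + log_lik
unnorm = np.exp(log_unnorm - log_unnorm.max())

# Denominator: the integral over theta, approximated by a Riemann sum
Z = unnorm.sum() * d_theta
posterior = unnorm / Z

# Posterior mean from the normalized grid approximation
print((theta * posterior).sum() * d_theta)  # close to y.mean() here
```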
## Example
Given data $Y = (Y_1, \dots, Y_n)$ with $Y_i \mid \theta \overset{\text{iid}}{\sim} N(\theta, 1)$, where the parameter of interest has prior $\theta \sim N(0, b^2)$. We can define some of the previous terms as:

- Likelihood: $p_{Y\mid\theta}(y\mid\theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}} e^{-\frac{(y_i - \theta)^2}{2}}$
- Prior: $p_\theta(\theta) = \frac{1}{\sqrt{2\pi b^2}} e^{-\frac{\theta^2}{2b^2}}$

Now we can compute the posterior:

$$\theta \mid Y = y \sim N(\mu_1, \sigma_1^2), \quad \text{where } \mu_1 = \frac{1}{1 + \frac{1}{nb^2}}\, \bar{Y}, \qquad \sigma_1^2 = \frac{1}{n + \frac{1}{b^2}}$$

Proof:

$$
\begin{aligned}
p_{\theta\mid Y}(\theta \mid y) &= \frac{p_{Y\mid\theta}(y\mid\theta)\, p_\theta(\theta)}{p_Y(y)} \\
&\propto e^{-\frac{\theta^2}{2b^2}} \prod_{i=1}^n e^{-\frac{(y_i - \theta)^2}{2}} \quad (\text{since } Y_i \text{ iid}) \\
&\propto \exp\left\{ -\left(\frac{1}{2b^2} + \frac{n}{2}\right)\theta^2 + \theta \sum_{i=1}^n y_i \right\} \\
&\propto \exp\left\{ -\frac{(\theta - \mu_1)^2}{2\sigma_1^2} \right\}
\end{aligned}
$$

$$\implies \mu_1 = \frac{1}{1 + \frac{1}{nb^2}} \left( \frac{1}{n}\sum_{i=1}^n Y_i \right), \qquad \sigma_1^2 = \frac{1}{n + \frac{1}{b^2}}$$

Posterior mean (the one-parameter normal mean case):

$$\hat{\theta} = E[\theta \mid Y] = \int \theta\, p_{\theta\mid Y}(\theta \mid Y)\, d\theta = \mu_1 = \frac{1}{1 + \frac{1}{nb^2}} \left( \frac{1}{n}\sum_{i=1}^n Y_i \right) \approx \frac{1}{n}\sum_{i=1}^n Y_i$$

Random draw from the posterior:

$$\hat{\theta} \sim N(\mu_1, \sigma_1^2), \qquad \sigma_1^2 = \frac{1}{n + \frac{1}{b^2}} \approx \frac{1}{n}$$
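A quick numerical check of these closed-form expressions, as a minimal sketch assuming NumPy; the values of $n$, $b$, and the true mean are illustrative, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative settings: n observations, prior standard deviation b
n, b = 50, 2.0
y = rng.normal(1.5, 1.0, size=n)  # Y_i | theta ~ N(theta, 1), true theta = 1.5

# Closed-form posterior: theta | Y = y ~ N(mu_1, sigma_1^2)
sigma1_sq = 1.0 / (n + 1.0 / b**2)
mu1 = y.mean() / (1.0 + 1.0 / (n * b**2))

print(mu1, y.mean())       # posterior mean shrinks y-bar slightly toward 0
print(sigma1_sq, 1.0 / n)  # posterior variance is approximately 1/n

# A random draw from the posterior
draw = rng.normal(mu1, np.sqrt(sigma1_sq))
```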
## The Problem…
How do we do inference? Well, that's hard, because the posterior is usually analytically intractable:

$$p(\theta \mid y) = \frac{p_\theta(\theta)\, p(y \mid \theta)}{C}$$

For a given $\theta$ and $y$, the numerator is easy to compute, but the normalizing constant $C$ is hard since it involves integrating, and if we have a complex distribution or a lot of data, then that integral takes forever.

Task: given a probability distribution $\pi$, which is known only up to a multiplicative constant $C$, how do we draw a sample from $\pi$, or compute $E_\pi[X]$ for $X \sim \pi$? Well… we use MCMC.
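As a preview of where this is headed, here is a minimal random-walk Metropolis sketch, assuming Python with NumPy; the target density, step size, and iteration count are illustrative choices. The key point is that it only ever evaluates the *unnormalized* density, so the constant $C$ is never needed:

```python
import numpy as np

rng = np.random.default_rng(2)

def unnorm_log_pi(x):
    """Log of the target density, up to the additive constant log C.
    Illustrative target: a standard normal, log pi(x) = -x^2/2 + const."""
    return -0.5 * x**2

# Random-walk Metropolis: propose x' = x + eps, accept with probability
# min(1, pi(x') / pi(x)) -- the unknown constant C cancels in the ratio.
n_iters, step = 10_000, 1.0
x = 0.0
samples = np.empty(n_iters)
for t in range(n_iters):
    proposal = x + step * rng.normal()
    if np.log(rng.uniform()) < unnorm_log_pi(proposal) - unnorm_log_pi(x):
        x = proposal
    samples[t] = x

# Monte Carlo estimate of E_pi[X] from the chain (near 0 for this target)
print(samples[n_iters // 2:].mean())  # discard the first half as burn-in
```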