
# Bayesian Statistics



## Definition (Statistical Model)

A statistical model:

$$p_{Y\mid\theta}(Y\mid\theta)$$

where $Y$ is the observation (the data we have access to) and $\theta$ is the parameter of interest (the unknown quantity we want to learn about). We are interested in finding an estimator $\hat\theta$ for $\theta$:

$$\hat\theta : \text{Range of } Y \to \text{Range of } \theta$$

## Bayesian Statistics

1. Bayesians treat $\theta$ as a random variable, so the first task is to choose a prior distribution for $\theta$: $p_{\theta}(\theta)$.
2. The prior distribution $p_{\theta}(\theta)$ and the likelihood $p_{Y\mid\theta}(y\mid\theta)$ together determine the joint distribution of $\theta$ and $Y$, i.e.,
   $$p_{\theta}(\theta)\, p_{Y\mid\theta}(y\mid\theta) = p_{\theta,Y}(\theta, y)$$
3. Inference becomes the task of computing the posterior distribution of the parameter given the observed data $Y$:
   $$p(\theta\mid y) = \frac{p_{\theta}(\theta)\, p_{Y\mid\theta}(y\mid\theta)}{\int_{z} p_{\theta}(z)\, p_{Y\mid\theta}(y\mid z)\, dz}$$
   (Note: the denominator doesn't depend on $\theta$!)
4. Estimation: now we can compute the posterior mean or take a random draw from the posterior (see the numerical sketch after this list).
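To make steps 1–4 concrete, here is a minimal grid-approximation sketch in Python. It multiplies a prior by a likelihood on a grid and normalizes (the discrete analogue of the integral in the denominator), then computes a posterior mean and a posterior draw. The coin-flip model and all names (`theta_grid`, etc.) are illustrative assumptions, not part of these notes.

```python
import numpy as np

# Hypothetical setup: Bernoulli(theta) data with a flat grid prior over theta.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.7, size=50)              # observed data Y

theta_grid = np.linspace(0.001, 0.999, 1000)   # discretized range of theta
prior = np.ones_like(theta_grid)               # flat prior p_theta(theta)

# Log-likelihood p(y | theta) at every grid point
log_lik = y.sum() * np.log(theta_grid) + (len(y) - y.sum()) * np.log(1 - theta_grid)

# Unnormalized posterior = prior * likelihood; normalizing over the grid
# plays the role of the intractable integral in the denominator
unnorm = prior * np.exp(log_lik - log_lik.max())
posterior = unnorm / unnorm.sum()

posterior_mean = np.sum(theta_grid * posterior)   # step 4: E[theta | Y]
draw = rng.choice(theta_grid, p=posterior)        # step 4: a random posterior draw
print(posterior_mean, draw)
```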

## Example

Given data $Y=(Y_{1},\ldots,Y_{n})$ with $Y_{i}\mid\theta \stackrel{iid}{\sim} \mathcal{N}(\theta,1)$, and the parameter of interest is $\theta\sim\mathcal{N}(0,b^{2})$. We can define some of the previous terms as:

- Likelihood: $p_{Y\mid\theta}(y\mid\theta)=\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}e^{-\frac{(y_{i}-\theta)^{2}}{2}}$
- Prior: $p_{\theta}(\theta)=\frac{1}{\sqrt{2\pi b^{2}}}e^{-\frac{\theta^{2}}{2b^{2}}}$

Now we can compute the posterior:

$$\theta \mid Y = y \sim \mathcal{N}(\mu_{1},\sigma_{1}^{2}), \quad \text{where} \quad \mu_{1}=\frac{1}{1+\frac{1}{nb^{2}}}\,\bar{Y}, \qquad \sigma_{1}^{2}=\frac{1}{n+\frac{1}{b^{2}}}$$

Proof:

$$\begin{aligned}
p_{\theta\mid Y}(\theta\mid y) &\propto p_{Y\mid\theta}(y\mid\theta)\, p_{\theta}(\theta)\\
&\propto e^{-\frac{\theta^{2}}{2b^{2}}}\prod_{i=1}^{n}e^{-\frac{(y_{i}-\theta)^{2}}{2}} \quad \text{(since the } Y_{i} \text{ are iid)}\\
&\propto \exp\left\{-\left(\frac{1}{2b^{2}}+\frac{n}{2}\right)\theta^{2}+\theta\sum_{i=1}^{n}y_{i}\right\}\\
&\propto \exp\left\{-\frac{(\theta-\mu_{1})^{2}}{2\sigma_{1}^{2}}\right\}
\end{aligned}$$

so that, completing the square,

$$\mu_{1}=\frac{1}{1+\frac{1}{nb^{2}}}\left(\frac{1}{n}\sum_{i=1}^{n}y_{i}\right), \qquad \sigma_{1}^{2}=\frac{1}{n+\frac{1}{b^{2}}}$$

Posterior mean:

$$\hat\theta = E[\theta\mid Y] = \int \theta\, p_{\theta\mid Y}(\theta\mid Y)\, d\theta = \mu_{1} = \frac{1}{1+\frac{1}{nb^{2}}}\left(\frac{1}{n}\sum_{i=1}^{n}Y_{i}\right) \approx \frac{1}{n}\sum_{i=1}^{n}Y_{i} \quad \text{(for large } n\text{)}$$

Random draw from the posterior:

$$\hat\theta \sim \mathcal{N}(\mu_{1},\sigma_{1}^{2}), \qquad \sigma_{1}^{2}=\frac{1}{n+\frac{1}{b^{2}}} \approx \frac{1}{n}$$
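A short numerical sketch of this example (the values `n=100`, `b=2.0`, `theta_true=1.5` are assumptions for illustration): it simulates data, plugs into the closed-form $\mu_{1}$ and $\sigma_{1}^{2}$ derived above, and takes one posterior draw.

```python
import numpy as np

rng = np.random.default_rng(1)
n, b, theta_true = 100, 2.0, 1.5
y = rng.normal(theta_true, 1.0, size=n)   # Y_i | theta ~ N(theta, 1)

# Closed-form posterior parameters from the derivation above
mu1 = y.mean() / (1 + 1 / (n * b**2))     # posterior mean, shrinks ybar toward 0
sigma1_sq = 1 / (n + 1 / b**2)            # posterior variance, ~1/n for large n

posterior_draw = rng.normal(mu1, np.sqrt(sigma1_sq))
print(f"ybar={y.mean():.3f}  mu1={mu1:.3f}  sigma1^2={sigma1_sq:.5f}  draw={posterior_draw:.3f}")
```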

## The Problem…

How do we do inference? Well, that's hard, because the posterior is usually analytically intractable:

$$p(\theta\mid y)=\frac{p_{\theta}(\theta)\, p(y\mid\theta)}{C}$$

For a given $\theta$ and $y$, the numerator is easy to compute, but the denominator $C$ is hard since it involves integrating over the parameter space; with a complex distribution or a lot of data, it takes forever. Task: given a probability distribution $\pi$, which is known only up to a multiplicative constant $C$, how do we draw a sample from $\pi$ or compute $E_{\pi}[X]$ for $X\sim\pi$? Well… we use MCMC (Markov chain Monte Carlo).
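To make the "known up to a constant" point concrete, here is a minimal random-walk Metropolis sketch (one standard MCMC algorithm; the target density, step size, and burn-in length are illustrative assumptions): the acceptance ratio only ever uses the unnormalized density, so $C$ cancels and is never needed.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_pi_unnorm(x):
    # Example target: unnormalized N(3, 1) density; the sampler never sees C
    return -0.5 * (x - 3.0) ** 2

x = 0.0                        # arbitrary starting point
samples = []
for _ in range(10_000):
    proposal = x + rng.normal(0, 1.0)              # symmetric random-walk proposal
    log_accept = log_pi_unnorm(proposal) - log_pi_unnorm(x)
    if np.log(rng.uniform()) < log_accept:         # accept with prob min(1, pi(x')/pi(x))
        x = proposal
    samples.append(x)

samples = np.array(samples[2_000:])                # discard burn-in
print(samples.mean(), samples.var())               # approximates E_pi[X] and Var_pi[X]
```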
