A statistical model: $p_{Y\mid\theta}(y\mid\theta)$, where $Y$ is the observation (the data we have access to) and $\theta$ is the parameter of interest (the unknown quantity we want to learn about). We are interested in finding an estimator $\hat{\theta}$ for $\theta$:

$$\hat{\theta} : \text{Range of } Y \to \text{Range of } \theta$$

# Bayesian Statistics

1. Bayesians treat $\theta$ as a random variable, and so the problem is to choose a prior distribution for $\theta$: $p_\theta(\theta)$.
2. The prior distribution $p_\theta(\theta)$ and the likelihood $p_{Y\mid\theta}(y\mid\theta)$ together determine the joint distribution of $\theta$ and $Y$, i.e., $p_\theta(\theta)\, p_{Y\mid\theta}(y\mid\theta) = p_{\theta,Y}(\theta, y)$.
3. Inference becomes the task of computing the posterior distribution of the parameter given the observed data $Y$:
   $$p(\theta \mid y) = \frac{p_\theta(\theta)\, p_{Y\mid\theta}(y\mid\theta)}{\int_z p_\theta(z)\, p_{Y\mid\theta}(y\mid z)\, dz}$$
   (Note: the denominator doesn't depend on $\theta$!)
4. Estimation: now we can look to compute the posterior mean, or take a random draw from the posterior (see the grid-approximation sketch after this list).
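To make step 3 concrete, here is a minimal grid-approximation sketch, assuming Python with NumPy and SciPy. The model ($Y_i \mid \theta \sim N(\theta, 1)$ with a $N(0, b^2)$ prior, matching the example below) and all numeric settings are illustrative choices, not from the notes. It evaluates the numerator pointwise and approximates the denominator integral by a Riemann sum:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative model: Y_i | theta ~ N(theta, 1), prior theta ~ N(0, b^2)
b = 2.0
y = rng.normal(1.5, 1.0, size=20)  # data generated with true theta = 1.5

# Evaluate prior and likelihood on a grid of theta values
theta = np.linspace(-5.0, 5.0, 2001)
d_theta = theta[1] - theta[0]
log_prior = stats.norm.logpdf(theta, loc=0.0, scale=b)
log_lik = stats.norm.logpdf(y[:, None], loc=theta[None, :]).sum(axis=0)

# Numerator of Bayes' rule, up to a constant (subtract max for stability)
log_unnorm = log_prior + log_lik
unnorm = np.exp(log_unnorm - log_unnorm.max())

# Denominator: the integral over theta, approximated by a Riemann sum
Z = unnorm.sum() * d_theta
posterior = unnorm / Z

# Posterior mean from the normalized grid approximation
print((theta * posterior).sum() * d_theta)  # close to y.mean() here
```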
## Example
Given data $Y = (Y_1, \dots, Y_n)$ with $Y_i \mid \theta \overset{\text{iid}}{\sim} N(\theta, 1)$, where the parameter of interest has prior $\theta \sim N(0, b^2)$. We can define some of the previous terms as:

- Likelihood: $p_{Y\mid\theta}(y\mid\theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}} e^{-\frac{(y_i - \theta)^2}{2}}$
- Prior: $p_\theta(\theta) = \frac{1}{\sqrt{2\pi b^2}} e^{-\frac{\theta^2}{2b^2}}$

Now we can compute the posterior:

$$\theta \mid Y = y \sim N(\mu_1, \sigma_1^2), \quad \text{where } \mu_1 = \frac{1}{1 + \frac{1}{nb^2}}\, \bar{Y}, \qquad \sigma_1^2 = \frac{1}{n + \frac{1}{b^2}}$$

Proof:

$$
\begin{aligned}
p_{\theta\mid Y}(\theta \mid y) &= \frac{p_{Y\mid\theta}(y\mid\theta)\, p_\theta(\theta)}{p_Y(y)} \\
&\propto e^{-\frac{\theta^2}{2b^2}} \prod_{i=1}^n e^{-\frac{(y_i - \theta)^2}{2}} \quad (\text{since } Y_i \text{ iid}) \\
&\propto \exp\left\{ -\left(\frac{1}{2b^2} + \frac{n}{2}\right)\theta^2 + \theta \sum_{i=1}^n y_i \right\} \\
&\propto \exp\left\{ -\frac{(\theta - \mu_1)^2}{2\sigma_1^2} \right\}
\end{aligned}
$$

$$\implies \mu_1 = \frac{1}{1 + \frac{1}{nb^2}} \left( \frac{1}{n}\sum_{i=1}^n Y_i \right), \qquad \sigma_1^2 = \frac{1}{n + \frac{1}{b^2}}$$

Posterior mean (the one-parameter normal mean case):

$$\hat{\theta} = E[\theta \mid Y] = \int \theta\, p_{\theta\mid Y}(\theta \mid Y)\, d\theta = \mu_1 = \frac{1}{1 + \frac{1}{nb^2}} \left( \frac{1}{n}\sum_{i=1}^n Y_i \right) \approx \frac{1}{n}\sum_{i=1}^n Y_i$$

Random draw from the posterior:

$$\hat{\theta} \sim N(\mu_1, \sigma_1^2), \qquad \sigma_1^2 = \frac{1}{n + \frac{1}{b^2}} \approx \frac{1}{n}$$
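A quick numerical check of these closed-form expressions, as a minimal sketch assuming NumPy; the values of $n$, $b$, and the true mean are illustrative, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative settings: n observations, prior standard deviation b
n, b = 50, 2.0
y = rng.normal(1.5, 1.0, size=n)  # Y_i | theta ~ N(theta, 1), true theta = 1.5

# Closed-form posterior: theta | Y = y ~ N(mu_1, sigma_1^2)
sigma1_sq = 1.0 / (n + 1.0 / b**2)
mu1 = y.mean() / (1.0 + 1.0 / (n * b**2))

print(mu1, y.mean())       # posterior mean shrinks y-bar slightly toward 0
print(sigma1_sq, 1.0 / n)  # posterior variance is approximately 1/n

# A random draw from the posterior
draw = rng.normal(mu1, np.sqrt(sigma1_sq))
```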
## The Problem…
How do we do inference? Well, that's hard, because the posterior is usually analytically intractable:

$$p(\theta \mid y) = \frac{p_\theta(\theta)\, p(y \mid \theta)}{C}$$

For a given $\theta$ and $y$, the numerator is easy to compute, but the normalizing constant $C$ is hard since it involves integrating, and if we have a complex distribution or a lot of data, then that integral takes forever.

Task: given a probability distribution $\pi$, which is known only up to a multiplicative constant $C$, how do we draw a sample from $\pi$, or compute $E_\pi[X]$ for $X \sim \pi$? Well… we use MCMC.
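As a preview of where this is headed, here is a minimal random-walk Metropolis sketch, assuming Python with NumPy; the target density, step size, and iteration count are illustrative choices. The key point is that it only ever evaluates the *unnormalized* density, so the constant $C$ is never needed:

```python
import numpy as np

rng = np.random.default_rng(2)

def unnorm_log_pi(x):
    """Log of the target density, up to the additive constant log C.
    Illustrative target: a standard normal, log pi(x) = -x^2/2 + const."""
    return -0.5 * x**2

# Random-walk Metropolis: propose x' = x + eps, accept with probability
# min(1, pi(x') / pi(x)) -- the unknown constant C cancels in the ratio.
n_iters, step = 10_000, 1.0
x = 0.0
samples = np.empty(n_iters)
for t in range(n_iters):
    proposal = x + step * rng.normal()
    if np.log(rng.uniform()) < unnorm_log_pi(proposal) - unnorm_log_pi(x):
        x = proposal
    samples[t] = x

# Monte Carlo estimate of E_pi[X] from the chain (near 0 for this target)
print(samples[n_iters // 2:].mean())  # discard the first half as burn-in
```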