Metropolis-Hastings Algorithm

Definition (Proposal kernel)

In the MH algorithm we use a proposal kernel, which we define as $T(\cdot|\cdot)$, where
\begin{align*}
&T(\cdot|\cdot)\mbox{ is the transition matrix on }S\\
&T(y|x)\mbox{ is the one-step transition probability}\\
&T(\cdot|x)\mbox{ is a row in }T(\cdot|\cdot)
\end{align*}

Definition (Acceptance rate)

In the MH algorithm we define the acceptance rate, $r$, as the threshold function that determines whether a proposed state gets accepted or not: $$r(x_{t-1},y)=\min\left\{1, \frac{\pi(y)T(x_{t-1}|y)}{\pi(x_{t-1})T(y|x_{t-1})} \right\}$$ where $x_{t-1}$ is the previous state, $y$ is the proposed state, and $T(\cdot|\cdot)$ is the proposal kernel.

Goal

To create a Markov chain with $\pi(\cdot)$ as its invariant distribution. The MH algorithm needs a proposal kernel, $T(\cdot|\cdot)$, which is user-specified.

Algorithm

At time $t=1$, generate $X_{1}$ (pick a random $i\in S$ and set $X_{1}=i$) from an arbitrary distribution. For $t\ge2$, given $X_{t-1}=x_{t-1}$,

  • Draw $y$ from the distribution $T(\cdot|x_{t-1})$, and compute the acceptance rate: $$r(x_{t-1},y)=\min\left\{1, \frac{\pi(y)T(x_{t-1}|y)}{\pi(x_{t-1})T(y|x_{t-1})} \right\}$$
  • Draw $U_{t}\sim\mbox{Unif}(0,1)$, and set $$X_{t}=\begin{cases} y, &\mbox{if }U_{t}\le r(x_{t-1},y) \\ x_{t-1}, &\mbox{otherwise} \end{cases}$$
  • The transition kernel for MH is, for $y\neq x$, $$p(y|x)=T(y|x)\min\left\{1,\frac{\pi(y)T(x|y)}{\pi(x)T(y|x)}\right\}$$
  • Then detailed balance holds, since \begin{align*}\pi(x)p(y|x)&=\pi(x)T(y|x) \min\left\{1,\frac{\pi(y)T(x|y)}{\pi(x)T(y|x)}\right\}\\&= \min\left\{\pi(y)T(x|y),\pi(x)T(y|x)\right\}\\&=\pi(y)p(x|y)\end{align*}
In code, one possible implementation (steps 3–4 above are part of the correctness proof, not the algorithm, so they do not appear here):

import random

def MH_algo(state_space, T, pi, n_steps=10_000):
    """Metropolis-Hastings sampler on a finite state space.

    state_space : list of states, indexed 0..|S|-1
    T           : proposal matrix with T[x][y] = T(y|x)
    pi          : target distribution; pi(x) may be unnormalized
    """
    n = len(state_space)
    x = [random.randrange(n)]        # X_1 from an arbitrary distribution
    for t in range(1, n_steps):
        x_tm1 = x[t - 1]
        # Step 1: draw y from the row T(. | x_{t-1}), compute acceptance rate
        y = random.choices(range(n), weights=T[x_tm1])[0]
        r = min(1.0, pi(y) * T[y][x_tm1] / (pi(x_tm1) * T[x_tm1][y]))
        # Step 2: accept y with probability r, otherwise stay at x_{t-1}
        Ut = random.random()
        x.append(y if Ut <= r else x_tm1)
    return x
        
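A self-contained toy run of the algorithm above (a sketch: the three-state space, uniform proposal matrix, and target weights below are illustrative choices, not part of the original notes):

```python
import random

def mh_chain(pi, T, n_steps, seed=0):
    """Run a Metropolis-Hastings chain; here pi is a list of target
    probabilities and T[x][y] = T(y|x) is the proposal matrix."""
    rng = random.Random(seed)
    n = len(pi)
    x = rng.randrange(n)             # X_1 from an arbitrary distribution
    chain = [x]
    for _ in range(n_steps - 1):
        # Step 1: propose y ~ T(. | x) and compute the acceptance rate
        y = rng.choices(range(n), weights=T[x])[0]
        r = min(1.0, pi[y] * T[y][x] / (pi[x] * T[x][y]))
        # Step 2: accept with probability r, otherwise stay
        if rng.random() <= r:
            x = y
        chain.append(x)
    return chain

# Target pi on S = {0, 1, 2} with a uniform proposal matrix.
pi = [0.2, 0.3, 0.5]
T = [[1 / 3] * 3 for _ in range(3)]
chain = mh_chain(pi, T, 50_000)
freq = [chain.count(i) / len(chain) for i in range(3)]
```

With enough steps the empirical frequencies `freq` should be close to `pi`, as the invariance argument above predicts.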

Remark

If $\tilde\pi=\frac{\pi}{C}$ for some constant $C>0$, then $\frac{\pi(y)}{\pi(x_{t-1})}=\frac{\tilde\pi(y)}{\tilde\pi(x_{t-1})}$, so the MH algorithm only needs $\pi$ up to a normalizing constant.
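A quick numerical check of this remark (a sketch; the weights below are arbitrary, and a symmetric proposal is assumed so the $T$ terms cancel):

```python
# Unnormalized weights w = C * pi with C = 10; pi is the normalized target.
w = [2.0, 3.0, 5.0]
pi = [wi / sum(w) for wi in w]

def rate(p, x, y):
    # Acceptance rate with a symmetric proposal (T terms cancel).
    return min(1.0, p[y] / p[x])

# The rate is identical whether we use pi or the unnormalized w.
agree = all(abs(rate(w, x, y) - rate(pi, x, y)) < 1e-12
            for x in range(3) for y in range(3))
```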

Theorem (Theoretical Guarantees for Countable State Space)

Assume $\pi(\cdot)$ is a distribution over a countable state space $S$.

  1. If the transition matrix $P$ is irreducible, then w.p.1, for any bounded function $f:S\to\mathbb{R}$, $$\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N}f(X_{n})=\sum_{i\in S}f(i)\pi(i),$$ i.e. Ergodic Theorem II holds true.
  2. If in addition $P$ is aperiodic, then $$\lim_{N\to\infty}\sup_{j\in S}|P(X_{N}=j)-\pi_{j}|=0,$$ i.e. Convergence to Equilibrium holds true.
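The ergodic theorem can be illustrated numerically: the time average of a test function along the chain should approach its stationary expectation (a sketch; the three-point target and the test function $f$ are arbitrary choices, and the uniform proposal is symmetric so the $T$ terms cancel in the acceptance rate):

```python
import random

rng = random.Random(1)
pi = [0.2, 0.3, 0.5]              # stationary target on S = {0, 1, 2}
f = lambda i: i * i               # a bounded test function
N = 100_000
x, total = 0, 0.0
for _ in range(N):
    y = rng.randrange(3)                         # uniform (symmetric) proposal
    if rng.random() <= min(1.0, pi[y] / pi[x]):  # T terms cancel
        x = y
    total += f(x)
time_avg = total / N
expected = sum(f(i) * pi[i] for i in range(3))   # = 2.3
```

For this chain `time_avg` settles near `expected`, as the theorem guarantees.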

Theorem (Symmetric kernel)

When applying the MH algorithm, if the proposal kernel is symmetric, i.e., $T(y|x)=T(x|y)$, then the acceptance rate, $r$, can be simplified to $$r(x,y)=\min\left\{1,\frac{\pi(y)}{\pi(x)}\right\}$$
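A standard example of a symmetric kernel is a random walk, $y = x \pm 1$ each with probability $1/2$, so that $T(y|x)=T(x|y)$ and only the ratio $\pi(y)/\pi(x)$ is needed (a sketch with an illustrative truncated-geometric target, unnormalized per the earlier remark):

```python
import random

rng = random.Random(2)

def pi(x):
    # Unnormalized target: proportional to (1/2)^x on {0, ..., 10}.
    return 0.5 ** x if 0 <= x <= 10 else 0.0

x = 0
counts = [0] * 11
N = 200_000
for _ in range(N):
    y = x + rng.choice([-1, 1])        # symmetric random-walk proposal
    # Simplified acceptance rate; proposals outside the support have pi = 0
    # and are always rejected.
    if pi(y) > 0 and rng.random() <= min(1.0, pi(y) / pi(x)):
        x = y
    counts[x] += 1
```

The empirical frequency of state 0 should approach its normalized stationary probability, $1/(2-2^{-10})\approx 0.5$.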

Remark

  1. This implies that, algorithmically, we only need to be able to draw from $T$, the proposal kernel; its values cancel in the acceptance rate.
  2. Also, if $\pi(y)\ge\pi(x)$, $y$ will always be accepted.
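Remark 2 is immediate from the simplified rate, and easy to check numerically (a minimal sketch on an arbitrary three-point target):

```python
pi = [0.2, 0.3, 0.5]
# Whenever pi(y) >= pi(x), the ratio pi(y)/pi(x) >= 1, so the simplified
# rate min(1, pi(y)/pi(x)) equals 1 and the move is always accepted.
uphill_always_accepted = all(
    min(1.0, pi[y] / pi[x]) == 1.0
    for x in range(3) for y in range(3) if pi[y] >= pi[x]
)
```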
