Independent MH

Gamma distribution

Robert and Casella (2013) present the following algorithm:
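In outline (restating the standard independent MH step): given $x^{(t)}$, generate a candidate $Y_t\sim g(y)$ from a fixed instrumental distribution $g$, and take

$$X^{(t+1)} = \begin{cases} Y_t & \text{with probability } \min\Big\{\dfrac{f(Y_t)\,g(x^{(t)})}{f(x^{(t)})\,g(Y_t)},\ 1\Big\}\,,\\ x^{(t)} & \text{otherwise.} \end{cases}$$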

If there exists a constant $M$ such that

$$f(x) \le Mg(x)\,,\qquad \forall x\in \mathrm{supp}\, f\,,$$

then the algorithm produces a uniformly ergodic chain (a theorem in Robert and Casella), and the expected acceptance probability associated with the algorithm is at least $1/M$ when the chain is stationary; in that sense, the IMH is more efficient than the Accept-Reject algorithm.

Let's illustrate this algorithm with ${\mathcal G}a(\alpha, 1)$. We introduced how to sample from the Gamma distribution via the Accept-Reject algorithm in Special Distributions, and it is straightforward to obtain the Gamma Metropolis-Hastings acceptance probability from the ratio $f/g$.
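Assuming the candidate is the same ${\mathcal G}a(a, a/\alpha)$ distribution with $a=\lfloor\alpha\rfloor$ as in the Accept-Reject construction, the ratio is

$$\frac{f(x)}{g(x)} \propto x^{\alpha-a}\exp\{-(1-a/\alpha)x\}\,,$$

so a candidate $y$ is accepted from the current state $x^{(t)}$ with probability

$$\rho = \min\Big\{\Big(\frac{y}{x^{(t)}}\Big)^{\alpha-a}\exp\{-(1-a/\alpha)(y-x^{(t)})\}\,,\ 1\Big\}\,.$$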

We can implement this algorithm with Julia code, as sketched below.

We use it to sample from ${\mathcal G}a(2.43, 1)$ and estimate $\mathrm{E}_f(X^2)=2.43+2.43^2\approx 8.33$.
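Here is a minimal sketch of such an implementation, assuming the ${\mathcal G}a(\lfloor\alpha\rfloor, \lfloor\alpha\rfloor/\alpha)$ candidate above; `imh_gamma` and its arguments are illustrative names rather than the original post's code.

```julia
using Distributions, Statistics

# Independent MH for Ga(α, 1) with a Ga(a, a/α) candidate, a = ⌊α⌋
function imh_gamma(T, α)
    a = floor(α)
    g = Gamma(a, α/a)   # Distributions.jl takes (shape, scale); scale α/a ⇔ rate a/α
    logr(x) = (α - a) * log(x) - (1 - a/α) * x   # log f(x)/g(x) up to a constant
    x = rand(g)
    chain = zeros(T)
    for t in 1:T
        y = rand(g)
        if log(rand()) < logr(y) - logr(x)
            x = y
        end
        chain[t] = x
    end
    return chain
end

samples = imh_gamma(10_000, 2.43)
mean(samples .^ 2)   # should be close to 2.43 + 2.43² ≈ 8.33
```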

Logistic Regression

We observe $(x_i,y_i),\ i=1,\ldots,n$ according to the model

$$Y_i\sim\mathrm{Bernoulli}(p(x_i))\,,\qquad p(x) = \frac{\exp(\alpha+\beta x)}{1+\exp(\alpha+\beta x)}\,.$$

The likelihood is

$$L(\alpha,\beta\mid \mathbf y) \propto \prod_{i=1}^n \Big(\frac{\exp(\alpha+\beta x_i)}{1+\exp(\alpha+\beta x_i)}\Big)^{y_i}\Big(\frac{1}{1+\exp(\alpha+\beta x_i)}\Big)^{1-y_i}$$

and we put an $\mathrm{Exp}(1/b)$ prior on $e^\alpha$ and a flat prior on $\beta$, i.e.,

$$\pi_\alpha(\alpha\mid b)\,\pi_\beta(\beta)\,d\alpha\,d\beta = \frac 1b e^{-e^\alpha/b}\,de^\alpha\, d\beta=\frac 1b e^\alpha e^{-e^\alpha/b}\,d\alpha\, d\beta\,.$$

Note that

$$\begin{aligned} \mathrm{E}[\alpha] &= \int_{-\infty}^\infty \frac{\alpha}{b}e^\alpha e^{-e^\alpha/b}\,d\alpha\\ &=\int_0^\infty \log w\,\frac 1b e^{-w/b}\,dw \\ &=\log b -\gamma\,, \end{aligned}$$

where

$$\gamma = -\int_0^\infty e^{-x}\log x\,dx$$

is Euler's constant.

Choose the data-dependent value of $b$ that makes $\mathrm{E}[\alpha]=\hat\alpha$, where $\hat \alpha$ is the MLE of $\alpha$; thus $\hat b=\exp(\hat \alpha+\gamma)$.
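As a quick Monte Carlo check of the identity $\mathrm{E}[\alpha]=\log b-\gamma$ (an illustrative sketch; the value $b=2$ is arbitrary):

```julia
using Distributions, Statistics

b = 2.0
α = log.(rand(Exponential(b), 10^6))   # α = log(w) with w ~ Exp(mean b)
mean(α), log(b) - MathConstants.γ      # both ≈ 0.116
```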

We can use MLE to estimate the coefficients in the logistic model (see my post for the derivation), and the following Julia code can help us fit the model quickly.
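A sketch of such a fit, assuming the data are the Challenger O-ring failure data (temperature $x_i$, failure indicator $y_i$) analyzed by Robert and Casella; the vector names `temp` and `failure` are placeholders:

```julia
using GLM, DataFrames

# temp and failure are assumed to hold the observed xᵢ and yᵢ
df = DataFrame(x = temp, y = failure)
fit = glm(@formula(y ~ x), df, Binomial(), LogitLink())
coef(fit)       # (α̂, β̂)
stderror(fit)   # standard errors; σ̂_β is the second entry
```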

The estimates of the parameters are $\hat\alpha=15.0479$, $\hat\beta=-0.232163$, and $\hat\sigma_\beta = 0.108137$.

I wrote the following Julia code to implement the independent MH algorithm.
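The original code isn't shown here, so what follows is a minimal sketch, assuming the proposal draws $\alpha$ from its prior (with $b=\hat b$) and $\beta$ from $N(\hat\beta,\hat\sigma_\beta^2)$; the function name and the data vectors `x`, `y` are illustrative.

```julia
using Distributions, Random

# Bernoulli log-likelihood of the logistic model
loglik(a, b, x, y) = sum(@. y * (a + b*x) - log(1 + exp(a + b*x)))

function imh_logit(T, x, y; αhat = 15.0479, βhat = -0.232163, σβ = 0.108137)
    bhat = exp(αhat + MathConstants.γ)   # data-dependent prior scale b̂
    dβ = Normal(βhat, σβ)
    α, β = αhat, βhat
    chain = zeros(T, 2)
    for t in 1:T
        αp = log(rand(Exponential(bhat)))   # candidate α: e^α ~ Exp(mean b̂), i.e. its prior
        βp = rand(dβ)
        # the α prior cancels against the proposal; β has a flat prior
        logρ = loglik(αp, βp, x, y) - loglik(α, β, x, y) +
               logpdf(dβ, β) - logpdf(dβ, βp)
        if log(rand()) < logρ
            α, β = αp, βp
        end
        chain[t, 1], chain[t, 2] = α, β
    end
    return chain
end
```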

Saddlepoint tail area approximation

If $K(\tau) = \log \mathbb E[\exp(\tau X)]$ is the cumulant generating function, solving the saddlepoint equation $K'(\tau)=x$ yields the saddlepoint. For a noncentral chi-squared random variable $X$, the moment generating function is

$$\phi_X(t) = \frac{\exp(2\lambda t/(1-2t))}{(1-2t)^{p/2}}\,,$$

where $p$ is the number of degrees of freedom and $\lambda$ is the noncentrality parameter (with this MGF, the noncentrality in R's `pchisq` parameterization is $2\lambda$), and its saddlepoint is

$$\hat\tau(x) = \frac{-p+2x-\sqrt{p^2+8\lambda x}}{4x}\,.$$
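As a quick sanity check that $\hat\tau(x)$ solves $K'(\tau)=x$ (a small sketch, using the values $p=6$, $\lambda=9$ from the example below):

```julia
K1(t, p, λ) = 2λ/(1 - 2t)^2 + p/(1 - 2t)               # K'(t)
τhat(x, p, λ) = (-p + 2x - sqrt(p^2 + 8λ*x)) / (4x)    # saddlepoint
K1(τhat(20.0, 6, 9), 6, 9)   # ≈ 20.0
```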

The saddlepoint can be used to approximate the tail area of a distribution. We have the approximation

$$\begin{aligned} P(\bar X>a) &= \int_a^\infty \Big(\frac{n}{2\pi K_X''(\hat\tau(x))}\Big)^{1/2}\exp\{n[K_X(\hat\tau(x))-\hat\tau(x)x]\}\,dx\\ &= \int_{\hat\tau(a)}^{1/2} \Big(\frac{n}{2\pi}\Big)^{1/2}[K_X''(t)]^{1/2}\exp\{n[K_X(t)-tK_X'(t)]\}\,dt\\ &\approx \frac 1m\sum_{i=1}^m\mathbb{I}[Z_i>\hat \tau(a)]\,, \end{aligned}$$

where the $Z_i$ are samples from the saddlepoint distribution (the density proportional to the integrand in $t$). Using a Taylor series approximation,

$$\exp\{n[K_X(t)-tK_X'(t)]\}\approx \exp\Big\{-nK''_X(0)\frac{t^2}{2}\Big\}\,,$$

so the instrumental density can be chosen as $N\big(0,\frac{1}{nK_X''(0)}\big)$, where

$$K_X''(t)=\frac{2[p(1-2t) + 4\lambda]}{(1-2t)^3}\,.$$

I implemented the independent MH algorithm via the following code to produce random variables from the saddlepoint distribution.
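The code below is a minimal sketch of such a sampler under the setup above (the target is the unnormalized saddlepoint density in $t$, the instrumental density is $N(0, 1/(nK_X''(0)))$); the function names are illustrative.

```julia
using Distributions, Statistics

# cumulant generating function of the noncentral χ² (parameterization above) and derivatives
K(t, p, λ)  = 2λ*t/(1 - 2t) - (p/2) * log(1 - 2t)
K1(t, p, λ) = 2λ/(1 - 2t)^2 + p/(1 - 2t)
K2(t, p, λ) = 2 * (p*(1 - 2t) + 4λ) / (1 - 2t)^3

# log of the (unnormalized) saddlepoint density in t, supported on t < 1/2
logf(t, n, p, λ) = 0.5 * log(K2(t, p, λ)) + n * (K(t, p, λ) - t * K1(t, p, λ))

function imh_saddle(T; n = 1, p = 6, λ = 9)
    g = Normal(0, 1 / sqrt(n * K2(0, p, λ)))   # instrumental density N(0, 1/(nK''(0)))
    z = rand(g)
    chain = zeros(T)
    for i in 1:T
        zp = rand(g)
        if zp < 0.5   # candidates outside the support are rejected automatically
            logρ = logf(zp, n, p, λ) - logf(z, n, p, λ) + logpdf(g, z) - logpdf(g, zp)
            if log(rand()) < logρ
                z = zp
            end
        end
        chain[i] = z
    end
    return chain
end

# tail-area estimate, e.g. P(X̄ > 36.225) for n = 1
τhat(x, p, λ) = (-p + 2x - sqrt(p^2 + 8λ*x)) / (4x)
z = imh_saddle(10_000)
mean(z .> τhat(36.225, 6, 9))   # ≈ 0.10
```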

I can successfully reproduce the results.

If $n=1$,

| Interval | `1 - pchisq(x, 6, 9*2)` | IMH |
| --- | --- | --- |
| $(36.225,\infty)$ | 0.1000076 | 0.1037 |
| $(40.542,\infty)$ | 0.05000061 | 0.0516 |
| $(49.333,\infty)$ | 0.01000057 | 0.0114 |

For $n=100$, note that if $X_i\sim \chi^2_p(\lambda)$, then $n\bar X=\sum X_i\sim \chi^2_{np}(n\lambda)$, so we can use the following R code to find the cutpoints.
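The exact code isn't shown here, but one way to invert the tail probabilities with R's `qchisq` is:

```r
# cutpoints a with P(X̄ > a) = 0.10, 0.05, 0.01 for n = 100:
# n*X̄ ~ χ²(np, n·ncp) with p = 6 and ncp = 9*2
qchisq(c(0.10, 0.05, 0.01), df = 6*100, ncp = 9*2*100, lower.tail = FALSE) / 100
```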

Then we use these cutpoints to estimate the probabilities from the samples produced by the independent MH algorithm.

| Interval | `1 - pchisq(x*100, 6*100, 9*2*100)` | IMH |
| --- | --- | --- |
| $(25.18054,\infty)$ | 0.10 | 0.1045 |
| $(25.52361,\infty)$ | 0.05 | 0.0516 |
| $(26.17395,\infty)$ | 0.01 | 0.0093 |
