Independent MH

Gamma distribution

Robert and Casella (2013) present the following algorithm:
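Given $x^{(t)}$,

1. Generate $Y_t\sim g(y)$.
2. Take

$$X^{(t+1)}=\begin{cases}Y_t & \text{with probability }\ \min\Big\{\dfrac{f(Y_t)\,g(x^{(t)})}{f(x^{(t)})\,g(Y_t)},\,1\Big\},\\ x^{(t)} & \text{otherwise.}\end{cases}$$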

If there exists a constant $M$ such that

$$f(x) \le Mg(x)\,,\qquad \forall x\in \mathrm{supp}\,f\,,$$

the algorithm produces a uniformly ergodic chain (see the corresponding theorem in Robert and Casella (2013)), and the expected acceptance probability associated with the algorithm is at least $1/M$ when the chain is stationary; in that sense, the IMH algorithm is more efficient than the Accept-Reject algorithm.

Let's illustrate this algorithm with ${\mathcal G}a(\alpha, 1)$. We introduced how to sample from the Gamma distribution via the Accept-Reject algorithm in Special Distributions, and it is straightforward to get the Gamma Metropolis-Hastings sampler based on the ratio $f/g$.

We can implement this algorithm in Julia to sample from ${\mathcal G}a(2.43, 1)$ and estimate $\mathrm{E}_f(X^2)=2.43+2.43^2\approx 8.33$.
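A minimal sketch of such a sampler, using the ${\mathcal G}a(a, a/\alpha)$ candidate with $a=\lfloor\alpha\rfloor$ from the Accept-Reject construction (function and variable names are my own choices):

```julia
using Distributions, Random, Statistics

# Independent MH for Ga(α, 1) with a Ga(a, a/α) candidate, a = ⌊α⌋ (assumes α > 1)
function imh_gamma(α; niter = 50_000)
    a = floor(α)                    # candidate shape ⌊α⌋
    b = a / α                       # candidate rate, chosen so that f/g is bounded
    g = Gamma(a, 1 / b)             # Distributions.jl parameterizes by (shape, scale)
    x = zeros(niter)
    x[1] = rand(g)
    for t in 2:niter
        y = rand(g)
        # ρ = min{ f(y)g(x)/(f(x)g(y)), 1 } = min{ (y/x)^(α-a) exp(-(1-b)(y-x)), 1 }
        ρ = min((y / x[t-1])^(α - a) * exp(-(1 - b) * (y - x[t-1])), 1)
        x[t] = rand() < ρ ? y : x[t-1]
    end
    return x
end

chain = imh_gamma(2.43)
mean(chain .^ 2)     # should be close to E_f(X²) = 2.43 + 2.43² ≈ 8.33
```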

Logistic Regression

We observe $(x_i,y_i)$, $i=1,\ldots,n$, according to the model

$$Y_i\sim\mathrm{Bernoulli}(p(x_i))\,,\qquad p(x) = \frac{\exp(\alpha+\beta x)}{1+\exp(\alpha+\beta x)}\,.$$

The likelihood is

$$L(\alpha,\beta\mid \mathbf y) \propto \prod_{i=1}^n \Big(\frac{\exp(\alpha+\beta x_i)}{1+\exp(\alpha+\beta x_i)}\Big)^{y_i}\Big(\frac{1}{1+\exp(\alpha+\beta x_i)}\Big)^{1-y_i},$$

and we let $\pi(e^\alpha)\sim \mathrm{Exp}(1/b)$, an exponential prior on $e^\alpha$ with mean $b$, and put a flat prior on $\beta$, i.e.,

$$\pi_\alpha(\alpha\mid b)\,\pi_\beta(\beta)\,d\alpha\,d\beta = \frac 1b e^{-e^\alpha/b}\,de^\alpha\, d\beta=\frac 1b e^\alpha e^{-e^\alpha/b}\,d\alpha\, d\beta\,.$$

Note that

$$\begin{aligned} \mathrm{E}[\alpha] &= \int_{-\infty}^\infty \frac{\alpha}{b}e^\alpha e^{-e^\alpha/b}\,d\alpha\\ &=\int_0^\infty \log w\,\frac 1b e^{-w/b}\,dw \qquad (w=e^\alpha)\\ &=\log b -\gamma\,, \end{aligned}$$

where

$$\gamma = -\int_0^\infty e^{-x}\log x\,dx$$

is Euler's constant.

Choose the data-dependent value $b$ that makes $\mathrm{E}[\alpha]=\hat\alpha$, where $\hat \alpha$ is the MLE of $\alpha$; thus $\hat b=\exp(\hat \alpha+\gamma)$.
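As a quick sanity check of the identity $\mathrm{E}[\alpha]=\log b-\gamma$ (a sketch with an arbitrary $b$):

```julia
# Numerical check that E[α] = log b − γ when e^α ~ Exp(mean b)
using Distributions, Statistics

b = 2.0                                  # arbitrary illustrative value
α = log.(rand(Exponential(b), 10^6))     # α = log W with W ~ Exp(mean b)
mean(α), log(b) - MathConstants.γ        # the two values should nearly agree
```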

We can use the MLE to estimate the coefficients of the logistic model (see my post for the derivation), and the following Julia code fits the model quickly.
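A sketch of the fit with GLM.jl, assuming the covariate vector `x` and the binary response vector `y` are already loaded (these names are my assumption, not from the original data set):

```julia
using GLM, DataFrames

# Fit the logistic regression y ~ x by maximum likelihood
df  = DataFrame(x = x, y = y)
fit = glm(@formula(y ~ x), df, Binomial(), LogitLink())
coef(fit)       # (α̂, β̂)
stderror(fit)   # standard errors, including σ̂_β
```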

The estimates of the parameters are $\hat\alpha=15.0479$, $\hat\beta=-0.232163$, and $\hat\sigma_\beta = 0.108137$.

I wrote the following Julia code to implement the independent MH algorithm.
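Below is a minimal sketch of such a sampler. The candidate draws $\alpha$ from its prior $\pi_\alpha(\alpha\mid\hat b)$ and $\beta$ from $N(\hat\beta, \hat\sigma_\beta^2)$, so the acceptance ratio reduces to the likelihood ratio times the normal-candidate ratio for $\beta$; the data vectors `x`, `y` and the MLEs are assumed to be available from the previous step.

```julia
using Distributions, Random

# log-likelihood of the logistic model (up to a constant)
loglik(α, β) = sum(@. y * (α + β * x) - log(1 + exp(α + β * x)))

function imh_logit(αhat, βhat, σβ; niter = 10_000)
    b  = exp(αhat + MathConstants.γ)   # data-dependent b̂ = exp(α̂ + γ)
    qα = Exponential(b)                # candidate for e^α (its prior)
    qβ = Normal(βhat, σβ)              # candidate for β
    draw() = (log(rand(qα)), rand(qβ))
    α, β = draw()
    chain = zeros(niter, 2)
    for t in 1:niter
        αp, βp = draw()
        # log of f(y')g(x)/(f(x)g(y')); the prior terms in α cancel with the candidate
        logρ = loglik(αp, βp) - loglik(α, β) + logpdf(qβ, β) - logpdf(qβ, βp)
        if log(rand()) < logρ
            α, β = αp, βp
        end
        chain[t, 1], chain[t, 2] = α, β
    end
    return chain
end

chain = imh_logit(15.0479, -0.232163, 0.108137)
```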

Saddlepoint tail area approximation

If $K(\tau) = \log({\mathbb E}\,e^{\tau X})$ is the cumulant generating function, solving the saddlepoint equation $K'(\tau)=x$ yields the saddlepoint. For a noncentral chi-squared random variable $X$, the moment generating function is

$$\phi_X(t) = \frac{\exp(2\lambda t/(1-2t))}{(1-2t)^{p/2}}\,,$$

where $p$ is the number of degrees of freedom and $\lambda$ is the noncentrality parameter, and its saddlepoint is

$$\hat\tau(x) = \frac{-p+2x-\sqrt{p^2+8\lambda x}}{4x}\,.$$

The saddlepoint can be used to approximate the tail area of a distribution. We have the approximation

$$\begin{aligned} P(\bar X>a) &= \int_a^\infty \Big(\frac{n}{2\pi K_X''(\hat\tau(x))}\Big)^{1/2}\exp\{n[K_X(\hat\tau(x))-\hat\tau(x)x]\}\,dx\\ &= \int_{\hat\tau(a)}^{1/2} \Big(\frac{n}{2\pi}\Big)^{1/2}[K_X''(t)]^{1/2}\exp\{n[K_X(t)-tK_X'(t)]\}\,dt\\ &\approx \frac 1m\sum_{i=1}^m\mathbb{I}[Z_i>\hat \tau(a)]\,, \end{aligned}$$

where the $Z_i$ are samples from the saddlepoint distribution. Using a Taylor series approximation,

$$\exp\{n[K_X(t)-tK_X'(t)]\}\approx \exp\Big\{-nK''_X(0)\frac{t^2}{2}\Big\}\,,$$

so the instrumental density can be chosen as $N\big(0,\frac{1}{nK_X''(0)}\big)$, where

$$K_X''(t)=\frac{2[p(1-2t) + 4\lambda]}{(1-2t)^3}\,.$$

I implemented the independent MH algorithm via the following code to produce random variables from the saddlepoint distribution.
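A sketch of one way to do this, with the $N\big(0, \frac{1}{nK_X''(0)}\big)$ instrumental above and $p=6$, $\lambda=9$ (matching the `pchisq(x, 6, 9*2)` calls in the tables below); the function names are mine:

```julia
using Distributions, Random, Statistics

p, λ, n = 6, 9, 1

K(t)  = 2λ * t / (1 - 2t) - p / 2 * log(1 - 2t)     # cumulant generating function
K1(t) = 2λ / (1 - 2t)^2 + p / (1 - 2t)              # K'
K2(t) = 2 * (p * (1 - 2t) + 4λ) / (1 - 2t)^3        # K''

# log target: ∝ sqrt(K''(t)) * exp(n[K(t) - t K'(t)]) on t < 1/2
logf(t) = t < 1/2 ? 0.5 * log(K2(t)) + n * (K(t) - t * K1(t)) : -Inf

g = Normal(0, sqrt(1 / (n * K2(0))))                # instrumental distribution

function imh_saddle(; niter = 50_000)
    z = zeros(niter)
    z[1] = rand(g)
    for t in 2:niter
        y = rand(g)
        logρ = logf(y) - logf(z[t-1]) + logpdf(g, z[t-1]) - logpdf(g, y)
        z[t] = log(rand()) < logρ ? y : z[t-1]
    end
    return z
end

z = imh_saddle()
τhat(x) = (-p + 2x - sqrt(p^2 + 8λ * x)) / (4x)     # saddlepoint at cutoff x
mean(z .> τhat(36.225))                             # ≈ P(X̄ > 36.225)
```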

I can successfully reproduce the results.

If $n=1$:

| Interval | `1 - pchisq(x, 6, 9*2)` | IMH |
| --- | --- | --- |
| $(36.225,\infty)$ | 0.1000076 | 0.1037 |
| $(40.542,\infty)$ | 0.05000061 | 0.0516 |
| $(49.333,\infty)$ | 0.01000057 | 0.0114 |

For $n=100$, note that if $X_i\sim \chi^2_p(\lambda)$, then $n\bar X=\sum X_i\sim \chi^2_{np}(n\lambda)$, so we can figure out the cutpoints from the quantiles of this distribution (originally via R's `qchisq`; a Julia equivalent is sketched below).
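A Julia stand-in for that computation, using Distributions.jl's `NoncentralChisq`; note that the noncentrality on R's scale is $2\lambda = 18$:

```julia
using Distributions

n, p, ncp = 100, 6, 18
d = NoncentralChisq(n * p, n * ncp)                       # distribution of nX̄
cuts = [quantile(d, q) / n for q in (0.90, 0.95, 0.99)]   # cutpoints for X̄
```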

Then we use these cutpoints to estimate the probabilities from the samples produced by the independent MH algorithm.

| Interval | `1 - pchisq(x*100, 6*100, 9*2*100)` | IMH |
| --- | --- | --- |
| $(25.18054,\infty)$ | 0.10 | 0.1045 |
| $(25.52361,\infty)$ | 0.05 | 0.0516 |
| $(26.17395,\infty)$ | 0.01 | 0.0093 |
