Independent MH

Robert and Casella (2013) presents the following algorithm: given $x^{(t)}$, generate $Y_t\sim g(y)$ and take

$$X^{(t+1)} = \begin{cases} Y_t & \text{with probability } \min\Big\{\frac{f(Y_t)\,g(x^{(t)})}{f(x^{(t)})\,g(Y_t)},\, 1\Big\}\,,\\ x^{(t)} & \text{otherwise.} \end{cases}$$

Theorem. If there exists a constant $M$ such that

$$f(x) \le Mg(x)\,,\qquad \forall x\in \mathrm{supp}\, f\,,$$

the algorithm produces a uniformly ergodic chain, and the expected acceptance probability associated with the algorithm is at least $1/M$ when the chain is stationary. In that sense, the IMH is more efficient than the Accept-Reject algorithm.

Gamma distribution

Let's illustrate this algorithm with ${\mathcal G}a(\alpha, 1)$. We have introduced how to sample from the Gamma distribution via the Accept-Reject algorithm in Special Distributions, and it is straightforward to get the Gamma Metropolis-Hastings acceptance ratio from $f/g$: with $a = \lfloor\alpha\rfloor$ and instrumental distribution ${\mathcal G}a(a, a/\alpha)$ (shape/rate),

$$\rho_t = \Big(\frac{y_t}{x^{(t)}}\exp\Big(\frac{x^{(t)}-y_t}{\alpha}\Big)\Big)^{\alpha-a}\,,$$

and $y_t$ is accepted with probability $\min\{\rho_t, 1\}$.

We can implement this algorithm with the following Julia code:

function mh_gamma(T = 100, alpha = 1.5)
    a = Int(floor(alpha))
    b = a/alpha
    x = ones(T+1) # initial value: 1
    for t = 1:T
        yt = rgamma_int(a, b)  # candidate Y_t ~ Ga(a, a/alpha); rgamma_int comes from the Special Distributions section
        rt = (yt / x[t] * exp((x[t] - yt) / alpha))^(alpha-a)  # ratio f(Y_t)g(x[t]) / (f(x[t])g(Y_t))
        if rt >= 1
            x[t+1] = yt
        else
            u = rand()
            if u < rt
                x[t+1] = yt
            else
                x[t+1] = x[t]
            end
        end   
    end
    return(x)
end
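
The helpers `rgamma_int` and `rgamma` used here and in the comparison below come from the Special Distributions section; minimal sketches consistent with how they are called (assuming a shape/rate parameterization) could look like:

# sketches of the helpers assumed above; the Special Distributions section has the originals
# Ga(a, b) with integer shape a and rate b, as the sum of a independent Exp(b) draws
rgamma_int(a::Int, b) = sum(-log.(rand(a))) / b

# Accept-Reject sampler for Ga(alpha, beta) with instrumental Ga(a, a/alpha), a = floor(alpha)
function rgamma(alpha, beta = 1.0)
    a = Int(floor(alpha))
    while true
        x = rgamma_int(a, a / alpha)
        # acceptance probability f(x)/(M*g(x)); the bound M is attained at x = alpha
        if rand() <= ((x / alpha) * exp((alpha - x) / alpha))^(alpha - a)
            return x / beta
        end
    end
end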

We sample ${\mathcal G}a(2.43, 1)$ and estimate $\mathrm{E}_f(X^2)=2.43+2.43^2=8.33$, comparing with the Accept-Reject algorithm:

# comparison with accept-reject
res = mh_gamma(5000, 2.43)[2:end]
est = cumsum(res.^2) ./ collect(1:5000)

res2 = ones(5000)
for i = 1:5000
    res2[i] = rgamma(2.43, 1)
end
est2 = cumsum(res2.^2) ./ collect(1:5000)

using Plots
plot(est, label="Independent MH")
plot!(est2, label="Accept-Reject")
hline!([8.33], label="True value")
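
As a quick numerical check (a sketch; the exact numbers depend on the random seed), we can print the final running-mean estimates next to the true value:

println("Independent MH estimate: ", est[end])
println("Accept-Reject estimate:  ", est2[end])
println("True value:              ", 8.33)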

Logistic Regression

We observe $(x_i, y_i), i=1,\ldots,n$ according to the model

$$Y_i\sim\mathrm{Bernoulli}(p(x_i))\,,\qquad p(x) = \frac{\exp(\alpha+\beta x)}{1+\exp(\alpha+\beta x)}\,.$$

The likelihood is

$$L(\alpha,\beta\mid \mathbf y) \propto \prod_{i=1}^n \Big(\frac{\exp(\alpha+\beta x_i)}{1+\exp(\alpha+\beta x_i)}\Big)^{y_i}\Big(\frac{1}{1+\exp(\alpha+\beta x_i)}\Big)^{1-y_i}\,,$$

and let $\pi(e^\alpha)\sim \mathrm{Exp}(1/b)$ and put a flat prior on $\beta$, i.e.,

$$\pi_\alpha(\alpha\mid b)\pi_\beta(\beta) = \frac 1b e^{-e^\alpha/b}\,de^\alpha\, d\beta=\frac 1b e^\alpha e^{-e^\alpha/b}\,d\alpha\, d\beta\,.$$

Note that, substituting $w=e^\alpha$,

$$\begin{aligned} \mathrm{E}[\alpha] &= \int_{-\infty}^\infty \frac{\alpha}{b}e^\alpha e^{-e^\alpha/b}\,d\alpha\\ &=\int_0^\infty \log w\,\frac 1b e^{-w/b}\,dw \\ &=\log b -\gamma\,, \end{aligned}$$

where

$$\gamma = -\int_0^\infty e^{-x}\log x\, dx$$

is Euler's constant.
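
As a quick Monte Carlo check of this identity (a sketch; b = 2.0 is an arbitrary test value, not a quantity from the example):

using Distributions, Statistics

γ = 0.57721566                    # Euler's constant
b = 2.0                           # arbitrary test value
w = rand(Exponential(b), 10^6)    # w = e^α with density (1/b)exp(-w/b)
println(mean(log.(w)))            # Monte Carlo estimate of E[α]
println(log(b) - γ)               # theoretical value log(b) - γ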

Choose the data-dependent value that makes $\mathrm{E}[\alpha]=\hat\alpha$, where $\hat\alpha$ is the MLE of $\alpha$, so $\hat b=\exp(\hat \alpha+\gamma)$.

We can use MLE to estimate the coefficients in the logistic model (see my post for the derivation), and the following Julia code can help us fit the model quickly:

using DataFrames, GLM, Plots

temp = [53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81]
failure = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
data = DataFrame(temp = temp, failure = failure)
logit_fit = glm(@formula(failure ~ temp), data, Binomial(), LogitLink())
plot(temp, predict(logit_fit), legend = false, xlabel = "Temperature", ylab = "Probability")
scatter!(temp, predict(logit_fit))

The estimates of the parameters are $\hat\alpha=15.0479$, $\hat\beta=-0.232163$, and $\hat\sigma_\beta = 0.108137$.

I wrote the following Julia code to implement the independent MH algorithm:

## metropolis-hastings
using Distributions
γ = 0.57721   # Euler's constant
# likelihood L(α, β | y), up to a constant
function ll(α::Float64, β::Float64)
    a = exp.(α .+ β*temp)
    return prod( (a ./ (1 .+ a) ).^failure .* (1 ./ (1 .+ a)).^(1 .- failure) )
end
function mh_logit(T::Int, α_hat::Float64, β_hat::Float64, σ_hat::Float64)
    φ = Normal(β_hat, σ_hat)         # independent proposal for β: N(β̂, σ̂²)
    π = Exponential(exp(α_hat+γ))    # prior (and proposal) for e^α: Exp with mean b̂ = exp(α̂+γ)
    Α = ones(T)
    Β = ones(T)
    for t = 1:T-1
        α = log(rand(π))             # propose α by drawing e^α from its prior
        β = rand(φ)
        # IMH ratio: likelihood ratio times φ(old β)/φ(new β); the Exp prior on e^α cancels
        # because α is proposed from exactly that prior
        r = ( ll(α, β) / ll(Α[t], Β[t]) ) * ( pdf(φ, Β[t]) / pdf(φ, β) )
        if rand() < r
            Α[t+1] = α
            Β[t+1] = β
        else
            Α[t+1] = Α[t]
            Β[t+1] = Β[t]
        end
    end
    return Α, Β
end

Α, Β = mh_logit(10000, 15.04, -0.233, 0.108)

p1 = plot(Α, legend = false, xlab = "Intercept")
hline!([15.04])

p2 = plot(Β, legend = false, xlab = "Slope")
hline!([-0.233])

plot(p1, p2, layout = (1,2))
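
The draws can be summarized quickly (a sketch; the 5,000-iteration burn-in is an arbitrary choice) by the post-burn-in means and the empirical acceptance rate:

using Statistics

burnin = 5000
println("posterior mean of α: ", mean(Α[burnin+1:end]))
println("posterior mean of β: ", mean(Β[burnin+1:end]))
# fraction of iterations in which the proposal was accepted
println("acceptance rate: ", mean(Α[2:end] .!= Α[1:end-1]))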

Saddlepoint tail area approximation

If $K(\tau) = \log({\mathbb E}\exp(\tau X))$ is the cumulant generating function, solving the saddlepoint equation $K'(\tau)=x$ yields the saddlepoint. For a noncentral chi-squared random variable $X$, the moment generating function is

$$\phi_X(t) = \frac{\exp(2\lambda t/(1-2t))}{(1-2t)^{p/2}}\,,$$

where $p$ is the number of degrees of freedom and $\lambda$ is the noncentrality parameter, and its saddlepoint is

$$\hat\tau(x) = \frac{-p+2x-\sqrt{p^2+8\lambda x}}{4x}\,.$$

The saddlepoint can be used to approximate the tail area of a distribution. We have the approximation

$$\begin{aligned} P(\bar X>a) &= \int_a^\infty \Big(\frac{n}{2\pi K_X''(\hat\tau(x))}\Big)^{1/2}\exp\{n[K_X(\hat\tau(x))-\hat\tau(x)x]\}\,dx\\ &= \int_{\hat\tau(a)}^{1/2} \Big(\frac{n}{2\pi}\Big)^{1/2}[K_X''(t)]^{1/2}\exp\{n[K_X(t)-tK_X'(t)]\}\,dt\\ &\approx \frac 1m\sum_{i=1}^m\mathbb{I}[Z_i>\hat \tau(a)]\,, \end{aligned}$$

where $Z_i$ is a sample from the saddlepoint distribution. Using a Taylor series approximation,

$$\exp\{n[K_X(t)-tK_X'(t)]\}\approx \exp\Big\{-nK''_X(0)\frac{t^2}{2}\Big\}\,,$$

so the instrumental density can be chosen as $N(0,\frac{1}{nK_X''(0)})$, where

$$K_X''(t)=\frac{2[p(1-2t) + 4\lambda]}{(1-2t)^3}\,.$$

I implemented the independent MH algorithm via the following code to produce random variables from the saddlepoint distribution.

using Distributions

p = 6
λ = 9

# cumulant generating function K_X(t) of the noncentral chi-squared distribution
function K(t)
    return 2λ*t / (1-2t) - p / 2 * log(1-2t)
end

# first derivative K'_X(t)
function D1K(t)
    return ( 4λ*t ) / (1-2t)^2 + ( 2λ + p ) / ( 1 - 2t )
end

# second derivative K''_X(t)
function D2K(t)
    return 2(p*(1-2t) + 4λ) / (1-2t)^3
end

# log of the (unnormalized) saddlepoint density in t
function logf(n, t)
    return n * (K(t) - t*D1K(t)) + 0.5log(D2K(t))
end

# log density of the instrumental N(0, 1/(n*K''(0))), up to an additive constant
function logg(n, t)
    return -1n * D2K(0) * t^2/2
end

function mh_saddle(T::Int = 10000; n::Int = 1)
    Z = zeros(T)
    g = Normal(0, 1 / sqrt(n*D2K(0)))
    for t = 1:T-1
        z = rand(g)
        logr = logf(n, z) - logf(n, Z[t]) + logg(n, Z[t]) - logg(n, z)
        if log(rand()) < logr 
            Z[t+1] = z
        else
            Z[t+1] = Z[t]
        end
    end
    return Z
end

Z = mh_saddle(n = 1)

# saddlepoint τ̂(x): the solution of K'(τ) = x
function tau(x)
    return ( -1p + 2x - sqrt(p^2 + 8λ * x) ) / (4x)
end

println(sum(Z .> tau(36.225)) / 10000)
println(sum(Z .> tau(40.542)) / 10000)
println(sum(Z .> tau(49.333)) / 10000)
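
As a sanity check (a sketch using the first cutpoint from the table below), $\hat\tau(x)$ should solve the saddlepoint equation $K'(\hat\tau(x)) = x$:

println(D1K(tau(36.225)))   # ≈ 36.225, so tau() indeed solves K'(τ) = x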

I can successfully reproduce the results.

If $n=1$,

| Interval | `1 - pchisq(x, 6, 9*2)` | IMH |
| --- | --- | --- |
| $(36.225,\infty)$ | 0.1000076 | 0.1037 |
| $(40.542,\infty)$ | 0.05000061 | 0.0516 |
| $(49.333,\infty)$ | 0.01000057 | 0.0114 |

For $n=100$, note that if $X_i\sim \chi^2_p(\lambda)$, then $n\bar X=\sum X_i\sim \chi^2_{np}(n\lambda)$, so we can use the following R code to figure out the cutpoints.

qchisq(0.90, 600, 1800)/100
qchisq(0.95, 600, 1800)/100
qchisq(0.99, 600, 1800)/100

Then use these cutpoints to estimate the probabilities with the samples produced by the independent MH algorithm.
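
For example (a sketch; the cutpoints are those returned by the R code above), the IMH column of the table below can be reproduced with:

Z100 = mh_saddle(10000, n = 100)          # saddlepoint samples with n = 100
for a in (25.18054, 25.52361, 26.17395)
    println(sum(Z100 .> tau(a)) / length(Z100))
end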

| Interval | `1 - pchisq(x*100, 6*100, 9*2*100)` | IMH |
| --- | --- | --- |
| $(25.18054,\infty)$ | 0.10 | 0.1045 |
| $(25.52361,\infty)$ | 0.05 | 0.0516 |
| $(26.17395,\infty)$ | 0.01 | 0.0093 |
