Independent MH

For the independent Metropolis-Hastings (IMH) algorithm, if there exists a constant $M$ such that

$$f(x) \le M g(x)\,,\qquad \forall x\in \mathrm{supp}\, f\,,$$

the algorithm produces a uniformly ergodic chain (see the corresponding theorem in Robert and Casella (2013)), and the expected acceptance probability associated with the algorithm is at least $1/M$ when the chain is stationary; in that sense, the IMH is more efficient than the Accept-Reject algorithm.

Gamma distribution

Let's illustrate this algorithm with ${\mathcal G}a(\alpha, 1)$. We have introduced how to sample from the Gamma distribution via the Accept-Reject algorithm in Special Distributions, and it is straightforward to obtain the Gamma Metropolis-Hastings sampler from the ratio $f/g$: Robert and Casella (2013) presents the corresponding algorithm, which draws the candidate from ${\mathcal G}a(a, b)$ with $a = \lfloor\alpha\rfloor$ and $b = a/\alpha$.
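For concreteness, here is a minimal sketch of one generic independent MH update with target density $f$ and independent proposal $g$; this is my own summary in Julia, and the function names are placeholders rather than code from this post:

using Distributions

# One independent MH update: propose y ~ g independently of the current state x,
# and accept with probability min(1, f(y)g(x) / (f(x)g(y))).
function imh_step(x, logf, proposal)
    y = rand(proposal)
    logr = logf(y) - logf(x) + logpdf(proposal, x) - logpdf(proposal, y)
    return log(rand()) < logr ? y : x
end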

To sample ${\mathcal G}a(2.43, 1)$ and estimate $\mathrm{E}_f(X^2)=2.43+2.43^2=8.33$, we can implement this algorithm with the following Julia code:

function mh_gamma(T = 100, alpha = 1.5)
    a = Int(floor(alpha))  # candidate shape: integer part of alpha
    b = a/alpha            # candidate's second parameter
    x = ones(T+1)          # initial value: 1
    for t = 1:T
        # rgamma_int is the integer-shape Gamma sampler from the Special Distributions section
        yt = rgamma_int(a, b)
        # acceptance ratio r = [f(y)g(x)] / [f(x)g(y)] with f = Ga(alpha, 1) and g = Ga(a, b)
        rt = (yt / x[t] * exp((x[t] - yt) / alpha))^(alpha-a)
        if rt >= 1
            x[t+1] = yt
        else
            u = rand()
            if u < rt
                x[t+1] = yt
            else
                x[t+1] = x[t]
            end
        end   
    end
    return(x)
end
# comparison with accept-reject
res = mh_gamma(5000, 2.43)[2:end]
est = cumsum(res.^2) ./ collect(1:5000)   # running estimate of E_f(X^2)

# rgamma is the Accept-Reject Gamma sampler from the Special Distributions section
res2 = ones(5000)
for i = 1:5000
    res2[i] = rgamma(2.43, 1)
end
est2 = cumsum(res2.^2) ./ collect(1:5000)

using Plots
plot(est, label="Independent MH")
plot!(est2, label="Accept-Reject")
hline!([8.33], label="True value")
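Besides the plot, we can also print the final running estimates against the true value 8.33 (a small addition of mine):

println("Independent MH estimate: ", est[end])
println("Accept-Reject estimate:  ", est2[end])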

Logistic Regression

We observe $(x_i, y_i), i=1,\ldots,n$, according to the model

$$Y_i\sim\mathrm{Bernoulli}(p(x_i))\,,\qquad p(x) = \frac{\exp(\alpha+\beta x)}{1+\exp(\alpha+\beta x)}\,.$$

The likelihood is

$$L(\alpha,\beta\mid \mathbf y) \propto \prod_{i=1}^n \Big(\frac{\exp(\alpha+\beta x_i)}{1+\exp(\alpha+\beta x_i)}\Big)^{y_i}\Big(\frac{1}{1+\exp(\alpha+\beta x_i)}\Big)^{1-y_i}\,.$$

Take $\pi(e^\alpha)\sim \mathrm{Exp}(1/b)$ and put a flat prior on $\beta$, i.e.,

$$\pi_\alpha(\alpha\mid b)\pi_\beta(\beta)\, d\alpha\, d\beta = \frac 1b e^{-e^\alpha/b}\, de^\alpha\, d\beta=\frac 1b e^\alpha e^{-e^\alpha/b}\, d\alpha\, d\beta\,.$$

Note that

$$\begin{aligned} \mathrm{E}[\alpha] &= \int_{-\infty}^\infty \frac{\alpha}{b}e^\alpha e^{-e^\alpha/b}\, d\alpha\\ &=\int_0^\infty \log w\,\frac 1b e^{-w/b}\, dw \\ &=\log b -\gamma\,, \end{aligned}$$

where

$$\gamma = -\int_0^\infty e^{-x}\log x\, dx$$

is Euler's constant. Choose the data-dependent value of $b$ that makes $\mathrm{E}[\alpha]=\hat\alpha$, where $\hat\alpha$ is the MLE of $\alpha$, so $\hat b=\exp(\hat\alpha+\gamma)$.
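The identity $\mathrm{E}[\alpha]=\log b-\gamma$ can be double-checked by simulation; here is a minimal sketch of mine, where the value $b=2$ is an arbitrary choice:

using Distributions, Statistics

# W = e^α ~ Exp with mean b, so α = log(W); check E[α] = log(b) - γ
γ = 0.57721
b = 2.0
w = rand(Exponential(b), 10^6)   # Exponential(θ) in Distributions.jl is parameterized by the mean θ
println(mean(log.(w)))           # Monte Carlo estimate of E[α]
println(log(b) - γ)              # theoretical value log(b) - γ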

We can use MLE to estimate the coefficients in the logistic model (see my earlier post for the derivation), and the following Julia code fits the model quickly:

using DataFrames, GLM, Plots

# observed data: temperature and failure indicator
temp = [53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81]
failure = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
data = DataFrame(temp = temp, failure = failure)
logit_fit = glm(@formula(failure ~ temp), data, Binomial(), LogitLink())
# fitted failure probability as a function of temperature
plot(temp, predict(logit_fit), legend = false, xlabel = "Temperature", ylabel = "Probability")
scatter!(temp, predict(logit_fit))

The estimates of the parameters are $\hat\alpha=15.0479$, $\hat\beta=-0.232163$, and $\hat\sigma_\beta = 0.108137$. I wrote the following Julia code to implement the independent MH algorithm:

## metropolis-hastings
using Distributions
γ = 0.57721  # Euler's constant
# likelihood L(α, β | y) for the logistic model, using the global temp / failure data
function ll(α::Float64, β::Float64)
    a = exp.(α .+ β*temp)
    return prod( (a ./ (1 .+ a) ).^failure .* (1 ./ (1 .+ a)).^(1 .- failure) )
end
function mh_logit(T::Int, α_hat::Float64, β_hat::Float64, σ_hat::Float64)
    # α_hat, β_hat: MLEs of the intercept and slope; σ_hat: standard error of β_hat
    φ = Normal(β_hat, σ_hat)          # independent proposal for β
    π = Exponential(exp(α_hat + γ))   # prior (and proposal) for e^α, with scale exp(α_hat + γ)
    Α = ones(T)
    Β = ones(T)
    for t = 1:T-1
        α = log(rand(π))              # draw e^α from its prior and transform to α
        β = rand(φ)
        # IMH ratio: the proposal for α equals its prior and cancels, and β has a flat
        # prior, so only the likelihood ratio and φ(current)/φ(candidate) remain
        r = ( ll(α, β) / ll(Α[t], Β[t]) ) * ( pdf(φ, Β[t]) / pdf(φ, β) )
        if rand() < r
            Α[t+1] = α
            Β[t+1] = β
        else
            Α[t+1] = Α[t]
            Β[t+1] = Β[t]
        end
    end
    return Α, Β
end

Α, Β = mh_logit(10000, 15.04, -0.233, 0.108)

p1 = plot(Α, legend = false, xlab = "Intercept")
hline!([15.04])

p2 = plot(Β, legend = false, xlab = "Slope")
hline!([-0.233])

plot(p1, p2, layout = (1,2))
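To summarize the chains, one can discard a burn-in and average; a minimal sketch of mine, where the burn-in length is an arbitrary choice rather than part of the original post:

using Statistics

burn = 1000   # arbitrary burn-in
println("posterior mean of the intercept: ", mean(Α[burn+1:end]))
println("posterior mean of the slope:     ", mean(Β[burn+1:end]))
# fraction of iterations where the chain moved, a proxy for the acceptance rate
println("acceptance rate (approx.): ", mean(Α[2:end] .!= Α[1:end-1]))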

Saddlepoint tail area approximation

The saddlepoint can be used to approximate the tail area of a distribution. If $K(\tau) = \log \mathrm{E}\exp(\tau X)$ is the cumulant generating function, solving the saddlepoint equation $K'(\tau)=x$ yields the saddlepoint. For a noncentral chi-squared random variable $X$, the moment generating function is

$$\phi_X(t) = \frac{\exp(2\lambda t/(1-2t))}{(1-2t)^{p/2}}\,,$$

where $p$ is the number of degrees of freedom and $\lambda$ is the noncentrality parameter, and its saddlepoint is

$$\hat\tau(x) = \frac{-p+2x-\sqrt{p^2+8\lambda x}}{4x}\,.$$

We have the approximation

$$\begin{aligned} P(\bar X>a) &= \int_a^\infty \Big(\frac{n}{2\pi K_X''(\hat\tau(x))}\Big)^{1/2}\exp\{n[K_X(\hat\tau(x))-\hat\tau(x)x]\}\, dx\\ &= \int_{\hat\tau(a)}^{1/2} \Big(\frac{n}{2\pi}\Big)^{1/2}[K_X''(t)]^{1/2}\exp\{n[K_X(t)-tK_X'(t)]\}\, dt\\ &\approx \frac 1m\sum_{i=1}^m\mathbb{I}[Z_i>\hat \tau(a)]\,, \end{aligned}$$

where the $Z_i$ are samples from the saddlepoint distribution. Using a Taylor series approximation,

$$\exp\{n[K_X(t)-tK_X'(t)]\}\approx \exp\Big\{-nK''_X(0)\frac{t^2}{2}\Big\}\,,$$

so the instrumental density can be chosen as $N\big(0,\frac{1}{nK_X''(0)}\big)$, where

$$K_X''(t)=\frac{2[p(1-2t) + 4\lambda]}{(1-2t)^3}\,.$$

I implemented the independent MH algorithm via the following code to produce random variables from the saddlepoint distribution.

using Distributions

p = 6   # degrees of freedom
λ = 9   # noncentrality parameter (the exact values below use ncp = 2λ = 18)

# cumulant generating function K(t) of the noncentral chi-squared
function K(t)
    return 2λ*t / (1-2t) - p / 2 * log(1-2t)
end

# first derivative K'(t)
function D1K(t)
    return ( 4λ*t ) / (1-2t)^2 + ( 2λ + p ) / ( 1 - 2t )
end

# second derivative K''(t)
function D2K(t)
    return 2(p*(1-2t) + 4λ) / (1-2t)^3
end

# log of the (unnormalized) saddlepoint density
function logf(n, t)
    return n * (K(t) - t*D1K(t)) + 0.5log(D2K(t))
end

# log of the (unnormalized) instrumental density N(0, 1/(n K''(0)))
function logg(n, t)
    return -n * D2K(0) * t^2 / 2
end

function mh_saddle(T::Int = 10000; n::Int = 1)
    Z = zeros(T)                        # chain of saddlepoint-distributed samples, started at 0
    g = Normal(0, 1 / sqrt(n*D2K(0)))   # instrumental density
    for t = 1:T-1
        z = rand(g)
        if z >= 0.5
            # outside the support of the saddlepoint density (t < 1/2): reject
            Z[t+1] = Z[t]
            continue
        end
        logr = logf(n, z) - logf(n, Z[t]) + logg(n, Z[t]) - logg(n, z)
        if log(rand()) < logr
            Z[t+1] = z
        else
            Z[t+1] = Z[t]
        end
    end
    return Z
end

Z = mh_saddle(n = 1)

# saddlepoint τ̂(x) of the noncentral chi-squared
function tau(x)
    return ( -p + 2x - sqrt(p^2 + 8λ * x) ) / (4x)
end

# estimated tail probabilities P(X > a) at the three cutpoints (n = 1)
println(sum(Z .> tau(36.225)) / 10000)
println(sum(Z .> tau(40.542)) / 10000)
println(sum(Z .> tau(49.333)) / 10000)
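As a quick check (my addition, not part of the original post) that tau really solves the saddlepoint equation $K'(\hat\tau(x)) = x$, one can evaluate:

# K'(tau(x)) should return x itself
for x in (36.225, 40.542, 49.333)
    println(D1K(tau(x)), " vs ", x)
end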

If $n=1$, I can successfully reproduce the results:

Interval        1 - pchisq(x, 6, 9*2)    IMH
(36.225, ∞)     0.1000076                0.1037
(40.542, ∞)     0.05000061               0.0516
(49.333, ∞)     0.01000057               0.0114

For $n=100$, note that if $X_i\sim \chi^2_p(\lambda)$, then $n\bar X=\sum X_i\sim \chi^2_{np}(n\lambda)$, so we can use the following R code to find the cutpoints:

qchisq(0.90, 600, 1800)/100
qchisq(0.95, 600, 1800)/100
qchisq(0.99, 600, 1800)/100

Then use these cutpoints to estimate the probabilities from the samples produced by the independent MH algorithm:

Interval          1 - pchisq(x*100, 6*100, 9*2*100)    IMH
(25.18054, ∞)     0.10                                 0.1045
(25.52361, ∞)     0.05                                 0.0516
(26.17395, ∞)     0.01                                 0.0093
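For reference, the $n=100$ column can presumably be reproduced along the same lines as the $n=1$ case; a sketch of mine, with variable names that are not from the original post:

# rerun the sampler, now averaging n = 100 noncentral chi-squared variables
Z100 = mh_saddle(10000, n = 100)
for a in (25.18054, 25.52361, 26.17395)
    println(sum(Z100 .> tau(a)) / length(Z100))
end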
