Independent MH

Gamma distribution

Robert and Casella (2013) present the following result: if there exists a constant $M$ such that

$$f(x) \le Mg(x)\,,\qquad \forall x\in \mathrm{supp}\, f\,,$$

the independent MH algorithm produces a uniformly ergodic chain, and the expected acceptance probability is at least $1/M$ when the chain is stationary; in that sense, the IMH is more efficient than the Accept-Reject algorithm.
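For the Gamma example below, this condition is easy to verify. Sketching the calculation with the ${\mathcal G}a(a, a/\alpha)$ candidate ($a=\lfloor\alpha\rfloor$) used in the code that follows, the ratio of target to candidate density is maximized at $x=\alpha$:

$$\frac{f(x)}{g(x)} = \frac{\Gamma(a)}{\Gamma(\alpha)}\Big(\frac{\alpha}{a}\Big)^{a} x^{\alpha-a}e^{-(1-a/\alpha)x} \;\le\; \frac{\Gamma(a)}{\Gamma(\alpha)}\Big(\frac{\alpha}{a}\Big)^{a}\alpha^{\alpha-a}e^{-(\alpha-a)} =: M <\infty\,.$$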
Let's illustrate this algorithm with ${\mathcal G}a(\alpha, 1)$. We introduced how to sample from the Gamma distribution via the Accept-Reject algorithm in Special Distributions, using a ${\mathcal G}a(a, a/\alpha)$ candidate with integer shape $a=\lfloor\alpha\rfloor$, and it is straightforward to obtain the Gamma Metropolis-Hastings sampler from the ratio $f/g$: the candidate $y_t$ is accepted with probability

$$\rho(x_t, y_t) = \min\left\{\Big[\frac{y_t}{x_t}\exp\Big(\frac{x_t-y_t}{\alpha}\Big)\Big]^{\alpha-a},\, 1\right\}\,.$$

We can implement this algorithm with the following Julia code:
```julia
function mh_gamma(T = 100, alpha = 1.5)
    a = Int(floor(alpha))  # integer shape of the candidate
    b = a/alpha            # second parameter of the Ga(a, b) candidate
    x = ones(T+1)          # initial value: 1
    for t = 1:T
        # candidate from the integer-shape Gamma sampler of Special Distributions
        yt = rgamma_int(a, b)
        # acceptance ratio [y_t/x_t * exp((x_t - y_t)/alpha)]^(alpha - a)
        rt = (yt / x[t] * exp((x[t] - yt) / alpha))^(alpha-a)
        if rt >= 1
            x[t+1] = yt
        else
            u = rand()
            if u < rt
                x[t+1] = yt
            else
                x[t+1] = x[t]
            end
        end
    end
    return x
end
```
We now sample from ${\mathcal G}a(2.43, 1)$ and estimate $\mathrm{E}_f(X^2)=2.43+2.43^2=8.33$, comparing the IMH chain with the Accept-Reject sampler.
```julia
# comparison with accept-reject
res = mh_gamma(5000, 2.43)[2:end]
est = cumsum(res.^2) ./ collect(1:5000)

res2 = ones(5000)
for i = 1:5000
    res2[i] = rgamma(2.43, 1)
end
est2 = cumsum(res2.^2) ./ collect(1:5000)

using Plots
plot(est, label="Independent MH")
plot!(est2, label="Accept-Reject")
hline!([8.33], label="True value")
```
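As a rough check of the $1/M$ acceptance bound mentioned above, we can compare the chain's empirical move rate with $1/M$, where $M$ is the bound attained at $x=\alpha$. This is only a sketch: it reuses the `res` chain generated above and assumes `SpecialFunctions.gamma` for $\Gamma(\cdot)$.

```julia
using SpecialFunctions, Statistics

alpha = 2.43
a = Int(floor(alpha))
# M = sup_x f(x)/g(x), attained at x = alpha
M = gamma(a) / gamma(alpha) * (alpha / a)^a * alpha^(alpha - a) * exp(-(alpha - a))
# fraction of iterations where the chain moved (slightly underestimates the acceptance rate)
acc = mean(res[2:end] .!= res[1:end-1])
println("1/M = ", 1/M, ", empirical acceptance rate ≈ ", acc)
```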

Logistic Regression

We observe $(x_i,y_i),\ i=1,\ldots,n$, according to the model

$$Y_i\sim\mathrm{Bernoulli}(p(x_i))\,,\qquad p(x) = \frac{\exp(\alpha+\beta x)}{1+\exp(\alpha+\beta x)}\,.$$

The likelihood is

$$L(\alpha,\beta\mid \mathbf y) \propto \prod_{i=1}^n \Big(\frac{\exp(\alpha+\beta x_i)}{1+\exp(\alpha+\beta x_i)}\Big)^{y_i}\Big(\frac{1}{1+\exp(\alpha+\beta x_i)}\Big)^{1-y_i}\,.$$

For the prior, let $\pi(e^\alpha)\sim \mathrm{Exp}(1/b)$ and put a flat prior on $\beta$, i.e.,

$$\pi_\alpha(\alpha\mid b)\pi_\beta(\beta)\,d\alpha\,d\beta = \frac 1b e^{-e^\alpha/b}\,de^\alpha\, d\beta=\frac 1b e^\alpha e^{-e^\alpha/b}\,d\alpha\, d\beta\,.$$
Note that

$$\begin{aligned} \mathrm{E}[\alpha] &= \int_{-\infty}^\infty \frac{\alpha}{b}e^\alpha e^{-e^\alpha/b}\,d\alpha\\ &=\int_0^\infty \log w\,\frac 1b e^{-w/b}\,dw \\ &=\log b -\gamma\,, \end{aligned}$$

where $\gamma = -\int_0^\infty e^{-x}\log x\,dx$ is the Euler–Mascheroni constant. Choose the data-dependent value of $b$ that makes $\mathrm{E}[\alpha]=\hat\alpha$, where $\hat \alpha$ is the MLE of $\alpha$, so $\hat b=\exp(\hat \alpha+\gamma)$.
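As a quick sanity check of the identity $\mathrm{E}[\alpha]=\log b-\gamma$, we can simulate it directly; this is only a sketch with an arbitrary $b$ (recall that `Exponential(b)` in Distributions.jl is parameterized by its mean $b$).

```julia
using Distributions, Statistics

b = 3.0                        # arbitrary scale for the check
w = rand(Exponential(b), 10^6) # w = exp(α), so α = log(w)
γ = 0.57721                    # Euler–Mascheroni constant
println(mean(log.(w)), " ≈ ", log(b) - γ)
```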
We can use MLE to estimate the coefficients of the logistic model (see my post for the derivation), and the following Julia code fits the model quickly.
```julia
using DataFrames, GLM, Plots

temp = [53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81]
failure = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
data = DataFrame(temp = temp, failure = failure)
logit_fit = glm(@formula(failure ~ temp), data, Binomial(), LogitLink())
plot(temp, predict(logit_fit), legend = false, xlabel = "Temperature", ylabel = "Probability")
scatter!(temp, predict(logit_fit))
```
The estimates of the parameters are $\hat\alpha=15.0479$, $\hat\beta=-0.232163$, and $\hat\sigma_\beta = 0.108137$.
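These values can be read directly off the fitted model, e.g. with the `coef` and `stderror` accessors of the StatsBase API that GLM.jl follows:

```julia
println(coef(logit_fit))      # [intercept, slope] ≈ [15.0479, -0.232163]
println(stderror(logit_fit))  # the slope's standard error ≈ 0.108137
```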
I wrote the following Julia code to implement the independent MH algorithm:
```julia
## Metropolis-Hastings for the logit posterior
using Distributions

γ = 0.57721  # Euler–Mascheroni constant

# likelihood of (α, β) given the observed (temp, failure) data
function ll(α::Float64, β::Float64)
    a = exp.(α .+ β*temp)
    return prod( (a ./ (1 .+ a)).^failure .* (1 ./ (1 .+ a)).^(1 .- failure) )
end

function mh_logit(T::Int, α_hat::Float64, β_hat::Float64, σ_hat::Float64)
    φ = Normal(β_hat, σ_hat)       # independent proposal for β
    π = Exponential(exp(α_hat+γ))  # prior (and proposal) for exp(α), mean b̂ = exp(α̂ + γ)
    Α = ones(T)
    Β = ones(T)
    for t = 1:T-1
        α = log(rand(π))
        β = rand(φ)
        # the prior on α is also its proposal and cancels, leaving the likelihood
        # ratio times the proposal density ratio φ(β_t)/φ(β) for the flat-prior β
        r = ( ll(α, β) / ll(Α[t], Β[t]) ) * ( pdf(φ, Β[t]) / pdf(φ, β) )
        if rand() < r
            Α[t+1] = α
            Β[t+1] = β
        else
            Α[t+1] = Α[t]
            Β[t+1] = Β[t]
        end
    end
    return Α, Β
end

Α, Β = mh_logit(10000, 15.04, -0.233, 0.108)

p1 = plot(Α, legend = false, xlab = "Intercept")
hline!([15.04])

p2 = plot(Β, legend = false, xlab = "Slope")
hline!([-0.233])

plot(p1, p2, layout = (1,2))
```
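With the chains in hand, we can summarize the posterior draws; this is a sketch, discarding the first 1000 iterations as an arbitrary burn-in.

```julia
using Statistics

burn = 1000  # arbitrary burn-in
println("posterior mean of α: ", mean(Α[burn+1:end]))
println("posterior mean of β: ", mean(Β[burn+1:end]))
# fraction of iterations where the chain moved (a proxy for the acceptance rate)
println("acceptance rate ≈ ", mean(Β[2:end] .!= Β[1:end-1]))
```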

Saddlepoint tail area approximation

If $K(\tau) = \log({\mathbb E}\exp(\tau X))$ is the cumulant generating function, solving the saddlepoint equation $K'(\tau)=x$ yields the saddlepoint. For a noncentral chi-squared random variable $X$, the moment generating function is

$$\phi_X(t) = \frac{\exp(2\lambda t/(1-2t))}{(1-2t)^{p/2}}\,,$$

where $p$ is the number of degrees of freedom and $\lambda$ is the noncentrality parameter, and its saddlepoint is

$$\hat\tau(x) = \frac{-p+2x-\sqrt{p^2+8\lambda x}}{4x}\,.$$
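This expression follows from solving the saddlepoint equation: with $u = 1-2\tau$,

$$K_X'(\tau) = \frac{2\lambda}{(1-2\tau)^2}+\frac{p}{1-2\tau} = x \;\Longleftrightarrow\; xu^2 - pu - 2\lambda = 0\,,$$

and taking the positive root $u = \big(p+\sqrt{p^2+8\lambda x}\big)/(2x)$ (so that $\tau<1/2$) gives $\hat\tau(x)=(1-u)/2$, which is the formula above.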
The saddlepoint can be used to approximate the tail area of a distribution. We have the approximation

$$\begin{aligned} P(\bar X>a) &= \int_a^\infty \Big(\frac{n}{2\pi K_X''(\hat\tau(x))}\Big)^{1/2}\exp\{n[K_X(\hat\tau(x))-\hat\tau(x)x]\}\,dx\\ &= \int_{\hat\tau(a)}^{1/2} \Big(\frac{n}{2\pi}\Big)^{1/2}[K_X''(t)]^{1/2}\exp\{n[K_X(t)-tK_X'(t)]\}\,dt\\ &\approx \frac 1m\sum_{i=1}^m\mathbb{I}[Z_i>\hat \tau(a)]\,, \end{aligned}$$

where the $Z_i$ are samples from the saddlepoint distribution. Using a Taylor series approximation,

$$\exp\{n[K_X(t)-tK_X'(t)]\}\approx \exp\Big\{-nK''_X(0)\frac{t^2}{2}\Big\}\,,$$

so the instrumental density can be chosen as $N\big(0,\frac{1}{nK_X''(0)}\big)$, where

$$K_X''(t)=\frac{2[p(1-2t) + 4\lambda]}{(1-2t)^3}\,.$$
I implemented the independent MH algorithm via the following code to produce random variables from the saddlepoint distribution.
```julia
using Distributions

p = 6  # degrees of freedom
λ = 9  # noncentrality parameter

# cumulant generating function K_X(t) and its first two derivatives
function K(t)
    return 2λ*t / (1-2t) - p / 2 * log(1-2t)
end

function D1K(t)
    return ( 4λ*t ) / (1-2t)^2 + ( 2λ + p ) / ( 1 - 2t )
end

function D2K(t)
    return 2(p*(1-2t) + 4λ) / (1-2t)^3
end

# log of the (unnormalized) saddlepoint density and of the normal instrumental density
function logf(n, t)
    return n * (K(t) - t*D1K(t)) + 0.5log(D2K(t))
end

function logg(n, t)
    return -n * D2K(0) * t^2/2
end

function mh_saddle(T::Int = 10000; n::Int = 1)
    Z = zeros(T)
    g = Normal(0, 1 / sqrt(n*D2K(0)))
    for t = 1:T-1
        z = rand(g)
        logr = logf(n, z) - logf(n, Z[t]) + logg(n, Z[t]) - logg(n, z)
        if log(rand()) < logr
            Z[t+1] = z
        else
            Z[t+1] = Z[t]
        end
    end
    return Z
end

Z = mh_saddle(n = 1)

# saddlepoint τ̂(x)
function tau(x)
    return ( -p + 2x - sqrt(p^2 + 8λ * x) ) / (4x)
end

println(sum(Z .> tau(36.225)) / 10000)
println(sum(Z .> tau(40.542)) / 10000)
println(sum(Z .> tau(49.333)) / 10000)
```
I can successfully reproduce the results. For $n=1$:

| Interval | `1 - pchisq(x, 6, 9*2)` | IMH |
| --- | --- | --- |
| $(36.225,\infty)$ | 0.1000076 | 0.1037 |
| $(40.542,\infty)$ | 0.05000061 | 0.0516 |
| $(49.333,\infty)$ | 0.01000057 | 0.0114 |
For $n=100$, note that if $X_i\sim \chi^2_p(\lambda)$, then $n\bar X=\sum X_i\sim \chi^2_{np}(n\lambda)$, so we can use the following R code to find the cutpoints.
```r
qchisq(0.90, 600, 1800)/100
qchisq(0.95, 600, 1800)/100
qchisq(0.99, 600, 1800)/100
```
We then use these cutpoints to estimate the tail probabilities from samples produced by the independent MH algorithm with $n=100$.
| Interval | `1 - pchisq(x*100, 6*100, 9*2*100)` | IMH |
| --- | --- | --- |
| $(25.18054,\infty)$ | 0.10 | 0.1045 |
| $(25.52361,\infty)$ | 0.05 | 0.0516 |
| $(26.17395,\infty)$ | 0.01 | 0.0093 |
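The IMH column for $n=100$ can be reproduced in the same way as for $n=1$; here is a sketch, reusing `mh_saddle` and `tau` from the code above with the $n=100$ cutpoints:

```julia
# saddlepoint samples for the mean of n = 100 observations
Z100 = mh_saddle(10000, n = 100)
for a in (25.18054, 25.52361, 26.17395)
    println(sum(Z100 .> tau(a)) / length(Z100))
end
```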