Markov Chain



Essential properties of Markov Chain

In the setup of MCMC algorithms, we construct Markov chains from a transition kernel $K$, a conditional probability density such that $X_{n+1}\sim K(X_n,X_{n+1})$.

The chains encountered in MCMC settings enjoy a very strong stability property, namely a stationary probability distribution: a distribution $\pi$ such that if $X_n\sim\pi$, then $X_{n+1}\sim\pi$, provided the kernel $K$ allows for free moves all over the state space. This freedom is called irreducibility in the theory of Markov chains and is formalized as the existence of $n\in\mathbb{N}$ such that $P(X_n\in A\mid X_0)>0$ for every $A$ such that $\pi(A)>0$. This property also ensures that most of the chains involved in MCMC algorithms are recurrent (that is, the average number of visits to an arbitrary set $A$ is infinite), or even Harris recurrent (that is, the probability of an infinite number of returns to $A$ is 1).

Harris recurrence ensures that the chain has the same limiting behavior for every starting value instead of almost every starting value.

The stationary distribution is also a limiting distribution, in the sense that the distribution of $X_{n+1}$ converges to $\pi$ in the total variation norm, regardless of the initial value $X_0$.
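On a finite state space this convergence can be checked directly from the powers of the transition matrix. A minimal stdlib-only sketch, with a hypothetical two-state kernel (the matrix and its stationary distribution are illustrative choices, not from the text):

```python
# Total-variation convergence ||P^n(x, .) - pi||_TV -> 0 for a finite chain.

def mat_mul(A, B):
    """Plain-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[0.9, 0.1],
     [0.4, 0.6]]      # transition kernel K(x, y) as a matrix
pi = [0.8, 0.2]       # stationary: 0.8*0.9 + 0.2*0.4 = 0.8, so pi P = pi

def tv_from_start(n, start=0):
    """Total variation distance between P^n(start, .) and pi."""
    Pn = P
    for _ in range(n - 1):
        Pn = mat_mul(Pn, P)
    return 0.5 * sum(abs(Pn[start][y] - pi[y]) for y in range(2))

# The distance shrinks toward 0 as n grows, whatever the starting state:
dists = [tv_from_start(n) for n in (1, 5, 20)]
```

Here the distance is halved at each step because the second eigenvalue of $P$ is $0.5$; this is the geometric convergence mentioned below.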

Stronger forms of convergence are also encountered in MCMC settings, such as geometric and uniform convergence.

If the marginals are proper, convergence only requires the chain to be aperiodic. A sufficient condition is that $K(x_n,\cdot)>0$ (or, equivalently, $f(\cdot\mid x_n)>0$) in a neighborhood of $x_n$.

If the marginals are not proper, or if they do not exist, then the chain is not positive recurrent; it is either null recurrent or transient, and both cases are bad.

The detailed balance condition is not necessary for $f$ to be a stationary measure associated with the transition kernel $K$, but it provides a sufficient condition that is often easy to check and that is used by most MCMC algorithms.
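As an illustration of sufficiency, the following sketch builds a small Metropolis kernel on $\{0,1,2\}$ (the target $\pi$ and the symmetric proposal are arbitrary choices) and checks numerically that detailed balance $\pi(x)K(x,y)=\pi(y)K(y,x)$ holds, and that $\pi$ is then stationary:

```python
# Detailed balance => stationarity, checked on a 3-point state space.
pi = [0.5, 0.3, 0.2]
n = len(pi)

# Symmetric random-walk proposal on {0, 1, 2}: move to each neighbour w.p. 1/2,
# with moves off the ends mapped back to the boundary state.
q = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in (x - 1, x + 1):
        q[x][min(max(y, 0), n - 1)] += 0.5

# Metropolis kernel: accept a proposed move x -> y w.p. min(1, pi(y)/pi(x)).
P = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(n):
        if y != x:
            P[x][y] = q[x][y] * min(1.0, pi[y] / pi[x])
    P[x][x] = 1.0 - sum(P[x][y] for y in range(n) if y != x)

balanced = all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < 1e-12
               for x in range(n) for y in range(n))
piP = [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]
stationary = all(abs(piP[y] - pi[y]) < 1e-12 for y in range(n))
```

Summing the balance condition over $x$ gives $\sum_x \pi(x)K(x,y)=\pi(y)$, which is exactly what the `stationary` check confirms.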

Ergodicity: independence of initial conditions

  • Geometric ergodicity: the distance to the stationary distribution decreases at least at a geometric speed.

  • Uniform ergodicity: stronger than geometric ergodicity in the sense that the rate of geometric convergence must be uniform over the whole space.

Irreducibility + Aperiodicity = Ergodicity?

Basic Notions

Bernoulli-Laplace Model

The finite chain is indeed irreducible since it is possible to connect the states $x$ and $y$ in $\vert x-y\vert$ steps with probability

$$\prod_{i=x\wedge y}^{x\vee y-1}\Big(\frac{M-i}{M}\Big)^2\,.$$

Given the quasi-diagonal shape of the transition matrix, it is possible to determine the invariant distribution $\pi=(\pi_0,\ldots,\pi_K)$ directly. From the equation $\pi P=\pi$,

$$\begin{aligned} \pi_0 &= P_{00}\pi_0 + P_{10}\pi_1\\ \pi_1 &= P_{01}\pi_0 + P_{11}\pi_1 + P_{21}\pi_2\\ &\ \ \vdots\\ \pi_K &= P_{(K-1)K}\pi_{K-1} + P_{KK}\pi_K\,. \end{aligned}$$

Thus,

$$\pi_k=\binom{K}{k}^2\pi_0\,,\qquad k=0,\ldots,K\,,$$

and, through normalization,

$$\pi_k=\frac{\binom{K}{k}^2}{\binom{2K}{K}}\,,$$

by the Chu–Vandermonde identity

$$\binom{m+n}{r}=\sum_{k=0}^r\binom{m}{k}\binom{n}{r-k}\,.$$
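This invariant distribution can be verified exactly in a few lines. A sketch, assuming the usual Bernoulli-Laplace dynamics (the state counts the white balls in the first of two urns holding $K$ balls each, and one ball drawn uniformly from each urn is swapped at every step; $K=5$ is an arbitrary size):

```python
# Exact check that pi_k = C(K,k)^2 / C(2K,K) solves pi P = pi.
from fractions import Fraction
from math import comb

K = 5
P = [[Fraction(0)] * (K + 1) for _ in range(K + 1)]
for x in range(K + 1):
    # Swap moves: down w.p. (x/K)^2, up w.p. ((K-x)/K)^2, stay otherwise.
    P[x][x] = Fraction(2 * x * (K - x), K * K)
    if x > 0:
        P[x][x - 1] = Fraction(x * x, K * K)
    if x < K:
        P[x][x + 1] = Fraction((K - x) ** 2, K * K)

pi = [Fraction(comb(K, k) ** 2, comb(2 * K, K)) for k in range(K + 1)]

piP = [sum(pi[x] * P[x][y] for x in range(K + 1)) for y in range(K + 1)]
stationary = piP == pi   # exact equality, thanks to Fraction arithmetic
```

The check goes through because the chain is in fact reversible: $\pi_x P_{x,x+1}=\pi_{x+1}P_{x+1,x}$ term by term.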

AR(1) Models

A simple illustration of Markov chains on a continuous state space.

$$X_n = \theta X_{n-1}+\varepsilon_n\,,\qquad \theta\in\mathbb{R}\,,$$

with $\varepsilon_n\sim N(0,\sigma^2)$. If the $\varepsilon_n$'s are independent, $X_n$ is indeed independent of $X_{n-2},X_{n-3},\ldots$ conditionally on $X_{n-1}$.

  • The Markovian properties of an AR(q) can be derived from the vector $(X_n,\ldots,X_{n-q+1})$.

  • ARMA(p, q) doesn't fit in the Markovian framework.
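The first point can be made concrete: for an AR(2), the pair $Y_n=(X_n,X_{n-1})$ evolves by a first-order recursion, so it is Markov even though $X_n$ alone is not. A sketch (coefficients, seed, and horizon are arbitrary choices):

```python
# AR(2) as a first-order (vector) Markov chain Y_n = (X_n, X_{n-1}).
import random

a1, a2, sigma = 0.5, 0.3, 1.0

def step(y):
    """One transition of the vector chain: depends on y = (X_{n-1}, X_{n-2}) only."""
    x_n = a1 * y[0] + a2 * y[1] + random.gauss(0.0, sigma)
    return (x_n, y[0])

random.seed(1)
y = (0.0, 0.0)
vec_path = []
for _ in range(5):
    y = step(y)
    vec_path.append(y[0])

# Replaying the same innovations, the scalar AR(2) recursion gives the same path:
random.seed(1)
xs = [0.0, 0.0]
for _ in range(5):
    xs.append(a1 * xs[-1] + a2 * xs[-2] + random.gauss(0.0, sigma))

same_path = vec_path == xs[2:]
```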

Since $X_n\mid x_{n-1}\sim N(\theta x_{n-1},\sigma^2)$, consider, for $\theta>0$ and $x_{n-1}\in C=[\underline w,\bar w]$, the lower bound of the transition kernel:

$$\begin{aligned} K(x_{n-1},x_n) &= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big\{-\frac{1}{2\sigma^2}(x_n-\theta x_{n-1})^2\Big\}\\ &\ge \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big\{-\frac{1}{2\sigma^2}\max\big\{(x_n-\theta \underline w)^2,(x_n-\theta\bar w)^2\big\}\Big\}\\ &\ge \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big\{-\frac{1}{2\sigma^2}\Big[\max\{-2\theta x_n\underline w,-2\theta x_n\bar w\}+x_n^2+\theta^2\,\underline w^2\wedge\bar w^2\Big]\Big\}\\ &=\begin{cases} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big\{-\frac{1}{2\sigma^2}\Big[-2\theta x_n\underline w+x_n^2+\theta^2\,\underline w^2\wedge\bar w^2\Big]\Big\} & \text{if } x_n>0\,,\\ \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big\{-\frac{1}{2\sigma^2}\Big[-2\theta x_n\bar w+x_n^2+\theta^2\,\underline w^2\wedge\bar w^2\Big]\Big\} & \text{if } x_n<0\,. \end{cases} \end{aligned}$$

The measure $\nu_1$ with density

$$\frac{\exp\{(-x^2+2\theta x\underline w)/2\sigma^2\}\,I_{x>0}+\exp\{(-x^2+2\theta x\bar w)/2\sigma^2\}\,I_{x<0}}{\sqrt{2\pi}\,\sigma\big\{[1-\Phi(-\theta\underline w/\sigma)]\exp(\theta^2\underline w^2/2\sigma^2)+\Phi(-\theta\bar w/\sigma)\exp(\theta^2\bar w^2/2\sigma^2)\big\}}$$

therefore satisfies

$$K^1(x,A)\ge\nu_1(A)\,,\qquad\forall x\in C,\ \forall A\in\mathcal{B}(\mathcal{X})\,.$$

Given that the transition kernel corresponds to the $N(\theta x_{n-1},\sigma^2)$ distribution, a normal distribution $N(\mu,\tau^2)$ is stationary for the AR(1) chain only if

$$\mu=\theta\mu\qquad\text{and}\qquad \tau^2=\theta^2\tau^2+\sigma^2\,.$$

These conditions imply that $\mu=0$ and that $\tau^2=\sigma^2/(1-\theta^2)$, which can only occur for $\vert\theta\vert<1$. In this case, $N(0,\sigma^2/(1-\theta^2))$ is indeed the unique stationary distribution of the AR(1) chain.
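The fixed point $\tau^2=\sigma^2/(1-\theta^2)$ can also be seen numerically: iterating the variance recursion $v_n=\theta^2 v_{n-1}+\sigma^2$ contracts toward it from any starting variance whenever $|\theta|<1$ (the values of $\theta$, $\sigma$, and the starting variance below are arbitrary):

```python
# Variance recursion of the AR(1) chain converging to the stationary variance.
theta, sigma = 0.8, 1.0
target = sigma**2 / (1 - theta**2)   # stationary variance sigma^2 / (1 - theta^2)

v = 10.0   # variance of an arbitrary (non-stationary) starting distribution
for _ in range(200):
    v = theta**2 * v + sigma**2      # Var(X_n) = theta^2 Var(X_{n-1}) + sigma^2

converged = abs(v - target) < 1e-12
```

The gap $v_n-\tau^2$ shrinks by the factor $\theta^2<1$ at each step, which is the geometric convergence mentioned earlier; for $|\theta|\ge 1$ the recursion diverges and no stationary distribution exists.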

Branching process

  • If $\phi$ doesn't have a constant term, i.e., $P(X_1=0)=0$, then the chain $(S_t)$ is necessarily transient since it is increasing.

  • If $P(X_1=0)>0$, the probability of a return to 0 at time $t$, $\rho_t=P(S_t=0)=g_t(0)$, satisfies the recurrence equation $\rho_t=\phi(\rho_{t-1})$. There exists a limit $\rho$ different from 1, such that $\rho=\phi(\rho)$, iff $\phi'(1)>1$; namely, if $E[X]>1$. The chain is thus transient when the average number of siblings per individual is larger than 1. If there exists a restarting mechanism at 0, $S_{t+1}\mid S_t=0\sim\phi$, it is easily shown that when $\phi'(1)>1$, the number of returns to 0 follows a geometric distribution with parameter $\rho$.

  • If $\phi'(1)\le 1$, the chain is recurrent.
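The recurrence $\rho_t=\phi(\rho_{t-1})$ can be iterated directly. A sketch with a hypothetical Poisson($\lambda$) offspring distribution, whose pgf is $\phi(s)=e^{\lambda(s-1)}$; since $\phi'(1)=\lambda=2>1$ here, the limit $\rho$ is strictly smaller than 1:

```python
# Fixed-point iteration rho_t = phi(rho_{t-1}) for the return probability to 0.
from math import exp

lmbda = 2.0   # mean number of offspring, E[X] > 1 (supercritical case)

def phi(s):
    """pgf of the Poisson(lmbda) offspring distribution."""
    return exp(lmbda * (s - 1.0))

rho = phi(0.0)            # rho_1 = P(S_1 = 0)
for _ in range(200):
    rho = phi(rho)        # rho_t increases to the smallest root of rho = phi(rho)

is_fixed_point = abs(rho - phi(rho)) < 1e-12
limit_below_one = rho < 1.0
```

For $\lambda=2$ the limit is roughly $0.203$; for $\lambda\le 1$ the same iteration climbs all the way to $\rho=1$, matching the recurrent case.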

Three remarks complete these examples:

  • The Bernoulli-Laplace chain is aperiodic, and even strongly aperiodic, since the diagonal terms satisfy $P_{xx}>0$ for every $x\in\{0,\ldots,K\}$.

  • The normalization of $\pi_k$ uses the Chu–Vandermonde identity with $m=n=r=K$. It turns out that the hypergeometric distribution $\mathcal{H}(2K,K,1/2)$ is the invariant distribution for the Bernoulli-Laplace model.

  • The lower bound on the AR(1) transition kernel holds when $x_{n-1}\in[\underline w,\bar w]$; the set $C=[\underline w,\bar w]$ is thus a small set.