# Markov Chain

## Essential properties of Markov Chain

In the setup of MCMC algorithms, we construct Markov chains from a transition kernel $$K$$, a conditional probability density such that $$X\_{n+1}\sim K(X\_n,X\_{n+1})$$.

The chains encountered in MCMC settings enjoy a very strong stability property, namely a **stationary probability distribution**; that is, a distribution $$\pi$$ such that if $$X\_n\sim\pi$$, then $$X\_{n+1}\sim \pi$$, provided the kernel $$K$$ allows for free moves all over the state space. This freedom is called **irreducibility** in the theory of Markov chains and is formalized as the existence of $$n\in\mathbb{N}$$ such that $$P(X\_n\in A\mid X\_0)>0$$ for every set $$A$$ with $$\pi(A)>0$$. This property also ensures that most of the chains involved in MCMC algorithms are **recurrent** (that is, the average number of visits to an arbitrary set $$A$$ is infinite), or even **Harris recurrent** (that is, the probability of an infinite number of returns to $$A$$ is 1).

**Harris recurrence** ensures that the chain has the same limiting behavior for every starting value instead of almost every starting value.

The stationary distribution is also a limiting distribution: the distribution of $$X\_n$$ converges to $$\pi$$ in the total variation norm, regardless of the initial value $$X\_0$$.

Stronger forms of convergence are also encountered in MCMC settings, such as geometric and uniform convergence.

If the marginals are proper, convergence only requires the chain to be aperiodic. A sufficient condition is that $$K(x\_n,\cdot)>0$$ (or, equivalently, $$f(\cdot\mid x\_n)>0$$) in a neighborhood of $$x\_n$$.

If the marginals are not proper, or if they do not exist, then the chain is not positive recurrent; it is either null recurrent or transient, and both cases are bad.

The [detailed balance condition](https://github.com/szcf-weiya/MonteCarlo/tree/a8a19b34ad8646f12cf5d7a4e4bfa9919c392451/MarkovChain/detailed_balance_condition.png) is **not necessary** for $$f$$ to be a stationary measure associated with the transition kernel $$K$$, but it provides a sufficient condition that is often easy to check and that can be used for most MCMC algorithms.
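The sufficiency direction is easy to check numerically: if the "flow" $$f(x)K(x,y)$$ is symmetric in $$(x,y)$$, summing over $$x$$ shows that $$f$$ is stationary. A minimal sketch in Python, where the 3-state matrix `P` and weights `f` are illustrative choices, not from the text:

```python
import numpy as np

# Illustrative 3-state chain: the flow matrix F[i, j] = f[i] * P[i, j]
# is symmetric, which is exactly the detailed balance condition.
f = np.array([0.2, 0.3, 0.5])
P = np.array([[0.70, 0.15, 0.15],
              [0.10, 0.60, 0.30],
              [0.06, 0.18, 0.76]])
F = f[:, None] * P
print(np.allclose(F, F.T))    # True: detailed balance holds
print(np.allclose(f @ P, f))  # True: hence f is stationary
```

Stationarity follows because the column sums of the symmetric matrix `F` are exactly `f`.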

Ergodicity: independence of initial conditions

* [geometrically h-ergodic](https://github.com/szcf-weiya/MonteCarlo/tree/a8a19b34ad8646f12cf5d7a4e4bfa9919c392451/MarkovChain/def_geometrical_h_ergodic.png): decreasing at least at a geometric speed.
* [uniform ergodicity](https://github.com/szcf-weiya/MonteCarlo/tree/a8a19b34ad8646f12cf5d7a4e4bfa9919c392451/MarkovChain/def_uniform_ergodic.png): stronger than geometric ergodicity in the sense that the rate of geometric convergence must be uniform over the whole space.

Irreducibility + Aperiodicity = Ergodicity?
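For a finite state space the answer is essentially yes: an irreducible aperiodic finite chain is ergodic, and its marginals converge geometrically to $$\pi$$ from any starting point (general state spaces additionally require positive Harris recurrence). A quick numerical sketch, with an illustrative 2-state matrix:

```python
import numpy as np

# Sketch: for a finite irreducible aperiodic chain, the marginal of X_n
# converges to pi from *any* starting point.
# The 2-state matrix below is an illustrative choice, not from the text.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = np.array([0.8, 0.2])  # solves pi P = pi
for mu0 in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    mu = mu0.copy()
    for _ in range(50):
        mu = mu @ P
    print(np.abs(mu - pi).sum())  # L1 distance to pi, ~0 either way
```

The second eigenvalue of `P` is 0.5, so the distance shrinks like $$0.5^n$$, an instance of geometric ergodicity.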

## Basic Notions

![](https://666993855-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LJfsESZOIJn_3uGIecs%2F-LX1zvsbfNzd-TLxEdn_%2F-LX1zwX_SlG1OoZ8kk5t%2Fdef6.2.png?generation=1548386029183263\&alt=media)

![](https://666993855-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LJfsESZOIJn_3uGIecs%2F-LX1zvsbfNzd-TLxEdn_%2F-LX1zwXdq7UQeIbmqdl1%2Fdef6.4.png?generation=1548386028801518\&alt=media)

## Bernoulli-Laplace Model

![](https://666993855-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LJfsESZOIJn_3uGIecs%2F-LX1zvsbfNzd-TLxEdn_%2F-LX1zwXbQGKRNn-5z67E%2Fex6.3.png?generation=1548386028705179\&alt=media)

The finite chain is indeed irreducible since it is possible to connect the states $$x$$ and $$y$$ in $$\vert x-y\vert$$ steps with probability

$$
\prod\_{i=x\land y}^{x\vee y-1}\Big(\frac{K-i}{K}\Big)^2\,.
$$

The Bernoulli-Laplace chain is [aperiodic](https://github.com/szcf-weiya/MonteCarlo/tree/a8a19b34ad8646f12cf5d7a4e4bfa9919c392451/MarkovChain/def_period_discrete.png) and even [strongly aperiodic](https://github.com/szcf-weiya/MonteCarlo/tree/a8a19b34ad8646f12cf5d7a4e4bfa9919c392451/MarkovChain/def_period.png) since the diagonal terms satisfy $$P\_{xx}>0$$ for every $$x\in \{0,\ldots,K\}$$.

Given the quasi-diagonal shape of the transition matrix, it is possible to directly determine the invariant distribution, $$\pi=(\pi\_0,\ldots,\pi\_K)$$. From the equation $$\pi P = \pi$$,

$$
\begin{aligned}
\pi\_0 &= P\_{00}\pi\_0 + P\_{10}\pi\_1\\
\pi\_1 &= P\_{01}\pi\_0 + P\_{11}\pi\_1 + P\_{21}\pi\_2\\
\cdots &=\cdots\\
\pi\_K &= P\_{(K-1)K}\pi\_{K-1} + P\_{KK}\pi\_K\,.
\end{aligned}
$$

Thus,

$$
\pi\_k=\binom{K}{k}^2\pi\_0\,,\qquad k=0,\ldots,K\,,
$$

and through normalization,

$$
\pi\_k=\frac{\binom{K}{k}^2}{\binom{2K}{K}}\,,
$$

by using the [Chu-Vandermonde identity](https://en.wikipedia.org/wiki/Vandermonde's_identity)

$$
\binom{m+n}{r}=\sum\_{k=0}^r\binom{m}{k}\binom{n}{r-k}
$$

with $$m=n=r=K$$. It turns out that the hypergeometric distribution $$H(2K,K,1/2)$$ is the [invariant distribution](https://github.com/szcf-weiya/MonteCarlo/tree/a8a19b34ad8646f12cf5d7a4e4bfa9919c392451/MarkovChain/def_invariant.png) for the Bernoulli-Laplace model.
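This invariance is easy to verify numerically. The sketch below assumes the standard Bernoulli-Laplace transition probabilities $$P\_{x,x-1}=(x/K)^2$$, $$P\_{x,x+1}=((K-x)/K)^2$$, $$P\_{xx}=2x(K-x)/K^2$$, with the state $$x$$ counting the white balls in the first urn:

```python
import numpy as np
from math import comb

# Sketch: verify that the (squared-binomial) hypergeometric weights are
# invariant for the Bernoulli-Laplace chain, assuming the standard
# transition probabilities for swapping one ball between the two urns.
K = 5  # illustrative size
P = np.zeros((K + 1, K + 1))
for x in range(K + 1):
    if x > 0:
        P[x, x - 1] = (x / K) ** 2        # white leaves urn 1, black enters
    if x < K:
        P[x, x + 1] = ((K - x) / K) ** 2  # black leaves urn 1, white enters
    P[x, x] = 2 * x * (K - x) / K ** 2    # composition unchanged
pi = np.array([comb(K, k) ** 2 for k in range(K + 1)]) / comb(2 * K, K)
print(np.allclose(pi @ P, pi))  # True: pi is invariant
```

The chain is in fact reversible, so the check also succeeds term by term via detailed balance, $$\pi\_k P\_{k,k+1}=\pi\_{k+1}P\_{k+1,k}$$.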

## AR(1) Models

A simple illustration of a Markov chain on a continuous state space:

$$
X\_n = \theta X\_{n-1}+\varepsilon\_n,\qquad \theta\in\mathbb{R}\,,
$$

with $$\varepsilon\_n\sim N(0,\sigma^2)$$. If the $$\varepsilon\_n$$'s are independent, $$X\_n$$ is indeed independent of $$X\_{n-2},X\_{n-3},\ldots$$ conditionally on $$X\_{n-1}$$.

* The Markovian properties of an AR(q) model can be recovered by considering the vector $$(X\_n,\ldots,X\_{n-q+1})$$.
* An ARMA(p, q) model does not directly fit in the Markovian framework.

Since $$X\_n\mid x\_{n-1}\sim N(\theta x\_{n-1},\sigma^2)$$, consider the lower bound of the transition kernel ($$\theta > 0$$):

$$
\begin{aligned}
K(x\_{n-1},x\_n) &= \frac{1}{\sqrt{2\pi}\sigma}\exp\Big\{-\frac{1}{2\sigma^2}(x\_n-\theta x\_{n-1})^2\Big\}\\
&\ge \frac{1}{\sqrt{2\pi}\sigma}\exp\Big\{-\frac{1}{2\sigma^2}\max\{(x\_n-\theta \underline w)^2, (x\_n-\theta \bar w)^2\}\Big\}\\
&\ge \frac{1}{\sqrt{2\pi}\sigma}\exp\Big\{-\frac{1}{2\sigma^2}\Big\[ \max\{-2\theta x\_n\underline w,-2\theta x\_n\bar w\}+x\_n^2 + \theta^2(\underline w^2\land \bar w^2) \Big]\Big\}\\
&=\begin{cases}
\frac{1}{\sqrt{2\pi}\sigma}\exp\Big\{-\frac{1}{2\sigma^2}\Big\[ -2\theta x\_n\underline w+x\_n^2 + \theta^2(\underline w^2\land \bar w^2) \Big]\Big\}& \text{if }x\_n>0\\
\frac{1}{\sqrt{2\pi}\sigma}\exp\Big\{-\frac{1}{2\sigma^2}\Big\[ -2\theta x\_n\bar w+x\_n^2 + \theta^2(\underline w^2\land \bar w^2) \Big]\Big\}& \text{if }x\_n<0
\end{cases}\,,
\end{aligned}
$$

when $$x\_{n-1}\in\[\underline w, \bar w]$$. The set $$C = \[\underline w, \bar w]$$ is a [small set](https://github.com/szcf-weiya/MonteCarlo/tree/a8a19b34ad8646f12cf5d7a4e4bfa9919c392451/MarkovChain/def_small.png), as the measure $$\nu\_1$$ with density

$$
\frac{\exp\{(-x^2+2\theta x\underline w)/2\sigma^2\}I\_{x>0} + \exp\{(-x^2+2\theta x\bar w)/2\sigma^2\}I\_{x<0} }{\sqrt{2\pi}\sigma\{\[1-\Phi(-\theta\underline w/\sigma)]\exp(\theta^2\underline w^2/2\sigma^2)+\Phi(-\theta\bar w/\sigma)\exp(\theta^2\bar w^2/2\sigma^2)\}}\,,
$$

satisfies

$$
K^1(x,A)\ge \nu\_1(A),\qquad \forall x\in C,\ \forall A\in {\cal B}({\cal X})\,.
$$

Given that the transition kernel corresponds to the $$N(\theta x\_{n-1},\sigma^2)$$ distribution, a normal distribution $$N(\mu,\tau^2)$$ is stationary for the AR(1) chain only if

$$
\mu=\theta\mu\qquad\text{and}\qquad \tau^2=\theta^2\tau^2+\sigma^2\,.
$$

These conditions imply that $$\mu=0$$ and that $$\tau^2=\sigma^2/(1-\theta^2)$$, which can only occur for $$\vert \theta\vert < 1$$. In this case, $$N(0,\sigma^2/(1-\theta^2))$$ is indeed the unique stationary distribution of the AR(1) chain.
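A short simulation can confirm these moments. The sketch below, with illustrative values of $$\theta$$ and $$\sigma$$, compares the long-run sample mean and variance of an AR(1) path with $$0$$ and $$\sigma^2/(1-\theta^2)$$:

```python
import numpy as np

# Sketch: simulate a long AR(1) path with |theta| < 1 (illustrative
# values) and compare sample moments with the stationary distribution
# N(0, sigma^2 / (1 - theta^2)).
rng = np.random.default_rng(0)
theta, sigma, n = 0.8, 1.0, 200_000
tau2 = sigma ** 2 / (1 - theta ** 2)   # stationary variance
x = np.empty(n)
x[0] = rng.normal(0.0, np.sqrt(tau2))  # start from the stationary law
for t in range(1, n):
    x[t] = theta * x[t - 1] + rng.normal(0.0, sigma)
print(x.mean(), x.var())  # close to 0 and tau2, respectively
```

Because the path is autocorrelated, the sample moments converge more slowly than for i.i.d. draws, but they still settle near the stationary values.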

## Branching process

![](https://666993855-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LJfsESZOIJn_3uGIecs%2F-LX4VHQqZYolr1QkRig7%2F-LX4VIo9tpwvAyiNqljv%2Fex_branching_process.png?generation=1548428067385766\&alt=media)

* If $$\phi$$ doesn't have a constant term, i.e., if $$P(X\_1=0)=0$$, then the chain $$(S\_t)$$ is necessarily transient since it is increasing.
* If $$P(X\_1=0)>0$$, the probability of a return to 0 at time $$t$$ is $$\rho\_t=P(S\_t=0)=g\_t(0)$$, which thus satisfies the recurrence equation $$\rho\_t=\phi(\rho\_{t-1})$$. There exists a limit $$\rho$$ different from 1, such that $$\rho=\phi(\rho)$$, if and only if $$\phi'(1)>1$$; that is, if $$E\[X\_1]>1$$. The chain is thus transient when the average number of siblings per individual is larger than 1. If there is a restarting mechanism at 0, $$S\_{t+1}\mid S\_t=0\sim\phi$$, it is easily shown that when $$\phi'(1)>1$$, the number of returns to 0 follows a geometric distribution with parameter $$\rho$$.

![](https://666993855-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LJfsESZOIJn_3uGIecs%2F-LX4VHQqZYolr1QkRig7%2F-LX4VIoBfrKP5hf5FwYM%2Fi1_branch_process.png?generation=1548428067557566\&alt=media)

* If $$\phi'(1)\le 1$$, the chain is recurrent.
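The fixed point $$\rho=\phi(\rho)$$ can be computed by iterating the recurrence $$\rho\_t=\phi(\rho\_{t-1})$$ from the text. The sketch below uses a Poisson offspring law with mean $$\mu=1.5$$ as an illustrative supercritical case ($$\phi'(1)=\mu>1$$):

```python
import numpy as np

# Sketch: extinction probability of a branching process with Poisson(mu)
# offspring, mu = E[X_1] > 1 (illustrative supercritical case).
# Iterating rho_t = phi(rho_{t-1}) from rho_0 = 0 converges to the
# smallest root of rho = phi(rho), which is < 1 exactly when mu > 1.
mu = 1.5

def phi(s):
    """Probability generating function of the Poisson(mu) offspring law."""
    return np.exp(mu * (s - 1.0))

rho = 0.0
for _ in range(200):
    rho = phi(rho)
print(rho)  # a fixed point strictly below 1
```

Starting the iteration at 0 is what selects the smallest root; starting at 1 would merely return the trivial fixed point $$\phi(1)=1$$.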
