The Metropolis Monte Carlo Method

Monte Carlo is a type of numerical integration. Consider the following integral:

$\displaystyle I = \int_a^b f\left(x\right) dx$ (62)

Now imagine that we have a second function, $ \rho(x)$, which is positive in the interval $ [a,b]$. We can also express $ I$ as

$\displaystyle I = \int_a^b \frac{f\left(x\right)}{\rho\left(x\right)}\rho\left(x\right) dx$ (63)

If we think of $ \rho\left(x\right)$ as a probability density, then what we have just expressed is the expectation value of the quantity $ f/\rho$ on $ \rho $ in the interval $ [a,b]$:

$\displaystyle I = \left<\frac{f\left(x\right)}{\rho\left(x\right)}\right>_\rho$ (64)

This implies that we can approximate $ I$ by picking $ M$ values $ \left\{x_i\right\}$ randomly out of the probability distribution $ \rho(x)$ and computing the following sum:

$\displaystyle I \approx \frac{1}{M} \sum_{i=1}^{M} \frac{f\left(x_i\right)}{\rho\left(x_i\right)}$ (65)
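As a concrete illustration, here is a minimal Python sketch of Eq. 65 (the function name mc_importance and the choice $ \rho(x) = 2x$ are illustrative assumptions, not from the text):

import math
import random

def mc_importance(f, rho, sample_rho, M=100000):
    # Estimate I as the sample mean of f/rho, with the x_i drawn
    # from the density rho (Eq. 65).
    total = 0.0
    for _ in range(M):
        x = sample_rho()
        total += f(x) / rho(x)
    return total / M

# Example: I = integral of x^2 over [0,1] = 1/3, sampled from rho(x) = 2x.
# If u is uniform on [0,1], then x = sqrt(u) has density 2x.
print(mc_importance(lambda x: x * x,
                    lambda x: 2.0 * x,
                    lambda: math.sqrt(random.random())))   # about 0.3333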

Note that this approximates the mean of $ f/\rho$ as long as $ M$ is large enough that the sample points “densely” cover the interval $ [a,b]$. If $ \rho\left(x\right)$ is uniform on $ [a,b]$,

$\displaystyle \rho\left(x\right) = \frac{1}{b-a}, \quad a < x < b$ (66)

and therefore,

$\displaystyle I \approx \frac{b-a}{M} \sum_{i=1}^{M} f\left(x_i\right)$ (67)

The next question is, how good an approximation is this, compared with other one-dimensional numerical integration techniques, such as Simpson's rule or Gaussian quadrature? A better phrasing of this question is, how expensive is this technique for a given level of accuracy, compared to traditional techniques? Consider the following means of computing $ \pi $:

$\displaystyle I = \int_0^1 \left(1-x^2\right)^{1/2}dx = \frac{\pi}{4}$ (68)

Allen and Tildesley [2] mention that, in order to use Eq. 67 to compute $ \pi $ to an accuracy of one part in 10$ ^6$, $ M$ = 10$ ^7$ random values of $ x_i$ are required, whereas Simpson's rule requires three orders of magnitude fewer points discretizing the interval to obtain an accuracy of one part in 10$ ^7$ (Fig. 2). So the answer is that integral estimation by Monte Carlo with uniform random variates is expensive.
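A minimal script in this spirit (not the code Allen and Tildesley used; $M$ and $N$ are arbitrary choices) compares the two estimates of Eq. 68:

import random

def f(x):
    return (1.0 - x * x) ** 0.5

# Monte Carlo estimate, Eq. 67 with a = 0, b = 1
M = 10 ** 5
mc = sum(f(random.random()) for _ in range(M)) / M

# Composite Simpson's rule on N equal subintervals (N even)
N = 100
h = 1.0 / N
s = f(0.0) + f(1.0)
for i in range(1, N):
    s += (4.0 if i % 2 else 2.0) * f(i * h)
simpson = s * h / 3.0

print(4.0 * mc, 4.0 * simpson)   # both approximate pi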

Figure 2: Integration of Eq. 68 by Monte Carlo and Simpson's rule. Upper: 4$ I$ vs number of random points for MC, for which points are uniform random variates on [0,1], and for Simpson's rule, for which points are equally spaced along [0,1]. Lower: Squared error in the estimate of $ \pi $ for each method.
[Image: pi_by_mc]

But, the situation changes radically when the dimensionality of the integral is large, as is the case for an ensemble average. For example, for a system of 100 particles comprising 300 coordinates, the configurational average $ \left<G\right>$ (Eq. 59) could be discretized using Simpson's rule. If we did that, requesting only a modest 10 discrete points per axis in configurational space, we would need to evaluate the integrand $ Ge^{-\beta{U}}$ 10$ ^{300}$ times. This is an almost unimaginably large number. Using a direct numerical technique to compute statistical mechanical averages is simply out of the question.

We therefore return to the idea of evaluating the integrand at a discrete set of points selected randomly from a distribution. Here we call upon the idea of importance sampling. Let us try to use whatever we know ahead of time about the integrand in picking our random distribution, $ \rho $, such that we minimize the number of points (i.e., the expense) necessary to give an estimate of $ \left<G\right>$ to a given level of accuracy.

Now, clearly the states that contribute the most to the integrals we wish to evaluate by configurational averaging are those states with large Boltzmann factors; that is, those states for which $ \rho_{NVT}$ is large. It stands to reason that if we randomly select points from $ \rho_{NVT}$, we will do a pretty good job approximating the integral. So what we end up computing is the “average of $ G\rho_{NVT}$ over $ \rho_{NVT}$”:

$\displaystyle \left<G\rho_{NVT}/\rho_{NVT}\right>_{\rho_{NVT}} \approx \left<G\right>,$ (69)

which should give an excellent approximation for $ \left<G\right>$. The idea of using $ \rho_{NVT}$ as the sampling distribution is due to Metropolis et al. [3]. The real work in computing $ \left<G\right>$ is therefore generating states that randomly sample $ \rho_{NVT}$.
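To spell out the estimator this implies: if we can generate $ M$ configurations $ \left\{\Gamma_i\right\}$ distributed according to $ \rho_{NVT}$, the factors of $ \rho_{NVT}$ cancel, and Eq. 65 reduces to a simple arithmetic mean:

$\displaystyle \left<G\right> \approx \frac{1}{M} \sum_{i=1}^{M} G\left(\Gamma_i\right)$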

Metropolis et al. [3] showed that an efficient way to do this involves generating a Markov chain of states which is constructed such that its limiting distribution is $ \rho_{NVT}$. A Markov chain is just a sequence of trials, where (i) each trial outcome is a member of a finite set and (ii) every trial outcome depends only on the outcome that immediately precedes it. By “limiting distribution,” we mean that the trial acceptance probabilities are tuned such that the probability of observing the Markov chain in any particular state is given by some equilibrium probability distribution, $ \rho $. For the following discussion, it will be convenient to denote a particular state $ n$ using $ \Gamma_n$, instead of $ \nu$.
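To make the idea of a limiting distribution concrete, here is a small sketch of a three-state Markov chain (the transition matrix is an arbitrary toy example, not tied to any physical system); the observed visit frequencies settle down to a fixed distribution regardless of the starting state:

import random

# Toy three-state chain; each row of pi sums to unity (cf. Eq. 70)
pi = [[0.5, 0.3, 0.2],
      [0.2, 0.6, 0.2],
      [0.1, 0.3, 0.6]]

state = 0
counts = [0, 0, 0]
steps = 100000
for _ in range(steps):
    u, cum = random.random(), 0.0
    for m, p in enumerate(pi[state]):   # next state chosen from row 'state'
        cum += p
        if u < cum:
            state = m
            break
    counts[state] += 1

print([c / steps for c in counts])   # approaches the limiting distribution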

A trial is some perturbation (usually small) of the coordinates specifying a state. For example, in an Ising system, this might mean flipping a randomly selected spin. In a system of particles in continuous space, it might mean displacing a randomly selected particle by a small amount $ \delta r$ in a randomly chosen direction $ (\theta,\phi)$. There can be a large variety of such “trial moves” for any particular system.
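As an illustration, a single-particle displacement trial might look like the following sketch (the list-of-lists coordinate layout and the step size delta_r are assumptions):

import math
import random

def trial_displacement(coords, delta_r=0.1):
    # Return a copy of coords with one randomly chosen particle moved
    # a distance delta_r in a random direction (theta, phi).
    new = [list(r) for r in coords]                  # coords: list of [x, y, z]
    i = random.randrange(len(new))                   # pick a particle at random
    theta = math.acos(1.0 - 2.0 * random.random())   # direction uniform on the sphere
    phi = 2.0 * math.pi * random.random()
    new[i][0] += delta_r * math.sin(theta) * math.cos(phi)
    new[i][1] += delta_r * math.sin(theta) * math.sin(phi)
    new[i][2] += delta_r * math.cos(theta)
    return new

# For an Ising system, the analogous trial is a single spin flip:
# i = random.randrange(N); spins[i] *= -1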

The probability that a trial move results in a successful transition from state $ n$ to $ m$ is denoted $ \pi_{nm}$, and $ {\bf\pi}$ is called the “transition matrix.” It must be specified ahead of time to execute a traditional Markov chain. Since every trial must result in a transition to some state (including, possibly, the original state), the rows of $ {\bf\pi}$ sum to unity:

$\displaystyle \sum_i \pi_{ni} = 1$ (70)

With this specification, we term $ {\bf\pi}$ a “stochastic” matrix. Furthermore, for an equilibrium ensemble of states in state space, we require that transitions from state to state do not alter state weights as determined by the limiting distribution. So the weight of state $ n$:

$\displaystyle \rho_n \equiv \rho_{NVT}\left(\Gamma_n\right)$ (71)

must be the result of transitions from all other states to state $ n$:

$\displaystyle \rho_n = \sum_m \rho_m \pi_{mn}.$ (72)

For all states $ n$, we can write Eq. 72 as a single matrix equation, with the row vector $ {\bf\rho}$ multiplying $ {\bf\pi}$ from the left:

$\displaystyle {\bf\rho\pi} = {\bf\rho}$ (73)
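We can check Eq. 73 numerically for any stochastic matrix by power iteration; in this sketch (reusing the toy matrix above), repeatedly applying $ {\bf\rho} \leftarrow {\bf\rho\pi}$ converges to the limiting distribution:

pi = [[0.5, 0.3, 0.2],
      [0.2, 0.6, 0.2],
      [0.1, 0.3, 0.6]]

rho = [1.0 / 3.0] * 3      # any normalized starting vector will do
for _ in range(1000):
    # row vector times matrix: (rho*pi)_n = sum_m rho_m * pi_mn (Eq. 72)
    rho = [sum(rho[m] * pi[m][n] for m in range(3)) for n in range(3)]

print(rho)                  # now satisfies rho * pi = rho (Eq. 73)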

where $ {\bf\rho}$ is the row vector of all state weights. Eq. 73 constrains our choice of $ {\bf\pi}$, but it still leaves more than one way to specify $ {\bf\pi}$. Metropolis et al. [3] suggested:

$\displaystyle \rho_m\pi_{mn} = \rho_n\pi_{nm}$ (74)

That is, the equilibrium flow of probability from state $ m$ to $ n$ exactly balances the flow from $ n$ to $ m$. This is called the “detailed balance” condition, and it guarantees that the state weights remain static. Observe:

$\displaystyle \sum_m \rho_m\pi_{mn} = \sum_m\left(\rho_n\pi_{nm}\right) = \rho_n\left(\sum_m\pi_{nm}\right) = \rho_n$ (75)

Detailed balance is, however, overly restrictive; it is merely the conceptually simplest way to guarantee that a limiting distribution is obtained. This fact is of little importance in this course, but you may encounter other balance-enforcing conditions in the literature.

Metropolis et al. [3] chose to construct $ {\bf\pi}$ as

$\displaystyle \pi_{nm} = \alpha_{nm} {\rm acc}\left(n\rightarrow m\right)$ (76)

where $ \alpha_{nm}$ is the probability of attempting a trial move from $ n$ to $ m$, and $ {\rm acc}\left(n\rightarrow m\right)$ is the probability that the move is accepted. If the probability of proposing a move from $ n$ to $ m$ equals that of proposing a move from $ m$ to $ n$, i.e., $ \alpha_{nm} = \alpha_{mn}$, then the detailed balance condition becomes:

$\displaystyle \rho_n{\rm acc}\left(n\rightarrow m\right) = \rho_m{\rm acc}\left(m\rightarrow n\right)$ (77)

from which follows

$\displaystyle \frac{{\rm acc}\left(n\rightarrow m\right)}{{\rm acc}\left(m\rightarrow n\right)} = \frac{\rho_m}{\rho_n} = \frac{e^{-\beta{U}\left(\Gamma_m\right)}}{e^{-\beta{U}\left(\Gamma_n\right)}}$ (78)

giving

$\displaystyle \frac{{\rm acc}\left(n\rightarrow m\right)}{{\rm acc}\left(m\rightarrow n\right)} = \exp\left\{-\beta\left[{U}\left(\Gamma_m\right)-{U}\left(\Gamma_n\right)\right]\right\} \equiv \exp\left(-\beta\Delta{U}_{nm}\right)$ (79)

where we have defined the change in potential energy as

$\displaystyle \Delta{U}_{nm} = {U}\left(\Gamma_m\right)-{U}\left(\Gamma_n\right)$ (80)

There are many choices for $ {\rm acc}\left(n \rightarrow m\right)$ that satisfy Eq. 79. The original choice of Metropolis is used most frequently:

$\displaystyle {\rm acc}\left(n \rightarrow m\right) = \left\{\begin{array}{ll} \exp\left(-\beta\Delta{U}_{nm}\right) & \Delta{U}_{nm} > 0 \\ 1 & \Delta{U}_{nm} \le 0 \end{array} \right.$ (81)

So, suppose we have some initial configuration $ n$ with potential energy $ U_n$. We make a trial move, temporarily generating a new configuration $ m$, and calculate its energy, $ U_m$. If this energy is lower than the original ($ U_m < U_n$), we unconditionally accept the move, and configuration $ m$ becomes the current configuration. If it is greater than the original ($ U_m > U_n$), we accept it with a probability consistent with the fact that both states belong to the same canonical ensemble. How does one decide in practice whether to accept the move? One first picks a uniform random variate $ x$ on the interval $ [0,1]$. If $ x \le {\rm acc}\left(n \rightarrow m\right)$, the move is accepted; otherwise, the move is rejected and configuration $ n$ is retained.
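A minimal sketch of this accept/reject step (the representation of configurations and the energy function are left abstract):

import math
import random

def metropolis_accept(delta_U, beta):
    # Metropolis criterion, Eq. 81: downhill moves are always accepted;
    # uphill moves are accepted with probability exp(-beta * delta_U).
    if delta_U <= 0.0:
        return True
    return random.random() <= math.exp(-beta * delta_U)

# Inside a sweep: propose trial configuration m from n, compute
# delta_U = U_m - U_n, then:
#     if metropolis_accept(delta_U, beta):
#         n = m      # accept: m becomes the current configuration
#     # else: configuration n is retained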

The next section is devoted to an implementation of the Metropolis Monte Carlo method for a 2D Ising magnet.
