Biology Magazine

Sampling from a Probability Distribution

Posted on the 21 June 2015 by Ccc1685 @ccc1685

When simulating any system with randomness, sampling from a probability distribution is necessary. Usually, you’ll just need to sample from a normal or uniform distribution and thus can use a built-in random number generator. However, for the time when a built-in function does not exist for your distribution, here’s a simple algorithm.

Let’s say you have some probability density function (PDF)

\rho(x)
on some domain
[x_{\min},x_{\max}]
and you want to generate a set of numbers that follows this probability law. This is actually very simple to do although those new to the field may not know how. Generally, any programming language or environment has a built-in random number generator for real numbers (up to machine precision) between 0 and 1. Thus, what you want to do is to find a way to transform these random numbers, which are uniformly distributed on the domain
[0,1]
to the random numbers you want. The first thing to notice is that the cumulative distribution function (CDF) for your PDF,
P(x) = \int_{x_{\min}}^x \rho(x') dx'
is a function that ranges over the interval
[0,1]
, since it is a probability. Thus, the algorithm is: 1) generate a random number
r
between o and 1, and 2) plug it into the inverse of
P(x)
. The numbers you get out satisfy your distribution (i.e.
P(x)=r \Rightarrow x = P^{-1}(r)
.

If you’re lucky enough to have a closed form expression of the inverse of the CDF

P^{-1}(r)
then you can just plug the random numbers directly into your function, otherwise you have to do a little more work. If you don’t have a closed form expression for the CDF then you can just solve the equation
P(y)-r=0
for
x
. Newton’s method will generally converge very quickly.  You just loop over the command x = x – (P(x)-r)/\rho(x) until you reach the tolerance you want. If you don’t have a closed form expression of the CDF then the simplest way is construct the CDF numerically by computing the N dimensional vector 
v[j]=\sum_{i=0}^j \rho(jh-x_{\min}) h
, for some discretization parameter
h
such that
x_{\max}= Nh
 .  If the PDF is defined on the entire real line then you’ll have to decide on what you want the minimum
x_{\min}
. If the distribution has thin tails, then you just need to go out until the probability is smaller than your desired error tolerance. You then find the index
j
such that
v[j]
is closest to
r
(i.e. in Julia you could use findmin(abs(v[j]-r))) and your random number is
x = hj
.

Here is why the algorithm works. What we are doing is transforming the PDF of

r
, which we can call
u(r)
, to the desired PDF for
x
. The transformation we use is
r = P(x)
. Thus,
u(r) dr = \rho(x) dx
or
\rho(x) = u(r) dr/dx
. Now,
u(r)
is just 1 on the interval
[0,1]
and
dr/dx
is the PDF of
x
as expected. Thus, inverting the CDF for uniform random numbers always gives you a new set of random numbers that satisfies the new PDF. Trivial perhaps, but really useful.


Back to Featured Articles on Logo Paperblog