There's a very famous theorem in statistics (the central limit theorem) which says, roughly, that if you add together a whole bunch of identically distributed variables with finite variance, the resulting sum will have an approximately Gaussian distribution.

Also, an important condition is that the variables must be independent (independent and identically distributed, or i.i.d. for short).
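To make the statement concrete, here is a minimal simulation sketch. It assumes Uniform(0, 1) summands (my choice, purely for illustration) and uses excess kurtosis as a rough measure of distance from a Gaussian, for which it is zero. The function names are hypothetical.

```python
import numpy as np

def standardized_sums(n_terms, n_samples=100_000, seed=0):
    """Sums of n_terms i.i.d. Uniform(0, 1) draws, shifted and scaled
    to zero mean and unit variance using the known uniform moments."""
    rng = np.random.default_rng(seed)
    draws = rng.uniform(0.0, 1.0, size=(n_samples, n_terms))
    sums = draws.sum(axis=1)
    mean = n_terms * 0.5              # mean of one Uniform(0, 1) is 1/2
    std = np.sqrt(n_terms / 12.0)     # variance of one Uniform(0, 1) is 1/12
    return (sums - mean) / std

def excess_kurtosis(z):
    """Fourth-moment ratio minus 3; zero for an exact Gaussian."""
    return float(np.mean(z**4) / np.mean(z**2) ** 2 - 3.0)

# The excess kurtosis should shrink toward zero as more terms are added.
for n in (1, 2, 10, 50):
    print(n, round(excess_kurtosis(standardized_sums(n)), 3))
```

For uniform summands the excess kurtosis of the sum is -1.2/n in theory, so the printed values should drift toward zero as n grows.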

There are many variants of the Central Limit Theorem. One interesting formulation is the following (from the Wikipedia link you gave):

The density of the sum of two or more independent variables is the convolution of their densities (if these densities exist). Thus the central limit theorem can be interpreted as a statement about the properties of density functions under convolution: the convolution of a number of density functions tends to the normal density as the number of density functions increases without bound, under the conditions stated above.

Since the characteristic function of a convolution is the product of the characteristic functions of the densities involved, the central limit theorem has yet another restatement: the product of the characteristic functions of a number of density functions tends to the characteristic function of the normal density as the number of density functions increases without bound, under the conditions stated above.
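Both restatements can be checked numerically. The sketch below is only an illustration under my own assumptions: uniform densities on a grid for the convolution version, and Uniform(-1, 1) variables (characteristic function sin(t)/t, variance 1/3) for the characteristic-function version. The function names are hypothetical.

```python
import numpy as np

dx = 0.01
uniform = np.ones(100)   # Uniform(0, 1) density sampled on a grid of step dx

def convolve_n(density, n, dx):
    """Density of the sum of n i.i.d. variables by repeated convolution."""
    out = density.copy()
    for _ in range(n - 1):
        out = np.convolve(out, density) * dx
    return out

def density_gap_to_normal(n):
    """Largest gap between the n-fold convolution and the normal density
    with the same mean and variance, expressed in standardized units."""
    dens = convolve_n(uniform, n, dx)
    grid = np.arange(dens.size) * dx
    mass = dens.sum() * dx
    mean = (grid * dens).sum() * dx / mass
    var = ((grid - mean) ** 2 * dens).sum() * dx / mass
    normal = np.exp(-((grid - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return float(np.max(np.abs(dens / mass - normal)) * np.sqrt(var))

def cf_gap(n, t=np.linspace(-4.0, 4.0, 801)):
    """Largest gap between the characteristic function of the standardized
    sum of n Uniform(-1, 1) variables and exp(-t^2 / 2)."""
    s = t * np.sqrt(3.0 / n)            # rescale so the sum has unit variance
    cf_sum = np.sinc(s / np.pi) ** n    # np.sinc(x) = sin(pi x) / (pi x)
    return float(np.max(np.abs(cf_sum - np.exp(-t**2 / 2))))

for n in (1, 2, 8):
    print("density gap", n, round(density_gap_to_normal(n), 4))
for n in (1, 5, 50):
    print("cf gap", n, round(cf_gap(n), 4))
```

Both gaps should shrink as n grows, which is the convergence the two restatements describe.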


It's not easy to formulate the oil production problem in a strictly probabilistic framework. The curve fitting used here is a parametric regression approach. An alternative approach is nonparametric density estimation (or regression). It consists of estimating an unknown density function from a sum of kernel functions:
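In its standard textbook form (assuming the usual Parzen-Rosenblatt estimator is what is meant, with n sample points x_i, which is my notation), the estimate reads:

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right)
```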

where h is the smoothing parameter and K(x) is the symmetric kernel function which must satisfy the following properties:
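Assuming the standard definition is what is meant, the usual conditions are normalization and symmetry:

```latex
\int_{-\infty}^{\infty} K(u)\, du = 1,
\qquad
K(-u) = K(u) \quad \text{for all } u
```

In practice K is usually also taken to be nonnegative, so that the estimate is itself a valid density.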

This formulation is attractive because K(x) can be interpreted as an elementary field production curve. Furthermore, you don't need to make assumptions about the shape of the curve (Gaussian, logistic, etc.). For more info, here is a quick introduction. I once tried a few simulations by adding elementary curves spawned by a prior model which was supposed to model the discovery pattern:
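A minimal sketch of this kind of simulation follows. It is not the model used in the original simulations: the Gaussian kernel, the gamma-distributed peak years, the lognormal field sizes, and every parameter value are hypothetical choices made only to show the mechanics of summing elementary curves.

```python
import numpy as np

def gaussian_kernel(u):
    """Symmetric kernel that integrates to one."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def aggregate_production(years, peak_years, sizes, h):
    """Sum of elementary field curves size_i * K((t - peak_i) / h) / h."""
    u = (years[:, None] - peak_years[None, :]) / h
    return (sizes[None, :] * gaussian_kernel(u) / h).sum(axis=1)

rng = np.random.default_rng(42)
n_fields = 500

# Hypothetical prior for the discovery pattern: each field's peak year is
# 1930 plus a gamma-distributed delay (mean 40 years).
peak_years = 1930.0 + rng.gamma(shape=4.0, scale=10.0, size=n_fields)

# Hypothetical field sizes (ultimate recovery, arbitrary units).
sizes = rng.lognormal(mean=0.0, sigma=1.0, size=n_fields)

years = np.arange(1850.0, 2201.0, 1.0)
production = aggregate_production(years, peak_years, sizes, h=8.0)

# Each kernel integrates to one, so cumulative production should recover
# the total of the field sizes.
print(round(float(production.sum()), 2), round(float(sizes.sum()), 2))
```

The smoothing parameter h plays the role of a typical field lifetime here, and the shape of the total curve is driven by the prior on peak years rather than by any assumed functional form for the total.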

Sorry, the second link is not good, use this one:
A Statistical Model for the Simulation of Oil Production
The convolution point is a good one - I vaguely remember that from undergraduate functional analysis now that you mention it. WebHubbleTelescope has been doing some interesting modeling where you take the discovery curve and convolve it to get the production curve, but as far as I can tell he more or less handcrafts the convolution function to make the past history fit. It's not clear here why there'd be enough layers of convolution to produce such good agreement with the Gaussian across several orders of magnitude. OTOH, it seems like there must be some central limit theorem type reasoning here. It would solve a problem in my mind - I would expect the logistic to be a rough approximation to oil production, but the degree of fit with the US production is surprising, and I can't think of any good reason why it should work so precisely. If there's really a central limit story for why the US production is Gaussian, then it's just down to the fact that the logistic derivative and the Gaussian are pretty similar shapes.
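As a rough check on how similar the two shapes are, the sketch below compares a standard Gaussian density with a logistic derivative (the logistic density) matched to the same variance. The variance matching is my assumption about what a fair comparison looks like; other matchings, such as equal peak height, would give somewhat different numbers.

```python
import numpy as np

def normal_pdf(x):
    """Standard Gaussian density (zero mean, unit variance)."""
    return np.exp(-0.5 * np.square(x)) / np.sqrt(2.0 * np.pi)

def logistic_pdf(x, s):
    """Derivative of the logistic curve 1 / (1 + exp(-x / s))."""
    e = np.exp(-np.abs(x) / s)      # use |x| for numerical stability
    return e / (s * (1.0 + e) ** 2)

# The logistic density has variance s^2 * pi^2 / 3; pick s for unit variance.
s = np.sqrt(3.0) / np.pi

x = np.linspace(-6.0, 6.0, 1201)
max_abs_gap = float(np.max(np.abs(logistic_pdf(x, s) - normal_pdf(x))))

# Ratio of the two densities four standard deviations from the peak.
tail_ratio = float(logistic_pdf(4.0, s) / normal_pdf(4.0))

print(round(max_abs_gap, 4), round(tail_ratio, 2))
```

Near the peak the two curves are close, but in the tails the logistic derivative decays exponentially while the Gaussian decays much faster, so a fit that holds across several orders of magnitude does distinguish between them.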