Predicting US Production with Gaussians

EIA Field production of crude in the US, logistic (Hubbert) fit based only on 1930-1976 data, and Gaussian fit based on same time interval. Source: EIA for the data.

So, I was fooling around tonight, and made a long-term graph of US production growth rates (year-on-year), which looks as follows. Because the data are so noisy, I fit a polynomial to it just to get a sense of the trend. The polynomial came out almost a straight line. I varied the degree (in the graph it's a polynomial of degree 6), but it always wanted to be more or less straight.

Year-on-year change in EIA Field production of crude in the US, with linear and sixth order polynomial fit. Source: EIA for the data.

That's not what the logistic would say - the logistic would call for an S-shaped decline in the growth rate, starting at k (which is around 6%). Of course, we know the logistic is not that great at modeling the early production. Still, that straight line is really sticking out. Hmmm. Scratch head, write a few equations, and it turns out that the function with a linearly decreasing growth rate is a Gaussian. I've vaguely heard of people using Gaussians instead of logistics as models of the peak, but hadn't played with it myself before tonight.
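To spell out that step (a short derivation in generic symbols - none of the fitted values are used here): if the fractional growth rate of production declines linearly in time, then

\frac{d}{dt}\ln P(t) = a - b\,t \quad (b > 0)
\;\Rightarrow\; \ln P(t) = c + a\,t - \tfrac{1}{2}\,b\,t^{2}
\;\Rightarrow\; P(t) = P_{\max}\,\exp\!\left(-\frac{(t - t_{M})^{2}}{2S^{2}}\right), \qquad t_{M} = \frac{a}{b}, \quad S = \frac{1}{\sqrt{b}},

i.e. the log of production is a downward parabola, and production itself is a Gaussian in time.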

So, plot the log of production versus time and fit a parabola: Oh my.

Natural log of EIA Field production of crude in the US, with quadratic fit. Source: EIA for the data.
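For anyone who wants to reproduce that fit, here is a minimal sketch; the file name and layout are placeholders (a two-column year/production export of the EIA series), not the actual EIA download:

import numpy as np

# Hypothetical input: two columns, year and annual US crude field production.
data = np.loadtxt("us_crude_production.csv", delimiter=",")
years, production = data[:, 0], data[:, 1]

# Fit a parabola to the natural log of production.
a2, a1, a0 = np.polyfit(years, np.log(production), deg=2)
fit = np.exp(a2 * years**2 + a1 * years + a0)

# A downward parabola in log space is a Gaussian in linear space; recover the
# implied peak year and width (sigma) from the quadratic coefficients (a2 < 0).
peak_year = -a1 / (2 * a2)
sigma = np.sqrt(-1.0 / (2 * a2))
print("implied peak year:", peak_year, " width (years):", sigma)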

Pretty good fit across the whole range. There's a very famous theorem in statistics (the central limit theorem) which says roughly that if you add a whole bunch of variables together which are identically distributed, the resulting sum will have a Gaussian distribution. You could argue that something similar is causing this, but the things being added together are not obviously identically distributed. It's not clear to me why the central limit theorem would apply to a dynamical process in time - the time profile of oil production is not a statistical sampling process, it's an economic/stochastic/sociological spread process through a complex geologic reality. More thought required here.

There must be references on this surely. But I haven't found them in my literature search to date, and can't quickly find them now. Anyone?

Anyway, to get a quick feel for prediction, I repeated the thing I did Thursday night of seeing what would happen if you were to use the model to predict production forward:

EIA Field production of crude in the US, logistic (Hubbert) fit based only on 1930-1976 data, and Gaussian fit based on same time interval. Source: EIA for the data.

Yikes. That's really good. Not sure if the Gaussian will always do so well, but this is certainly interesting...

There is a lot of discussion in this paper by Jean Laherrere.
Stuart, can you take that last graph you did and see what it will look like in 2020 and 2040?
And while you're at it, can you also do the same Gaussian plot for total world production and see what that prediction looks like? Thanx!
From Laherrere.

A simple Hubbert curve may be ideally applied only in the following cases:
3) Where a single geological domain having a natural distribution of fields is considered, political boundaries should be avoided.

OK, now in Laherrere's paper, he has examples from the FSU (former Soviet Union). And then, there is this later paper from Petroleum Review Is FSU oil growth sustainable? (pdf). He includes this linearization



But the FSU comprises several different oil provinces--West Siberia, the Caspian Sea Basin, East Siberia, the Arctic--discussed by Colin Campbell in The Status of Oil and Gas Depletion in Russia (Dec 2004).
It is difficult to summarise the geology of this huge territory, but we may identify the main provinces:

  • The Western basins between the Barents and Caspian Seas with their Silurian source rocks
  • The West Siberian basins with the Jurassic source rocks
  • The Arctic domain
  • The locally productive Tertiary deltaic basin of Sakhalin on the Pacific margin
Here's a map I found, just to give people a visualization.



Below, westexas argues that Alaska should not be thrown in with the Lower 48--"Alaska might as well be in the Middle East". We wouldn't take Mexico, lump that together with Angola, and do a Hubbert style analysis, logistic or Gaussian, of both together.
Stuart, I think this paper pretty much answers your doubts. When modeling total US production you're including several discovery cycles.

When modeling just the lower-48 (like Laherrere does), Hubbert's curve fits better than the Gaussian. These curves are somewhat different from one another, especially around the inflexion points (the first inflexion in Hubbert's curve in particular).

Although I'm not quite sure (I haven't got there yet), the Central Limit Theorem may also apply to the logistic case.

As for your doubts on why these models fit so well, I'd like to look again at the population issue. Remember the logistic spreading of the Sasser virus? I guess you know that's the way living things grow over time. Now, you should know that since the early eighties world oil production per capita has been flat.

Al Bartlett looked at fitting gaussians to US and world oil production in the following paper:

An Analysis of US and World Oil Production Patterns Using Hubbert-Style Curves, Albert A. Bartlett, Mathematical Geology, V32, N1, Jan. 2000.

He used three variables: the estimated ultimate recovery (EUR), the date of the peak (tM), and the width of the gaussian (S). He then minimized the root mean square deviations between the data and the fit to find the EUR. He also looked at the sensitivity of his model to changes in tM and S and the uncertainty of the EUR, as well as per capita oil production, and R/P ratios. At the end of the paper, he compares his results with those of other researchers.
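A minimal sketch of that kind of three-parameter fit (my own illustration of the approach, not Bartlett's code; the input file is a placeholder two-column year/production table):

import numpy as np
from scipy.optimize import minimize

# Hypothetical input: two columns, year and annual production (same volume
# units as the EUR you want out, e.g. Gb/yr and Gb).
data = np.loadtxt("us_crude_production.csv", delimiter=",")
years, production = data[:, 0], data[:, 1]

def gaussian_production(t, eur, t_m, s):
    """Annual production for a Gaussian whose total area is the EUR."""
    return eur / (s * np.sqrt(2 * np.pi)) * np.exp(-((t - t_m) ** 2) / (2 * s ** 2))

def rms_deviation(params):
    eur, t_m, s = params
    return np.sqrt(np.mean((production - gaussian_production(years, eur, t_m, s)) ** 2))

# Rough starting guesses: EUR about twice cumulative-to-date, peak near 1970,
# width a few decades; then minimize the RMS deviation over all three.
x0 = [2 * production.sum(), 1970.0, 30.0]
result = minimize(rms_deviation, x0, method="Nelder-Mead")
eur_fit, t_m_fit, s_fit = result.x
print("EUR:", eur_fit, " peak year:", t_m_fit, " width:", s_fit)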

So, what's your take on the linearization prediction of 2.3 trillion barrels for world URR now that you've done this analysis?

I am really impressed with equations containing constants given to 9 or 10 significant figures.  :)
Is Bartlett's paper available on the web anywhere?
That I don't know. I have a hardcopy that he gave me at the ASPO-USA conference. I can fax or mail a copy of it to you if you can't find it elsewhere.
Here's a link to it.

http://www.hubbertpeak.com/bartlett/hubbert.htm
http://dieoff.org/page187.htm

and here's an interesting discussion of linearization and Gaussians in a presentation:

http://www-physics.mps.ohio-state.edu/~aubrecht/AAPTSU02oil.pdf

So I'm not seeing too much insight in there into why it works. I can't believe this isn't well trodden ground.
There's a very famous theorem in statistics (the central limit theorem) which says roughly that if you add a whole bunch of variables together which are identically distributed, the resulting sum will have a Gaussian distribution.

Also, an important condition is that the variables must be independent (in short i.i.d.).

There are many variants of the Central Limit Theorem. One interesting formulation is the following (from the link you gave on wikipedia):

The density of the sum of two or more independent variables is the convolution of their densities (if these densities exist). Thus the central limit theorem can be interpreted as a statement about the properties of density functions under convolution: the convolution of a number of density functions tends to the normal density as the number of density functions increases without bound, under the conditions stated above.

Since the characteristic function of a convolution is the product of the characteristic functions of the densities involved, the central limit theorem has yet another restatement: the product of the characteristic functions of a number of density functions tends to the characteristic function of the normal density as the number of density functions increases without bound, under the conditions stated above.
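To see that statement in action, here is a small sketch (generic numbers, nothing to do with oil data) that repeatedly convolves a flat, decidedly non-Gaussian density with itself; after a handful of convolutions the result is already very close to a normal density:

import numpy as np

# Discretize a uniform density on [0, 1): point masses on a grid, summing to 1.
dx = 0.01
grid = np.arange(100) * dx
density = np.ones(100) / 100

# Convolve the density with itself several times: this gives the density of a
# sum of independent copies of the uniform variable.
n_terms = 8
summed = density.copy()
for _ in range(n_terms - 1):
    summed = np.convolve(summed, density)

# Compare with a normal density having the same mean and variance as the sum.
x = np.arange(summed.size) * dx
mean = n_terms * np.sum(grid * density)
var = n_terms * (np.sum(grid**2 * density) - np.sum(grid * density) ** 2)
normal_mass = dx * np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
print("max difference between the two sets of masses:", np.max(np.abs(summed - normal_mass)))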


It's not easy to formulate the oil production problem in a strictly probabilistic framework. The curve fitting used here is a parametric regression approach. An alternative approach is nonparametric density estimation (or regression). It consists in estimating an unknown density function from a sum of kernel functions:

\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)

where h is the smoothing parameter and K(x) is the symmetric kernel function, which must satisfy the following properties:

K(x) \ge 0, \qquad \int K(x)\,dx = 1, \qquad K(-x) = K(x)

This formulation is attractive because K(x) can be interpreted as an elementary field production curve. Furthermore, you don't need to make assumptions about the shape of the curve (Gaussian, logistic, etc.). For more info, here's a quick introduction. I once tried a few simulations by adding elementary curves spawned by a prior model which was supposed to model the discovery pattern:

Sorry, the second link is not good, use this one:
A Statistical Model for the Simulation of Oil Production
The convolution point is a good one - I vaguely remember that from undergraduate functional analysis now you mention it. WebHubbleTelescope has been doing some interesting modeling where you take the discovery curve and convolve it to get the production curve, but as far as I can tell he more or less handcrafts the convolution function to make the past history fit. It's not clear here why there'd be enough layers of convolution to produce such good agreement with the Gaussian across several orders of magnitude. OTOH, it seems like there must be some central limit theorem type reasoning here. It would solve a problem in my mind - I would expect the logistic to be a rough approximation to oil production, but the degree of fit with the US production is surprising, and I can't think of any good reason why it should work so precisely. If there's really a central limit story for why the US production is Gaussian, then it's just down to the fact that the logistic derivative and Gaussian are pretty similar shapes.
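For concreteness, here is an illustrative sketch of the general discovery-to-production convolution idea - not WHT's actual model; the discovery curve and the lag parameter are invented:

import numpy as np

# Made-up bell-shaped "discovery" curve (volume found per year) on an arbitrary axis.
years = np.arange(1900, 2051)
discoveries = np.exp(-((years - 1940) ** 2) / (2 * 15.0 ** 2))

# Hypothetical one-sided lag kernel: an exponential delay between discovery and
# extraction with a mean lag of 20 years (purely for illustration).
mean_lag = 20.0
lag = np.arange(0, 100)
kernel = np.exp(-lag / mean_lag) / mean_lag

# Production is the discovery curve convolved with the lag kernel, truncated to
# the same time span; causality is automatic because the kernel is one-sided.
production = np.convolve(discoveries, kernel)[: years.size]
print("discovery peak:", years[np.argmax(discoveries)], " production peak:", years[np.argmax(production)])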
In my opinion, you guys are making this way too complicated.  I think that the P/Q versus Q method works because we find the big fields first--it's as simple as that.   Therefore,  we are largely plotting the decline of the big fields.  The smaller fields that we find after the 50% of Qt mark have a largely trivial effect on Qt.  

The two best case histories are the Lower 48 and the North Sea.  Together, these two regions account for close to 20% of all oil produced to date worldwide.  The Lower 48 peaked at 48% of Qt, and the North Sea peaked at 52% of Qt--an average of 50%.  The world reached 50% in 2005, and the two facts we know are:  (1)  oil traded at record high nominal price levels in 2005 and (2)  oil production year over year is flat.  Both facts are consistent with a peak.  

I have a suggestion for an experiment.  You can easily plot the North Sea data, using the EIA data at the following website:  http://www.eia.doe.gov/emeu/ipsr/t41b.xls

Note that this is crude + condensate production.  In my opinion, using NGL's distorts the data because NGL's can easily come from gas reservoirs in addition to oil reservoirs (as can condensates, but that is a lesser factor).  

North Sea production starts in 1971.   We know that they peaked in 1999 at just a hair under 6 mbpd.  You plot annual production (P) divided by cumulative to date (Q) versus Q.  I think that I used a P/Q limit of about 20% (0.20) on my plot.   I suggest that everyone generate their own plot, do a best linear fit and come up with your own Qt.  I came up with 60 Gb.  Stuart could then compare the answers.  
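For anyone who wants to try the experiment, here is a minimal sketch of the linearization; the input file is a placeholder (the EIA spreadsheet above would first need to be exported as a simple two-column year/production table):

import numpy as np

# Hypothetical input: two columns, year and annual North Sea crude + condensate in mbpd.
data = np.loadtxt("north_sea_production.csv", delimiter=",")
years, mbpd = data[:, 0], data[:, 1]
annual_gb = mbpd * 365.0 / 1000.0          # convert to Gb per year

# Hubbert linearization: plot P/Q against cumulative production Q.
Q = np.cumsum(annual_gb)                   # cumulative production to date, Gb
P_over_Q = annual_gb / Q

# Drop the noisy early years (westexas suggests a P/Q cutoff of about 0.20),
# fit a straight line, and read Qt off the x-intercept.
mask = P_over_Q < 0.20
slope, intercept = np.polyfit(Q[mask], P_over_Q[mask], deg=1)
Qt = -intercept / slope
print("implied Qt (Gb):", Qt)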

ah, but it's all about finding the best fit for each case, wt.  the linearizations are good, but if you can find something that fits better (and that ln curve for production is just creepy), then you have to give it credence as a model.

Parsimony of model is nice, but so is good fit...we're looking for models that are generalizable to all the units of analysis...more importantly if we can discern many models that fit relatively well but vary across the cases, then we can use model selection, and the assumptions behind each model, to start figuring out WHY the countries vary...

My point is that I am not aware of any real exceptions to the P/Q versus Q model.  In every case that I have seen where a region of sufficient size has decades of serious production, the data always: (1) show a linear progression and (2)  in the absence of a political event, e.g. Iran, peak occurs around 50% of Qt.  

I continue to think that including Alaska with the Lower 48 is a mistake.  In terms of both geology and timing of development (the Lower 48 peaked before serious production even began in Alaska), Alaska might as well be in the Middle East.  Alternatively, you could plot all of North America.  

Has any one done this?  May we perhaps impose on you to provide a graph of the North American situation, westexas?
There is a theory explaining how bell-shaped curves such as the Gaussian and t-distribution can be used to analyse statistical noise.  I'm not aware of any central limit type theorem for resource depletion. It could be just a triangle roughed around the edges. The number of trees on Easter Island fell to zero without skimming the horizontal axis. An alternative might be superimposed bar charts or skyscraper diagrams with the likely depletion curve sandwiched between the optimistic and pessimistic scenarios.
Re:  Top Petroleum Net Exporters, 2003 (those with net exports of one mbpd or more):

http://www.eia.doe.gov/emeu/security/topexp.html

There are 12 countries on this list.  The total net exports of those countries exporting less than one mbpd is not significant.  Note that the big three--Saudi Arabia, Russia and Norway--account for more than 50% of the exports from the top 12 countries.

Two of the countries--Saudi Arabia and Norway--are past their 50% of Qt marks--and both countries show declining production from 2003. Russia's production is flat year over year (I have never seen a P/Q versus Q plot for Russia).

Total world oil production is interesting, but exports make the world go around.  With the top three either declining or showing flat production, where will the oil production come from to meet current, let alone future, export demand?   I think that this topic has been underexplored.

Note that Saudi Arabia's net 2003  exports  were alone basically equal to the sum of the bottom six on the top 12 list.

If we could get a Russian P/Q versus Q plot, we could take a stab at predicting the net exports from the big three over the next 5-10 years.  It ain't going to be pretty.  I suppose that it would make sense to lump the Soviet Union, Russia and FSU data into one data file.

Re: "Total world oil production is interesting, but exports make the world go around.... I think that this topic has been underexplored."

I agree. Rick and I have had some discussions about writing something up on the importance of exports and the fungibility of oil. I had taken an initial stab at it when I wrote Algeria, Land of Opportunity? I see that Algeria is #11 on EIA's 2003 list. They are a mid-level producer that would not seem very important in the overall scheme of things but they are in terms of exports. You'll notice that Canada, for example, is not on the list.



dave..i have been thinking about fungibility of oil also, as have many other posters..i like to look at various events and put them together in a "big picture" view. i'd like feedback on other TOD's views of this.
i am fascinated by what has happened in the last week in this regard. i think there are two interacting cross currents going on now in world politics.
...first, the fading idea that the u.s. is a superpower, to be feared at all costs. the iraq war and its consequences are eliminating that fear. like the vietnam war, the aftereffect will be a distaste in american minds for foreign involvement.
so what, you say....the second crosscurrent..if countries that control oil and finance consider themselves outside of the u.s. influence, they will begin to act in their own self interest, including how they foresee parsing out their remaining,dwindling oil supplies . four cases in point from the last week:
one...russia's treatment of the ukraine, and its potential shot across the bow of europe (and possibly the u.s.)
The move led to a sharp drop in gas supplies among some of the 28 countries in Europe and the former Soviet Union that rely on Russian gas. Bosnia and Herzegovina, Croatia and Serbia were among the Gazprom customers that reported a reduction in natural gas supplies of between 30 per cent and 50 per cent since Sunday.

two...the reversal of OPEC in their decision to reduce supply:

"The price gets above $60 and OPEC is a dove," said Subash Chandra, an analyst at Morgan Keegan & Co. in New York. "It gets below $60 and it gets hawkish, so the market's got a very good sense of where OPEC wants prices."

..no more we'd like to see oil at $40-50 a barrel

three...rumblings in brazil that oil supply should be limited as posted by alan yesterday
Exporting oil is "an act of treason," reckons Heitor Manoel Pereira, president of the Association of Petrobras Engineers, or AEPET, with 3,923 members in the active work force. "Brazil is no Saudi Arabia that can export as it wishes. This will reduce our possibility for development in coming years," Pereira said.

and finally... the economic blockbuster, first posted by geopoet, about china moving its investments out of the dollar
"It is a subtle but clear signal that they are interested in moving away from the US dollar into other currencies, and are interested in setting up some kind of strategic commodity fund, maybe just for oil, but maybe for other commodities," he said.

...individually, they are, IMHO, surprising events, but taken together , in such a short time scale...i think they represent a substantial shift in worldview.
comments?

Surprising events? Surely not.

Ukraine wants to be 'western' and turn its back on Russia, then let them pay western market prices.

OPEC would mostly rather make as much money by pumping less than (in Saudi's case) be exposed as not having swing production capacity.

Brasil is a growing and developing nation, it does not wish to sell itself into greater future poverty for short term profiteering. Treason seems a most appropriate term.

China has more $ denominated assets than it is comfortable with (given it will pull the rug from under the $ someday), diversification mandatory - and has been happening softly for near a year.

They are but the initial ripples of 'end of empire'. The US one has been brief, a mere 100 years, but those who watch such things know it is coming within 20 years. Perhaps you are seeing these events as 'at odds' with the current system - as determined by the USA - and should rather see them as rational symptoms of the ending of that system and US hegemony.

Past such events have always been bloody, and the odds strongly favour that being the case this time too. We will have to grow up very fast to avoid it, and the signs bode ill.


I might have been one of those suggesting that a Gaussian fit would be better (based in part on what Deffeyes has said), but this all gets me to thinking.

What are we really learning by doing this?  There will always be some noise in the data, for various reasons that this type of curve fitting will never capture.  There could be an economic downturn that suppresses demand.  The hurricanes last year would have caused a bit of a blip.  A terrorist attack could do the same (for that matter, the insurgents in Iraq are effectively cutting off Iraq's oil supply).

So at the end of the day, do we get a more accurate guess as to when the peak will be, or the depletion rates we will see in the future?

I was thinking about this too ericy...and I think it's getting a handle on the fundamentals of the story in a manner that gives the community more evidence for its arsenal.  It's kind of like watching sausage get made as we watch all of these really smart folks hash it out, but at the end of the day, if we can get a better handle on the curve, we can move from that curve to start making more accurate predictions...

Still, your point about exogenous events is a good one.  Exogenous events are (cough trite alert) tough to predict, but at least we have a good idea of what exogenous events are possible and their probability of actually occurring, save the actual reserve numbers.  

Makes me start thinking Bayes...you know?

Re: "at least we have a good idea of what exogenous events are possible and their probability of actually occurring..."

This is news to me. If we did...

Re: "Makes me start thinking Bayes...you know?"

then I'd be thinking Bayes too. Wanna expand on this a bit, PG?
One can assign a more probable/less probable value to events based on assessments however often you wish.  I am not saying you'd be 'right' but that's not the goal...for instance, we know that the likelihood of a terrorist attack on US soil has gone up over the last twenty years, don't we?  we know that the likelihood of a run against Iran has increased over the last six months, right?  all of those variables, put together, could give us a probability of a "shock" to the system, which we could then vary based on that information...does that make sense?

in better words, with Bayes, you assign prior probabilities to events you wish to control for...they are still guesses, but they can at least be informed guesses...

I am still learning it, to be honest.  I've played with it a bit in some of my professional work, but I am by no means an expert.  

check this out: (g-d I love wiki)

http://en.wikipedia.org/wiki/Bayes%27_theorem
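A toy illustration of the kind of Bayes update being described, with made-up numbers (a hypothetical prior on a supply shock and a hypothetical warning signal - nothing here is an estimate):

# Toy Bayes' theorem update, invented numbers only, to show the mechanics.
prior_shock = 0.30              # prior probability of a major supply shock this year
p_signal_given_shock = 0.80     # chance of seeing a given warning sign if a shock is coming
p_signal_given_no_shock = 0.20  # chance of seeing it anyway

# Posterior P(shock | signal) = P(signal | shock) * P(shock) / P(signal)
p_signal = (p_signal_given_shock * prior_shock
            + p_signal_given_no_shock * (1 - prior_shock))
posterior_shock = p_signal_given_shock * prior_shock / p_signal
print(round(posterior_shock, 3))   # about 0.632: the signal raises 0.30 to roughly 0.63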

Ah, that important reality would be so amenable.

Unfortunately it is the relatively low probability events, very hard to predict rationally in advance, that mostly shape the critical turning points of our 'machine'. How would the world have turned had an archduke not been shot in Sarajevo and a different event triggered WWI a year or so later or earlier? The map of Europe would likely have been somewhat different today, as might subsequent history.

The best we can probably do is know the critical times, when things are in delicate balance or imbalance and relatively small events might overturn nearly everything.  Sometimes the balance can be restored, sometimes not and events take their own course, beyond reason, modelling and human control. Yes, we can guess, speculate, model, predict rationally or less rationally at such times but, if we are honest, we would admit that we are really doing so as a 'comfort', knowing that we are occupying time while reality crystallizes into its new form.

I know, you probably know, we have entered such a critical time. Something relatively unexpected could happen today, tomorrow, in a year or so's time, and reality as it has been for 40 or more years may be gone, the rules changed, the challenges new. We also know this is very likely to be a big one. And we know the chances are getting significant and growing.

Sharon's stroke has changed future probabilities somewhat, at first look it seems to the short term 'safer' side. But there are other potential events in the next few months which may trump that. Sometimes I wish I knew but other times my rational side dominates and I am just afraid (not irrationally so). Fortunately or not my 'irrational' side is usually more likely to be right, and so I continue to listen to it.

So, back to the subject. There is nothing wrong with using Bayesian techniques to evaluate probabilities of potential events, I would say do it with all fervour. But do remember that amongst the many events with less than 10% probability lurk a big handful which would change everything at the 'right' moment. Perhaps some way of modelling the sum of them by year might result in the most useful analysis.

you have me there, agric...but can't we still move towards our goal of refining our models to gain explanatory power as we can over time (through data mining AND causal analysis)?  in other words, incorporating increases in probabilities in certain explanatory factors, while discounting others, based on our best guesses to explain y?

if I become 95% sure that a 5% probability event of an 8 magnitude (play numbers, but bear with me) will occur in the next five years, and that has changed from a 75% certainty of 10% of a 9...and I have some theoretical expectation of the things that certain event will affect, have I not gained explanatory power on my dependent variable, even if it is a latent variable such as the probability of oil being at $100?

Of course, PG, and we should: everything we can do to perceive, analyse and understand should be done while we are waiting for reality to unfold. That way we are most likely to quickly grasp its unfolding.

I 'waste' some of my time trying to 'see' the possible futures, trying to work out when critical points might be, guessing at what they are, their causes and consequences, sometimes attempting to influence now to affect then. Objectively I would call that mad, but evidence seems to suggest some validity; truly mad, LOL.

I think it may be difficult to use Bayesian methods to model (in advance) what is really critical. My understanding is that it is hard to apply to a large group of low probability events in an effective way, but I would be very interested if you can show otherwise - then we might try to produce a list of potential events and assign probabilities.

How can we use Bayesian methods to model n1 to n99+ events with probabilities p1 to p99+ (where pn is < 0.1) such that we can say p[all] is >= 0.95?

For now I would say that, as a rough approximation, the odds of a massively disruptive event in 2006 are about 30%, increasing by about 50% over the prior year's figure, year on year. If by 1st Jan 2010 things are mostly as they are today I will be completely astonished (I have never been completely astonished in my 51 years of life).
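On the n1 to n99+ question above: if the events can be treated as independent (a big if), the chance that at least one occurs is straightforward to compute, and many small probabilities pile up quickly. A sketch with invented probabilities:

import numpy as np

# Invented example: 50 independent events, each with probability between 1% and 5%.
rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.05, size=50)

# P(at least one occurs) = 1 - product of the individual non-occurrence probabilities.
p_at_least_one = 1 - np.prod(1 - p)
print(round(p_at_least_one, 3))   # typically well above 0.7 for numbers like these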

I would make the case that we're in a global chaotic system approaching a bifurcation point.  Lots of cultural, political, and economic energy being dissipated and not necessarily towards a productive end.  A small event could completely change the state of the system.  The uncertainty of forecasts may in fact be an attribute of the chaotic nature of the system, e.g. like the weather: no way to predict when the change will occur, what will drive the change, or to which new state the system may evolve, other than in very limited ways.
Re: "There could be an economic downturn that could suppress demand."

Right. Or production. Westexas brought up a linearization for Russia. So, I looked around a bit. EIA production data starts in 1991--that's not too surprising. The BP data starts in 1985 and thinking of PG's exogenous events (there's a euphemism if I ever heard one!) -- here's Russian production.


I too think there's more than economics at play in the decline after 1988.  In 1988/89 the winds of political change were blowing pretty hard in the Soviet Union. In response, the economy was changing from a centralised model towards a market model. It took about a decade for that change to work its way through the production systems. This curve shows 'demand destruction' for a decade, one premise being that the cause was initial political instability.  
The problem with these models, whether Gaussian or logistic (yeast curve), is that there seem to be no theoretical grounds for why they work. At some level, sure, it makes sense that production for a field would start at zero, climb to some peak, and then fall off to zero. And both these curves have this property, as many other curves do. The remarkable closeness of fit of the Gaussian, and the lesser closeness of the logistic, must be more than coincidence. But it is hard to see why they work.

The biggest mystery to me is this: why the symmetry? Why the heck is the down side a mirror of the up side? I don't see a reason in the world why that would be true, and everything I can think of suggests that it should not be.

During the growth phase, production is limited at first by the costs of new investment and by alternative opportunities for investment capital. As the field develops, production growth begins to slow down. The field is approaching "maturity" and the owners are not investing that much more into it. Maybe it is saturated in terms of reasonable places to put in new wells, or at least the cost of adding more equipment won't be paid back in the lifetime of the field.

Eventually production peaks, which seems to be largely a physical limitation. You just can't suck oil out faster at a reasonable cost. (It's worth noting that this may not be the  reason, it may be that you could suck oil out faster, but the cost of adding more equipment to do this would not be paid back in the relatively short remaining lifetime of the field - in that case, the owners in effect decide to let the field peak in order to maximize their profits.)

And then we're on the decline, which now seems to be purely physical. We're not adding or removing wells, but the oil is getting harder and harder to suck out. Every year we get less.

So here is the mystery again: the decline seems to be primarily a physical process based on the reluctance of oil to be pulled out of the rock. But the growth phase seems to be largely economics-based. The rate of production growth is limited by economic decisions about how much to invest in the field at each point in its lifetime. I don't see why these two phases should mirror each other.

In terms of Stuart's graph above, this translates into why the slope of the fitted line is constant as it crosses the horizontal axis. Why does the decrease in production growth rate (a confusing concept, the third derivative of oil remaining!) remain the same post-peak as pre-peak?

One problem is that only a small portion of the line is below the axis. The U.S. is only slightly past its peak when we look at the whole history. It would be interesting to apply the analysis to a single field, one that peaked long ago, to see how well the right side of the production curve mirrored the left side.

well, the symmetry isn't guaranteed, but it is the most likely distribution of a series of stochastic measurements (the same logic as a sampling distribution and why it works, for example)...if there are biases against that stochasticity (my "exogenous events" that seem to be the theme of the day for example...) then the distribution will change accordingly.

Either way, the area under the curve is finite...but that's also why the reserve numbers are such a big deal.  In the US we have a good idea of how much petroleum we have left, so this all works quite well...we're just trying to fit this, so that we can generalize to other countries where we have less complete information.

Models don't always have to reflect the underlying mechanics. There exist many models in science and industry which essentially look at empirical data and notice certain patterns. For instance, Google the term 'experience curve', which relates the cost of a manufactured product to the cumulative mass production of the product. Such empirical models can be useful to a certain extent. The nagging question with empirical models is that one is never sure whether the situation of interest will become the exception to the model or not.

Models derived from the mechanics/physics of the phenomena of interest might be perceived as more legitimate, yet they are only as good as the number of factors they take into account. In physical models the nagging question is whether some mechanism was overlooked and not included in the model.

Another interesting problem is when a model can correctly predict outcomes, yet is still wrong. Take Maxwell's equations on light propagation. The equations were correct, yet Maxwell's concept of light propagating through an ether was wrong.

With regards to oil, if it can be shown that most oil fields follow a similar pattern (barring government collapse or war) then that should be convincing on its own. One need only assemble data on many oil fields.

OK, so maybe the individual fields aren't Gaussian, in fact maybe not even logistic, but their production does go up and down in a roughly hump-shaped way. And maybe individual fields aren't symmetric, the down side doesn't necessarily mirror the upside.

But then, when we look at the whole U.S. we are adding up the contributions from all the fields, and as a general principle, adding a bunch of independent, hump-shaped distributions will tend to produce a Gaussian? And this summation will be symmetric? Does that work?

My gut is it's got to be something like that. However, I don't off the top of my head see how to cast that basic idea into a more rigorous model yet.

adding a bunch of independent, hump-shaped distributions will tend to produce a Gaussian? And this summation will be symmetric? Does that work?

There is no guarantee that this kind of summation will produce a Gaussian or a symmetric curve. I believe that it has to do with the time of production start, which is a random variable with an unknown distribution related to the discovery curve. We can however make the following observations:
  • the distribution of field size follow roughly a lognormal distribution which implies that you have very few big fields and a lot of small fields.
  • big fields were discovered first and exploited first.
  • there is a lot of overlap between the lifetimes of individual fields (discoveries are concentrated in time).
These conditions must play a role in the shape of the observed total production curve.
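One quick way to poke at this is to simulate those three conditions and see how close the sum comes to a Gaussian. A hedged sketch with invented parameters (lognormal sizes, bigger fields starting earlier, an asymmetric ramp-up/decline profile per field):

import numpy as np

rng = np.random.default_rng(1)
n_fields = 2000
t = np.arange(0.0, 150.0, 0.25)                 # years on an arbitrary axis

# Lognormal field sizes: a few giants and a lot of small fields.
sizes = rng.lognormal(mean=0.0, sigma=1.5, size=n_fields)

# Bigger fields tend to start earlier, with noise so the ordering isn't strict.
rank = np.argsort(-sizes).argsort() / n_fields  # 0 = biggest field
starts = 10 + 80 * rank + rng.normal(0, 10, n_fields)

def field_profile(t, start, size, ramp=3.0, decline=12.0):
    """Asymmetric hump: fast exponential ramp-up, slower exponential decline."""
    dt = np.clip(t - start, 0, None)
    shape = (1 - np.exp(-dt / ramp)) * np.exp(-dt / decline)
    area = shape.sum() * (t[1] - t[0])
    return size * shape / area if area > 0 else np.zeros_like(t)

total = sum(field_profile(t, s0, sz) for s0, sz in zip(starts, sizes))

# Fit a Gaussian to the log of the summed curve (as in Stuart's plot) and see
# how large the residual is relative to the peak.
mask = total > total.max() * 1e-3
a2, a1, a0 = np.polyfit(t[mask], np.log(total[mask]), deg=2)
gauss = np.exp(a2 * t**2 + a1 * t + a0)
print("relative RMS misfit:", np.sqrt(np.mean((total[mask] - gauss[mask]) ** 2)) / total.max())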
Big fields have a larger cross-section for discovery. However few exploratory resources were applied to find them initially and their rarity caused (as in luck of the draw) more of them to be discovered in the 20th century.

AFAIK, no one has been able to confirm correlations in discoveries from year to year.

I have the sense that the largest fields in the US are not as large relative to the whole country as in some other domains. I'm rushing off and can't check now, but I don't think East Texas or Prudhoe Bay are nearly as central to US production as Ghawar, or even Samotlor to Russia. I'm wondering speculatively if this is somehow a factor in why the curve fitting is so pretty in the US relative to other domains.
That's an interesting point. Maybe the fact that production comes from a lot of small/medium fields is a factor in making the total production curve more Gaussian. A large dominating field will probably impose more structure and less "stochastic" effect on the total production. That reminds me a lot of the modeling of RADAR backscattering phenomena (my field of work), where a lot of small scatterers within the resolution cell will produce a random, Gaussian-distributed response whereas a few strong scatterers will produce an almost deterministic, non-Gaussian response. This deserves more investigation but data on field sizes is required.
Ah yes, essentially the difference between diffraction effects and specular reflection effects. I would rather think the macro effects of smoothing play a large part of this response.
http://mobjectivist.blogspot.com/2006/01/would-you-believe.html
I see your problem, Halfin, and it is a fair one.

A partial explanation, considering a single well: as pressure declines so will oil exuded, that has a mathematical shape based on pressure, viscosity etc. This can be enhanced by various methods (pumping water into reservoir, reducing viscosity using steam, etc) but these cost energy and money so the well ultimately becomes unviable due to energy and / or money ROI. These things would apply to virtually all individual wells so is relatively straightforward to model.

When summed over a field these basics should result in a fairly smooth and modelable shape. Once one goes beyond a single field it could be one is adding apples and pears. However, mathematical modelling is essentially an exercise in pragmatism: what predicts best works.

I would say the more 'interesting' area to question is the upslope. Why do fields seem to ramp up in a smooth way? I'd bet the answer is: they really don't, LOL, at least not consistently (just consider the fundamentals). I'd be inclined to bet that the downslope is more predictable and smooth than the upslope. So you are right to question the symmetry but I would bet the symmetrical approximation has more validity on the downslope than the upslope. I think we are probably past giving a f*ck about the upslope, perhaps that signifies something?

In groundwater flow to a single well the rate of production tends to fall off with the logarithm of time under a fairly wide range of conditions.  I would presume flow to a single oil well would do the same unless affected by EOR or by other wells.  (Are there some petroleum folks out there who can confirm or refute this?)

Interestingly enough, M. King Hubbert did the mathematical derivation of a number of equations for flow in porous media that are used in both groundwater and the oil patch.

Phx, AZ

I think wells are typically added when production begins to falter. It may be that wells are continuously added pretty much from beginning to end, including some time after the first wells are sealed.
Well drilling is not linear. A field starts with one rig, some time passes, then a second is added, at which point twice the number of wells are drilled per unit time. If the field warrants it, a third etc. is added. All rigs continue until all allocated hole locations have been drilled, including any involved with field expansion. In a large field, the first wells might be plugged, or used to dispose of salt water waste, etc.

At this point production has decayed but continues. Now, the (US) field might be sold to a small E&P because production is becoming more labor intensive. Then, some wells are maybe used for water injection (secondary production), other wells are re-worked.

Why the symmetry

I'm not much of a catastrophist, but I'm kind of afraid that the answer might be 'because humans with oil act a whole lot like yeast with sugar'.  

Now if we can only manage the part where we end up with beer.  ;)

Something very peculiar about this plot.  In 1859, Drake struck oil in Titusville, Pennsylvania.  The EIA data shows this as 2 barrels and then in 1860 they show 500 barrels. However, you don't plot these two points in your fit. As a matter of fact, if you take that gaussian and extrapolate backwards you get 100 barrels produced in 1840! That was 20 years before oil was discovered in the USA.

Actually, the pseudo-Gaussian behavior is easy to account for if you assume accelerated oil discoveries over time, i.e. d^2D/dt^2 = k. Remember that the bulk of oil discoveries were made in the mid 1900's. And then you need to apply the oil shock model, which effectively modulates the extraction rate due to stochastic delays in the system.

This will also account for the values in 1859 and 1860, as the oil shock model formulation obeys causality.

The rate equation of the gaussian that you mentioned gives a degenerate solution of P(t) = 0.  You have to give it a forcing function in 1859 to initiate causality. However once you do this, you won't have a gaussian solution, or at least it won't be something you can analytically produce. Same holds true of the logistic curve.

I took off the first few years because they were outliers. I don't personally feel it's much of a problem that the full history doesn't correctly account for how many barrels Col Drake got in 1859. We would expect the extreme tails to be very noisy for any model.
A slightly oblique point... Yes, the tails will be spikey and should be 'massaged' / smoothed or ignored when building a viable model.

However, I think the period immediately preceding peak will be abnormal, too. Essentially we are making a transition from a mostly economic model to a mostly geologic resource limited model. I think this is visible in some of your previous analysis here, Stuart. Production that is mothballed and / or can be rapidly ramped up mostly has been, now the geologic and logistic determinants of increased production have become the determinants.

The peak and downslope are likely to be 'polluted' too. Countries will 'manage' production so it is not brought onstream and exported so as to protect their own future. A comment here today on Brasil and possible signs from Russia I would quote as the first symptoms of this.

Peak will probably be confusingly noisy for these reasons.

It wasn't meant to be easy, but us monkeys were meant to be smart, LOL.

They did get oil in 1840, skimmed from the pond with dippers.

There's a very famous theorem in statistics (the central limit theorem) which says roughly that if you add a whole bunch of variables together which are identically distributed, the resulting sum will have a Gaussian distribution. You could argue that something similar is causing this, but the the things being added together are not obviously identically distributed.

Actually independence of the random variables is more essential than being identically distributed.  Take for example the
Lindeberg-Feller theorem


It's not clear to me why the central limit theorem would apply to a dynamical process in time - the time profile of oil production is not a statistical sampling process, it's an economic/stochastic/sociological spread process through a complex geologic reality. More thought required here.

Maybe the per-well cross-section is non-normal, but when you aggregate them, you get something more normal.  (Ditto for the "sociological cross-section", etc. CLT is precisely about aggregation across independent events.)  The time profile of oil production isn't exactly a sampling process, but the timing of oil strikes is more plausibly statistical in nature.  It's a sort of sampling of the Earth's random distribution of oil deposits.  And once oil is found, the way it is extracted is probably broadly similar with a few parameters to do with the size of the find etc. which themselves are randomly distributed.  

But it's the independence part that's not obvious (at least to me). Geologists find oil by discovering a new kind of "play" - a particular type of rock structure and trap that turns out to contain oil, and then looking for more instances of that play. Once the industry discovers an interesting new play, they all converge on it and compete like crazy till it's sucked dry. So a model that says production at different oil fields is independent is not obviously plausible.

I wonder if random walk type arguments work better here. If we consider the oil discovery and development process to be like a random walk (through a very complex abstract topology of oil plays and fields, in which distance is some amalgam of physical distance and "conceptual" distance as a play), can we get somewhere? The model is basically to view the oil industry as a diffusion process through the oil-bearing landscape. Random walks in Euclidean space have a Gaussian density in distance from the origin, which spreads out over time - there's an exp(-x^2/t) type behavior. Could that give rise to an exp(-t^2) behavior as it crosses the main oil-bearing parts of the topology?

I don't know - it seems like the industry is not random - we'd have to model the economics of fields as representing different probabilities of going one way versus another in the oil-bearing topology.
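As a tiny sanity check of the exp(-x^2/t) claim only (it says nothing about whether the industry's search really behaves like a random walk), here is a sketch that simulates many independent walks and checks that the spread of positions matches the Gaussian prediction:

import numpy as np

rng = np.random.default_rng(2)
n_walks, n_steps = 20000, 400

# Each walk takes +/-1 steps; the position after n_steps is the sum of the steps.
steps = rng.choice([-1, 1], size=(n_walks, n_steps))
positions = steps.sum(axis=1)

# The CLT says positions should be approximately Gaussian with variance n_steps.
print("sample variance:", positions.var(), " expected:", n_steps)
print("fraction within 2 sigma:", np.mean(np.abs(positions) < 2 * np.sqrt(n_steps)))  # ~0.95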

If you look at the distribution of field sizes worldwide:

volume and production in mb and mbpd
Field Size         No. of   Tot.                                          | Extrapolation
(production vol.)  fields   Prod.   <1950   50s   60s   70s   80s   90s   | 2000s   2010s
1,000+                  4    8,000      2     1     0     1     0     0   |     0       0
500-1,000              10    5,900      2     3     3     1     1     0   |     0       0
300-500                12    4,100      3     1     6     1     1     0   |     0       0
200-300                31    6,450      8     4     6     9     1     1   |     1       1
100-200                83    7,900      5     8    13    13    11    11   |    11      11
0-100               4,400   36,200    900   800   700   600   500   400   |   300     200
Total               4,540   38,550    920   817   728   625   514   412   |   312     212

src:  "Twilight in the desert" in the appendix B (p. 374 and 375)

Some 4,400 small fields produce more than 50% of the total production. I think we can reasonably assume that these small fields are relatively decorrelated in terms of production profile and maturity (even if locally in time and space there are some correlations). The number of wells drilled must also be a factor: the US has 533,000 oil wells, averaging less than 17 barrels/well/day, compared to 750 wells for Saudi Arabia--averaging more than 12,000 barrels/well/day. There is a good chance that Saudi Arabia will never have a Gaussian-like production profile.

Khebab, I like your posts but formatting!
There are a variety of abstractions of the CLT into spatial models broadly similar to what you are describing.  They tend to take the form of "shape theorems" which say how a percolation process (for instance) scales over large distances.  Up close (at short distances and short times) an  "activated" subgraph has a ragged shape, but from a distance (large distances, long times) the shape smooths out and the size is predictable.  

It isn't obvious how to apply such ideas to oil production over time, but that might be the most relevant mathematics to try to apply.  The production curve might be expressible as a simple function of the set of vertices which are active (explored deposits) thereby transforming a statement about limiting shape of the subgraph into a statement about the production curve.  Note that the limit here is in a rescaling parameter on the model, not time per se.

Even if the math is applicable, I suspect the result you'd find is roughly that for "large planets, with large oil deposits and large oil industries", the shape gets close to gaussian.  There would also have to be a ton of assumptions about how to model the geology and economics.  Still, I don't know that the heuristic argument for using a logistic curve is any better.


I hope someone writes a primer about logistic and Gaussian curves and why they are so useful... something like Dr. Bartlett's lecture on exponential growth would do!