Predicting US Production with Gaussians

EIA Field production of crude in the US, logistic (Hubbert) fit based only on 1930-1976 data, and Gaussian fit based on the same time interval. Source: EIA for the data.

So, I was fooling around tonight and made a long-term graph of US production growth rates (year-on-year), which looks as follows. Because the data are so noisy, I fit a polynomial to them just to get a sense of the trend. The polynomial came out almost a straight line. I varied the degree (in the graph it's a polynomial of degree 6), but it always wanted to be more or less straight.

Year-on-year change in EIA Field production of crude in the US, with linear and sixth order polynomial fit. Source: EIA for the data.

That's not what the logistic would say: the logistic would call for an S-shaped decline in the growth rate, starting at K (which is around 6%). Of course, we know the logistic is not that great at modeling the early production. Still, that straight line really sticks out. Hmmm. Scratch head, write a few equations, and it turns out that the function with a linearly decreasing growth rate is a Gaussian. I've vaguely heard of people using Gaussians instead of logistics as models of the peak, but hadn't played with it myself before tonight.
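The head-scratching step can be made explicit. For a Gaussian production curve P(t) = A exp(-(t - tm)^2 / (2 s^2)), the growth rate d(ln P)/dt = -(t - tm)/s^2 is a straight line with slope -1/s^2. For the logistic, with Q/URR = 1/(1 + exp(-K(t - tm))), the growth rate of P = dQ/dt works out to K(1 - 2Q/URR) = -K tanh(K(t - tm)/2), which starts near K and follows an S-shaped decline to -K. A small sketch comparing the two. K = 0.06 follows the roughly 6% above, while tm = 1970 and s = 19 years are assumed, illustrative values, not fits to the EIA series:

```python
import math

def gaussian_growth_rate(t, tm, s):
    """Growth rate d(ln P)/dt of a Gaussian production curve: linear in t."""
    return -(t - tm) / s**2

def logistic_growth_rate(t, tm, k):
    """Growth rate d(ln P)/dt of the logistic (Hubbert) production curve.

    With Q/URR = 1 / (1 + exp(-k (t - tm))) and P = dQ/dt = k Q (1 - Q/URR),
    d(ln P)/dt = k (1 - 2 Q/URR) = -k tanh(k (t - tm) / 2).
    """
    return -k * math.tanh(k * (t - tm) / 2)

# Illustrative (assumed) parameters, not fitted to the EIA data.
TM, S, K = 1970.0, 19.0, 0.06

for year in range(1900, 2041, 10):
    print(f"{year}: gaussian {gaussian_growth_rate(year, TM, S):+.4f}  "
          f"logistic {logistic_growth_rate(year, TM, K):+.4f}")
```

Equivalently, the second derivative of ln P is the constant -1/s^2 for the Gaussian, which is why a parabola in log space is the natural next check.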

So, plot the log of production versus time and fit a parabola: Oh my.

Natural log of EIA Field production of crude in the US, with quadratic fit. Source: EIA for the data.
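Fitting the parabola is ordinary least squares on ln P. If ln P = a x^2 + b x + c, with x measured from a reference year t0 and a < 0, the implied Gaussian has peak year t0 - b/(2a), width sqrt(-1/(2a)) and peak height exp(c - b^2/(4a)). A minimal NumPy sketch of that conversion, exercised on a synthetic, noise-free Gaussian (the 1970 peak, 19-year width and 9.6 peak height are assumptions, not EIA numbers):

```python
import numpy as np

def fit_log_parabola(years, production):
    """Fit ln(P) = a*x**2 + b*x + c, with x = year - mean(year), and
    return the implied Gaussian (peak_year, width, peak_height)."""
    years = np.asarray(years, dtype=float)
    t0 = years.mean()                      # center to keep the fit well conditioned
    a, b, c = np.polyfit(years - t0, np.log(production), deg=2)
    if a >= 0:
        raise ValueError("log-production is not concave; no Gaussian peak")
    peak_year = t0 - b / (2 * a)
    width = np.sqrt(-1.0 / (2 * a))
    peak_height = np.exp(c - b**2 / (4 * a))
    return peak_year, width, peak_height

# Synthetic Gaussian series (assumed parameters, not EIA data).
years = np.arange(1900, 2006)
true_peak, true_width, true_height = 1970.0, 19.0, 9.6
production = true_height * np.exp(-(years - true_peak) ** 2 / (2 * true_width**2))

print(fit_log_parabola(years, production))
```

Centering the years is only a numerical convenience; it doesn't change the fitted curve.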

Pretty good fit across the whole range. There's a very famous theorem in statistics (the central limit theorem) which says roughly that if you add a whole bunch of variables together which are identically distributed, the resulting sum will have a Gaussian distribution. You could argue that something similar is causing this, but the things being added together are not obviously identically distributed. It's not clear to me why the central limit theorem would apply to a dynamical process in time: the time profile of oil production is not a statistical sampling process, it's an economic/stochastic/sociological spread process through a complex geologic reality. More thought required here.

Surely there must be references on this. But I haven't found any in my literature search to date, and can't quickly find them now. Anyone?

Anyway, to get a quick feel for prediction, I repeated Thursday night's exercise of seeing what would happen if you fit the model only to the early data and used it to predict production forward:

EIA Field production of crude in the US, logistic (Hubbert) fit based only on 1930-1976 data, and Gaussian fit based on the same time interval. Source: EIA for the data.
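The same kind of backtest can be sketched in a few lines: fit the log-parabola only to the years up to 1976, extrapolate, and compare against the years the fit never saw. The series below is synthetic (a Gaussian with assumed parameters plus roughly 2% multiplicative noise from a fixed random seed), so it only illustrates the procedure. It says nothing about how well the real EIA series is forecast:

```python
import numpy as np

def log_parabola_forecaster(years, production):
    """Least-squares quadratic fit to ln(production); returns a function of year."""
    years = np.asarray(years, dtype=float)
    t0 = years.mean()                      # center years for numerical stability
    coeffs = np.polyfit(years - t0, np.log(production), deg=2)

    def predict(t):
        return np.exp(np.polyval(coeffs, np.asarray(t, dtype=float) - t0))

    return predict

# Synthetic "history": a Gaussian with ~2% multiplicative noise (assumed, not EIA data).
rng = np.random.default_rng(0)
years = np.arange(1930, 2006)
truth = 9.6 * np.exp(-(years - 1970.0) ** 2 / (2 * 19.0**2))
observed = truth * np.exp(rng.normal(0.0, 0.02, size=years.size))

# Fit only on 1930-1976, then forecast the years the fit never saw.
train = years <= 1976
predict = log_parabola_forecaster(years[train], observed[train])
holdout_years = years[~train]
rel_error = np.abs(predict(holdout_years) / observed[~train] - 1.0)
print(f"mean relative error 1977-2005: {rel_error.mean():.3f}")
```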

Yikes. That's really good. Not sure if the Gaussian will always do so well, but this is certainly interesting...

There is a lot of discussion in this paper by Jean Laherrere.
Stuart, can you take that last graph you did and see what it will look like in 2020 and 2040?
And while you're at it, can you also do the same Gaussian plot for total world production and see what that prediction looks like? Thanx!
From Laherrere.

A simple Hubbert curve may be ideally applied only in the following cases:
3) Where a single geological domain having a natural distribution of fields is considered, political boundaries should be avoided.

OK, now in Laherrere's paper, he has examples from the FSU (former Soviet Union). And then, there is this later paper from Petroleum Review Is FSU oil growth sustainable? (pdf). He includes this linearization



But the FSU comprises several different oil provinces (West Siberia, the Caspian Sea Basin, East Siberia and the Arctic), as discussed by Colin Campbell in The Status of Oil and Gas Depletion in Russia (Dec 2004).
It is difficult to summarise the geology of this huge territory, but we may identify the main provinces:

  • The Western basins between the Barents and Caspian Seas with their Silurian source rocks
  • The West Siberian basins with the Jurassic source rocks
  • The Arctic domain
  • The locally productive Tertiary deltaic basin of Sakhalin on the Pacific margin
Here's a map I found, just to give people a visualization.



Below, westexas argues that Alaska should not be thrown in with the Lower 48--"Alaska might as well be in the Middle East". We wouldn't take Mexico, lump that together with Angola, and do a Hubbert style analysis, logistic or Gaussian, of both together.
Stuart, I think this paper pretty much answers your doubts. When modeling total US production you're including several discovery cycles.

When modeling just the Lower 48 (as Laherrere does), Hubbert's curve fits better than the Gaussian. The two curves differ somewhat from one another, especially around the first inflexion, which comes later in Hubbert's curve.
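Putting a number on the inflexion difference requires choosing how to match the two curves. Here is a sketch assuming both have the same area (URR) and the same peak height. The logistic derivative with steepness K then has its inflexions ln(2 + sqrt(3))/K, about 1.317/K years, on either side of the peak. The matching Gaussian has width s = 4/(K sqrt(2 pi)), about 1.596/K, so its inflexions sit further from the peak and its first one comes earlier. K = 0.06 is an assumed value. The finite-difference check only confirms that the second derivative changes sign where the formulas say it should:

```python
import math

K = 0.06      # assumed logistic steepness, per year
URR = 1.0     # area under both curves (arbitrary units)

def logistic_rate(t):
    """Logistic (Hubbert) production curve dQ/dt, peaked at t = 0, area URR."""
    return URR * K / (4.0 * math.cosh(K * t / 2.0) ** 2)

PEAK = URR * K / 4.0                           # peak height of logistic_rate
S = URR / (PEAK * math.sqrt(2.0 * math.pi))    # Gaussian area = PEAK * S * sqrt(2*pi)

def gaussian_rate(t):
    """Gaussian with the same area and peak height as logistic_rate."""
    return PEAK * math.exp(-t * t / (2.0 * S * S))

def second_diff(f, t, h=0.5):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

logistic_inflexion = math.log(2.0 + math.sqrt(3.0)) / K   # years from peak
gaussian_inflexion = S                                      # years from peak
print(f"logistic: {logistic_inflexion:.1f} yr, gaussian: {gaussian_inflexion:.1f} yr")
```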

Although I'm not quite sure (I haven't got there yet), the Central Limit Theorem also applies to the logistic case.

As for your doubts on why these models fit so well, I'd like to look again at the population issue. Remember the logistic spreading of the Sasser virus? I guess you know that's the way living things grow over time. Now, you should know that since the early eighties world oil production per capita has been flat.

Al Bartlett looked at fitting Gaussians to US and world oil production in the following paper:

An Analysis of US and World Oil Production Patterns Using Hubbert-Style Curves, Albert A. Bartlett, Mathematical Geology, V32, N1, Jan. 2000.

He used three parameters: the estimated ultimate recovery (EUR), the date of the peak (tM), and the width of the Gaussian (S). He then minimized the root-mean-square deviation between the data and the fit to find the EUR. He also looked at the sensitivity of his model to changes in tM and S and at the uncertainty of the EUR, as well as at per capita oil production and R/P ratios. At the end of the paper, he compares his results with those of other researchers.
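Here is a minimal sketch of that style of fit, under two assumptions of mine. First, the Gaussian is written in normalized form, P(t) = EUR / (S sqrt(2 pi)) exp(-(t - tM)^2 / (2 S^2)), so that EUR is the area under the curve. Second, the search is a plain grid over tM and S. This is not Bartlett's code. The convenient trick is that for fixed (tM, S) the model is linear in EUR, so the RMS-optimal EUR has a closed form and only two parameters need searching. The data are synthetic, in arbitrary units:

```python
import numpy as np

def unit_gaussian(t, t_m, s):
    """Gaussian production shape with unit area (so P = EUR * shape)."""
    return np.exp(-((t - t_m) ** 2) / (2.0 * s**2)) / (s * np.sqrt(2.0 * np.pi))

def fit_eur(t, p, peak_years, widths):
    """Grid search over (t_m, S); EUR is solved exactly at each grid point.

    For a fixed shape g, the least-squares EUR is sum(p*g) / sum(g*g),
    which also minimizes the RMS deviation. Returns (rms, eur, t_m, s).
    """
    best = None
    for t_m in peak_years:
        for s in widths:
            g = unit_gaussian(t, t_m, s)
            eur = np.dot(p, g) / np.dot(g, g)
            rms = np.sqrt(np.mean((p - eur * g) ** 2))
            if best is None or rms < best[0]:
                best = (rms, eur, t_m, s)
    return best

# Synthetic, noise-free annual production (assumed EUR 220, tM 1970, S 19; arbitrary units).
years = np.arange(1900, 2006, dtype=float)
production = 220.0 * unit_gaussian(years, 1970.0, 19.0)

rms, eur, t_m, s = fit_eur(years, production,
                           peak_years=np.arange(1960.0, 1980.5, 0.5),
                           widths=np.arange(10.0, 30.5, 0.5))
print(f"EUR={eur:.1f}  tM={t_m}  S={s}  rms={rms:.2e}")
```

A real fit would use a finer grid or a proper optimizer. The shape of the RMS surface around the minimum is also where a sensitivity analysis on tM and S would start.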

So, what's your take on the linearization prediction of 2.3 trillion barrels for world URR now that you've done this analysis?

I am really impressed with equations containing constants given to 9 or 10 significant figures.  :)
Is Bartlett's paper available on the web anywhere?
That I don't know. I have a hardcopy that he gave me at the ASPO-USA conference. I can fax or mail a copy of it to you if you can't find it elsewhere.
Here's a link to it.

http://www.hubbertpeak.com/bartlett/hubbert.htm
http://dieoff.org/page187.htm

and here's an interesting discussion of linearization and Gaussians in a presentation:

http://www-physics.mps.ohio-state.edu/~aubrecht/AAPTSU02oil.pdf

So I'm not seeing much insight in there into why it works. I can't believe this isn't well-trodden ground.
There's a very famous theorem in statistics (the central limit theorem) which says roughly that if you add a whole bunch of variables together which are identically distributed, the resulting sum will have a Gaussian distribution.

Also, an important condition is that the variables must be independent (in short, independent and identically distributed, or i.i.d.).

There are many variants of the Central Limit Theorem. One interesting formulation is the following (from the link you gave on wikipedia):

The density of the sum of two or more independent variables is the convolution of their densities (if these densities exist). Thus the central limit theorem can be interpreted as a statement about the properties of density functions under convolution: the convolution of a number of density functions tends to the normal density as the number of density functions increases without bound, under the conditions stated above.

Since the characteristic function of a convolution is the product of the characteristic functions of the densities involved, the central limit theorem has yet another restatement: the product of the characteristic functions of a number of density functions tends to the characteristic function of the normal density as the number of density functions increases without bound, under the conditions stated above.
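The characteristic-function version is easy to check numerically for one concrete case. Assume, purely for illustration, that each term is uniform on [-1, 1]. Its characteristic function is sin(t)/t and its variance is 1/3, so the standardized sum of n such terms has characteristic function (sin(u)/u)^n with u = t / sqrt(n/3). The sketch measures how far that product is from the standard normal characteristic function exp(-t^2/2):

```python
import math

def sinc(x):
    """sin(x)/x with the removable singularity at 0 filled in."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def max_cf_gap(n, t_max=6.0, steps=600):
    """Largest |phi_n(t) - exp(-t^2/2)| on [0, t_max], where phi_n is the
    characteristic function of the standardized sum of n Uniform(-1, 1) terms."""
    sigma = math.sqrt(n / 3.0)          # standard deviation of the raw sum
    gap = 0.0
    for i in range(steps + 1):
        t = t_max * i / steps
        phi_n = sinc(t / sigma) ** n    # product of n identical characteristic functions
        gap = max(gap, abs(phi_n - math.exp(-t * t / 2.0)))
    return gap

for n in (1, 2, 5, 20):
    print(n, round(max_cf_gap(n), 4))
```

Both functions are real and even here, so checking t >= 0 is enough.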


It's not easy to formulate the oil production problem in a strictly probabilistic framework. The curve fitting used here is a parametric regression approach. An alternative approach is nonparametric density estimation (or regression). It consists of estimating an unknown density function from a sum of kernel functions:

where h is the smoothing parameter and K(x) is the symmetric kernel function which must satisfy the following properties:
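Neither formula survived here. For reference, the standard textbook kernel density estimator built from n observations x_1, ..., x_n, and the usual conditions on the kernel, are shown below. The commenter's exact notation may have differed, and nonnegativity, K(u) >= 0, is commonly required as well, though not always:

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad
\int_{-\infty}^{\infty} K(u)\,du = 1,
\qquad
K(-u) = K(u).
```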

This formulation is attractive because K(x) can be interpreted as an elementary field production curve. Furthermore, you don't need to make assumptions about the shape of the curve (Gaussian, logistic, etc.). For more info, here is a quick introduction. I once tried a few simulations by adding elementary curves spawned by a prior model, which was supposed to model the discovery pattern:

Sorry, the second link is not good, use this one:
A Statistical Model for the Simulation of Oil Production
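Here is a toy version of the superposition idea, as a sketch only. The discovery record, the field sizes and the shape of the elementary field curve (a linear ramp-up followed by exponential decline) are all hypothetical, and this is not the model in the linked paper. Each discovered volume contributes one shifted, scaled copy of the elementary curve, and total production is the sum. Summing shifted copies of one kernel is exactly a discrete convolution of the discovery series with that kernel, which ties in with the convolution discussion in this thread:

```python
import numpy as np

def field_profile(age, ramp=5.0, decay=0.08):
    """Hypothetical elementary field curve: linear ramp-up, then exponential decline.
    Normalized so that one unit discovered yields one unit produced over the grid."""
    age = np.asarray(age, dtype=float)
    shape = np.where(age < 0, 0.0,
                     np.where(age < ramp, age / ramp, np.exp(-decay * (age - ramp))))
    return shape / shape.sum()

years = np.arange(1900, 2051)
kernel = field_profile(np.arange(years.size))   # indexed by field age in years

# Hypothetical discovery record: volume discovered in each year (arbitrary units).
discoveries = np.zeros(years.size)
discoveries[years == 1930] = 50.0                      # one early giant
discoveries[(years >= 1935) & (years < 1980)] = 2.0    # steady smaller finds

# Sum of shifted, scaled field curves == discrete convolution with the kernel.
production = np.convolve(discoveries, kernel)[: years.size]
print("peak production year:", years[np.argmax(production)])
```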
The convolution point is a good one; I vaguely remember that from undergraduate functional analysis now that you mention it. WebHubbleTelescope has been doing some interesting modeling where you take the discovery curve and convolve it to get the production curve, but as far as I can tell he more or less handcrafts the convolution function to make the past history fit. It's not clear here why there'd be enough layers of convolution to produce such good agreement with the Gaussian across several orders of magnitude. OTOH, it seems like there must be some central-limit-theorem-type reasoning here. It would solve a problem in my mind: I would expect the logistic to be a rough approximation to oil production, but the degree of fit with US production is surprising, and I can't think of any good reason why it should work so precisely. If there's really a central limit story for why US production is Gaussian, then it's just down to the fact that the logistic derivative and the Gaussian are pretty similar shapes.
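One way to put a number on "enough layers of convolution": assume, only for illustration, that each stage between discovery and production is an exponential lag. Convolving k such lags gives a gamma-shaped curve whose skewness is 2/sqrt(k), so the asymmetry fades only slowly as stages are added. A discretized check with NumPy:

```python
import numpy as np

DT = 0.05                                   # time step (arbitrary units)
lag = np.exp(-np.arange(0.0, 40.0, DT))     # one exponential lag stage, sampled
lag /= lag.sum()                            # normalize to a discrete density

def skewness(density, dt=DT):
    """Skewness of a discrete density sampled at t = 0, dt, 2*dt, ..."""
    t = np.arange(density.size) * dt
    mean = np.sum(t * density)
    var = np.sum((t - mean) ** 2 * density)
    return np.sum((t - mean) ** 3 * density) / var**1.5

results = {}
curve = lag.copy()
for k in range(1, 9):
    if k > 1:
        curve = np.convolve(curve, lag)     # add one more lag stage
    results[k] = skewness(curve)
    print(k, round(results[k], 3), round(2 / np.sqrt(k), 3))
```

Under this toy assumption, eight layers still leave a skewness of about 0.7. That is one way of quantifying the puzzle: a handful of lag stages on their own would not obviously produce a near-perfect Gaussian.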
In my opinion, you guys are making this way too complicated.  I think that the P/Q versus Q method works because we find the big fields first--it's as simple as that.   Therefore,  we are largely plotting the decline of the big fields.  The smaller fields that we find after the 50% of Qt mark have a largely trivial effect on Qt.  

The two best case histories are the Lower 48 and the North Sea.  Together, these two regions account for close to 20% of all oil produced to date worldwide.  The Lower 48 peaked at 48% of Qt, and the North Sea peaked at 52% of Qt--an average of 50%.  The world reached 50% in 2005, and the two facts we know are:  (1)  oil traded at record high nominal price levels in 2005 and (2)  oil production year over year is flat.  Both facts are consistent with a peak.  

I have a suggestion for an experiment.  You can easily plot the North Sea data, using the EIA data at the following website:  http://www.eia.doe.gov/emeu/ipsr/t41b.xls

Note that this is crude + condensate production.  In my opinion, using NGLs distorts the data because NGLs can easily come from gas reservoirs in addition to oil reservoirs (as can condensates, but that is a lesser factor).

North Sea production starts in 1971.   We know that they peaked in 1999 at just a hair under 6 mbpd.  You plot annual production (P) divided by cumulative production to date (Q) versus Q.  I think that I used a P/Q limit of about 20% (0.20) on my plot.   I suggest that everyone generate their own plot, do a best linear fit and come up with their own Qt.  I came up with 60 Gb.  Stuart could then compare the answers.
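For anyone who wants to try it, here is a minimal sketch of the procedure: compute P/Q against Q, fit a straight line, and read Qt off the x-intercept (the y-intercept estimates K). No data are embedded here, so it runs on a synthetic logistic series with assumed values Qt = 60 and K = 0.1 on an arbitrary time axis. With real data you would substitute the EIA annual series. The max_pq cutoff is my reading of the "P/Q limit" remark, used to drop the earliest points where P/Q is huge. On this series it removes only the first year:

```python
import numpy as np

def hubbert_linearization(production, max_pq=0.20):
    """Fit P/Q = K (1 - Q/Qt) by least squares and return (K, Qt).

    production: annual production P, oldest year first.
    max_pq: drop points whose P/Q exceeds this (an assumed cutoff).
    """
    p = np.asarray(production, dtype=float)
    q = np.cumsum(p)                  # cumulative production to date, Q
    ratio = p / q
    keep = ratio <= max_pq
    slope, intercept = np.polyfit(q[keep], ratio[keep], deg=1)
    return intercept, -intercept / slope   # K is the y-intercept, Qt the x-intercept

# Synthetic logistic history (assumed Qt = 60, K = 0.1; not real North Sea data).
QT, K, T_MID = 60.0, 0.1, 30
t = np.arange(0, 61)
cumulative = QT / (1.0 + np.exp(-K * (t - T_MID)))
annual = np.diff(cumulative, prepend=0.0)   # first entry = cumulative at t = 0
k_est, qt_est = hubbert_linearization(annual)
print(f"K ~ {k_est:.3f}, Qt ~ {qt_est:.1f}")
```

Even on noise-free annual data the recovered Qt is only approximate, because P/Q versus Q is not exactly linear when P is an annual increment rather than an instantaneous rate.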

ah, but it's all about finding the best fit for each case, wt.  the linearizations are good, but if you can find something that fits better (and that ln curve for production is just creepy), then you have to give it credence as a model.

Parsimony of model is nice, but so is good fit...we're looking for models that are generalizable to all the units of analysis...more importantly if we can discern many models that fit relatively well but vary across the cases, then we can use model selection, and the assumptions behind each model, to start figuring out WHY the countries vary...

My point is that I am not aware of any real exceptions to the P/Q versus Q model.  In every case that I have seen where a region of sufficient size has decades of serious production, the data always: (1) show a linear progression and (2)  in the absence of a political event, e.g. Iran, peak occurs around 50% of Qt.  

I continue to think that including Alaska with the Lower 48 is a mistake.  In terms of both geology and timing of development (the Lower 48 peaked before serious production even began in Alaska), Alaska might as well be in the Middle East.  Alternatively, you could plot all of North America.  

Has any one done this?  May we perhaps impose on you to provide a graph of the North American situation, westexas?
There is a theory explaining how bell-shaped curves such as the Gaussian and t-distribution can analyse statistical noise.  I'm not aware of any central-limit-type theorem for resource depletion. It could be just a triangle roughed around the edges. The number of trees on Easter Island fell to zero without skimming the horizontal axis. An alternative might be superimposed bar charts or skyscraper diagrams with the likely depletion curve sandwiched between the optimistic and pessimistic scenarios.
Re:  Top Petroleum Net Exporters, 2003 (those with net exports of one mbpd or more):

http://www.eia.doe.gov/emeu/security/topexp.html

There are 12 countries on this list.  The total net exports of the countries exporting less than one mbpd are not significant.  Note that the big three (Saudi Arabia, Russia and Norway) account for more than 50% of the exports from the top 12 countries.

Two of the countries, Saudi Arabia and Norway, are past their 50% of Qt marks, and both countries show declining production from 2003. Russia's production is flat year over year (I have never seen a P/Q versus Q plot for Russia).

Total world oil production is interesting, but exports make the world go around.  With the top three either declining or showing flat production, where will the oil production come from to meet current, let alone future, export demand?   I think that this topic has been underexplored.

Note that Saudi Arabia's net 2003  exports  were alone basically equal to the sum of the bottom six on the top 12 list.

If we could get a Russian P/Q versus Q plot, we could take a stab at predicting the net exports from the big three over the next 5-10 years.  It ain't going to be pretty.  I suppose that it would make sense to lump the Soviet Union, Russian and FSU data into one data file.

Re: "Total world oil production is interesting, but exports make the world go around.... I think that this topic has been underexplored."

I agree. Rick and I have had some discussions about writing something up on the importance of exports and the fungibility of oil. I had taken an initial stab at it when I wrote Algeria, Land of Opportunity? I see that Algeria is #11 on EIA's 2003 list. They are a mid-level producer that would not seem very important in the overall scheme of things but they are in terms of exports. You'll notice that Canada, for example, is not on the list.



dave..i have been thinking about fungibility of oil also, as have many other posters..i like to look at various events and put them together in a "big picture" view. i'd like feedback on other TOD readers' views of this.
i am fascinated by what has happened in the last week in this regard. i think there are two interacting cross currents going on now in world politics.
...first, the fading idea that the u.s. is a superpower, to be feared at all costs. the iraq war and its consequences are eliminating that fear. like the vietnam war, the aftereffect will be a distaste in american minds for foreign involvement.
so what, you say....the second crosscurrent..if countries that control oil and finance consider themselves outside of u.s. influence, they will begin to act in their own self interest, including in how they foresee parceling out their remaining, dwindling oil supplies. four cases in point from the last week:
one...russia's treatment of the ukraine, and its potential shot across the bow of europe (and possibly the u.s.)
The move led to a sharp drop in gas supplies among some of the 28 countries in Europe and the former Soviet Union that rely on Russian gas. Bosnia and Herzegovina, Croatia and Serbia were among the Gazprom customers that reported a reduction in natural gas supplies of between 30 per cent and 50 per cent since Sunday.

two...the reversal of OPEC in their decision to reduce supply:

"The price gets above $60 and OPEC is a dove," said Subash Chandra, an analyst at Morgan Keegan & Co. in New York. "It gets below $60 and it gets hawkish, so the market's got a very good sense of where OPEC wants prices."

..no more "we'd like to see oil at $40-50 a barrel"

three...rumblings in brazil that oil supply should be limited as posted by alan yesterday
Exporting oil is "an act of treason," reckons Heitor Manoel Pereira, president of the Association of Petrobras Engineers, or AEPET, with 3,923 members in the active work force. "Brazil is no Saudi Arabia that can export as it wishes. This will reduce our possibility for development in coming years," Pereira said.

and finally... the economic blockbuster, first posted by geopoet, about china moving its investments out of the dollar
"It is a subtle but clear signal that they are interested in moving away from the US dollar into other currencies, and are interested in setting up some kind of strategic commodity fund, maybe just for oil, but maybe for other commodities," he said.

...individually, they are, IMHO, surprising events, but taken together, in such a short time scale...i think they represent a substantial shift in worldview.
comments?

Surprising events? Surely not.

Ukraine wants to be 'western' and turn its back on Russia, then let them pay western market prices.

OPEC would mostly rather make as much money by pumping less than (in Saudi Arabia's case) be exposed as not having swing production capacity.

Brasil is a growing and developing nation, it does not wish to sell itself into greater future poverty for short term profiteering. Treason seems a most appropriate term.

China has more $ denominated assets than it is comfortable with (given it will pull the rug from under the $ someday), diversification mandatory - and has been happening softly for near a year.

They are but the initial ripples of 'end of empire'. The US one has been brief, a mere 100 years, but those who watch such things know it is coming within 20 years. Perhaps you are seeing these events as 'at odds' with the current system - as determined by the USA - and should rather see them as rational symptoms of the ending of that system and US hegemony.

Past such events have always been bloody and the odds strongly favour that this time too. We will have to grow up very fast to avoid it, the signs bode ill.


I might have been one of those suggesting that a Gaussian fit would be better (based in part on what Deffeyes has said), but this all gets me thinking.

What are we really learning by doing this?  There will always be some noise in the data, for various reasons that this type of curve fitting will never tell us about.  There could be an economic downturn that suppresses demand.  The hurricanes last year would have caused a bit of a blip.  So could a terrorist attack (for that matter, the insurgents in Iraq are effectively cutting off Iraq's oil supply).

So at the end of the day, do we get a more accurate guess as to when the peak will be, or the depletion rates we will see in the future?

I was thinking about this too ericy...and I think it's getting a handle on the fundamentals of the story in a manner that gives the community more evidence for its arsenal.  It's kind of like watching sausage get made as we watch all of these really smart folks hash it out, but at the end of the day, if we can get a better handle on the curve, we can move from that curve to start making more accurate predictions...

Still, your point about exogenous events is a good one.  Exogenous events are (cough trite alert) tough to predict, but at least we have a good idea of what exogenous events are possible and their probability of actually occurring, save the actual reserve numbers.  

Makes me start thinking Bayes...you know?

Re: "at least we have a good idea of what exogenous events are possible and their probability of actually occurring..."

This is news to me. If we did...

Re: "Makes me start thinking Bayes...you know?"

then I'd be thinking Bayes too. Wanna expand on this a bit, PG?
One can assign a more probable/less probable value to events based on assessments however often you wish.  I am not saying you'd be 'right' but that's not the goal...for instance, we know that the likelihood of a terrorist attack on US soil has gone up over the last twenty years, don't we?  we know that the likelihood of a run against Iran has increased over the last six months, right?  all of those variables, put together, could give us a probability of a "shock" to the system, which we could then vary based on that information...does that make sense?

in better words, with Bayes, you assign prior probabilities to events you wish to control for...they are still guesses, but they can at least be informed guesses...

I am still learning it, to be honest.  I've played with it a bit in some of my professional work, but I am by no means an expert.  

check this out: (g-d I love wiki)

http://en.wikipedia.org/wiki/Bayes%27_theorem
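To make the Bayes idea concrete, here is a toy update with made-up numbers (none of these probabilities come from this thread). Start from a prior probability that a supply shock happens this year, then revise it after observing a warning sign. The likelihood of that sign under "shock" and under "no shock" has to be guessed too:

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H | E) from P(H), P(E | H) and P(E | not H) via Bayes' theorem."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence

# Hypothetical numbers, for illustration only.
prior_shock = 0.10          # guessed chance of a major supply shock this year
p_signal_if_shock = 0.70    # guessed chance of seeing the warning sign if a shock is coming
p_signal_if_none = 0.20     # guessed chance of seeing it anyway if no shock is coming

posterior = bayes_update(prior_shock, p_signal_if_shock, p_signal_if_none)
print(f"posterior = {posterior:.3f}")
```

Feeding the posterior back in as the prior for the next piece of evidence gives a running revision as news arrives. That step assumes the signals are conditionally independent, which is itself a guess.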

Ah, if only reality were so amenable.

Unfortunately it is the relatively low probability events, very hard to predict rationally in advance, that mostly shape the critical turning points of our 'machine'. How would the world have turned had an archduke not been shot in Sarajevo and a different event triggered WWI a year or so later or earlier? The map of Europe would likely have been somewhat different today, as might subsequent history.

The best we can probably do is know the critical times, when things are in delicate balance or imbalance and relatively small events might overturn nearly everything.  Sometimes the balance can be restored, sometimes not and events take their own course, beyond reason, modelling and human control. Yes, we can guess, speculate, model, predict rationally or less rationally at such times but, if we are honest, we would admit that we are really doing so as a 'comfort', knowing that we are occupying time while reality crystallizes into its new form.

I know, you probably know, we have entered such a critical time. Something relatively unexpected could happen today, tomorrow, in a year or so's time, and reality as it has been for 40 or more years may be gone, the rules changed, the challenges new. We also know this is very likely to be a big one. And we know the chances are getting significant and growing.

Sharon's stroke has changed future probabilities somewhat, at first look it seems to the short term 'safer' side. But there are other potential events in the next few months which may trump that. Sometimes I wish I knew but other times my rational side dominates and I am just afraid (not irrationally so). Fortunately or not my 'irrational' side is usually more likely to be right, and so I continue to listen to it.

So, back to the subject. There is nothing wrong with using Bayesian techniques to evaluate probabilities of potential events, I would say do it with all fervour. But do remember that amongst the many events with less than 10% probability lurk a big handful which would change everything at the 'right' moment. Perhaps some way of modelling the sum of them by year might result in the most useful analysis.

you have me there, agric...but can't we still move towards our goal of refining our models to gain explanatory power as we can over time (through data mining AND causal analysis)?  in other words, incorporating increases in probabilities in certain explanatory factors, while discounting others, based on our best guesses to explain y?

if I become 95% sure that a 5% probability event of an 8 magnitude (play numbers, but bear with me) will occur in the next five years, and that has changed from a 75% certainty of 10% of a 9...and I have some theoretical expectation of the things that certain event will affect, have I not gained explanatory power on my dependent variable, even if it is a latent variable such as the probability of oil being at $100?

Of course, PG, and we should: everything we can do to perceive, analyse and understand should be done while we are waiting for reality to unfold. That way we are most likely to quickly grasp its unfolding.

I 'waste' some of my time trying to 'see' the possible futures, trying to work out when critical points might be, guessing at what they are, their causes and consequences, sometimes attempting to influence now to affect then. Objectively I would call that mad, but evidence seems to suggest some validity; truly mad, LOL.

I think it may be difficult to use Bayesian methods to model (in advance) what is really critical. My understanding is that it is hard to apply to a large group of low probability events in an effective way, but I would be very interested if you can show otherwise - then we might try to produce a list of potential events and assign probabilities.

How can we use Bayesian methods to model n1 to n99+ events with probabilities p1 to p99+ (where pn is < 0.1) such that we can say p[all] is >= 0.95?
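Part of the arithmetic behind this question doesn't need Bayes at all. If the events are independent (a strong assumption), the chance that at least one of them happens is 1 minus the product of the (1 - p_i). Here is a sketch with hypothetical numbers:

```python
import math

def p_at_least_one(probs):
    """P(at least one event occurs), assuming the events are independent."""
    return 1.0 - math.prod(1.0 - p for p in probs)

def events_needed(p, target=0.95):
    """Smallest number of independent events, each with probability p,
    for which P(at least one) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

# Hypothetical: 99 independent events, each with probability 0.03.
print(round(p_at_least_one([0.03] * 99), 3))
print(events_needed(0.03), events_needed(0.09))
```

If the events share common causes, the independence assumption can badly misstate the combined figure. Estimating those dependencies is where a Bayesian treatment would earn its keep.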

For now I would say that, as a rough approximation, the odds of a massively disruptive event in 2006 are about 30%, increasing year on year by about 50% of the prior year's figure. If by 1st Jan 2010 things are mostly as they are today I will be completely astonished (I have never been completely astonished in my 51 years of life).

I would make the case that we're in a global chaotic system approaching a bifurcation point.  Lots of cultural, political, and economic energy being dissipated and not necessarily towards a productive end.  A small event could completely change the state of the system.  The uncertainty of forecasts may in fact be an attribute of the chaotic nature of the system, e.g. like the weather: no way to predict when the change will occur, what will drive the change, or to which new state the system may evolve, other than in very limited ways.
Re: "There could be an economic downturn that could suppress demand."

Right. Or production. Westexas brought up a linearization for Russia. So, I looked around a bit. EIA production data starts in 1991--that's not too surprising. The BP data starts in 1985 and thinking of PG's exogenous events (there's a euphemism if I ever heard one!) -- here's Russian production.


I too think there's more than economics at play in the decline after 1988.  In 1988/89 the winds of political change were blowing pretty hard in the Soviet Union. In response, the economy was changing from a centralised model towards a market model. It took about a decade for that change to work its way through the production systems. This curve shows 'demand destruction' for a decade, one premise being that the cause was initial political instability.  
The problem with these models, whether Gaussian or logistic (yeast curve), is that there seem to be no theoretical grounds for why they work. At some level, sure, it makes sense that production for a field would start at zero, climb to some peak, and then fall off to zero. And both these curves have this property, as many other curves do. The remarkable closeness of fit of the Gaussian, and the lesser closeness of the logistic, must be more than coincidence. But it is hard to see why they work.

The biggest mystery to me is this: why the symmetry? Why the heck is the down side a mirror of the up side? I don't see a reason in the world why that would be true, and everything I can think of suggests that it should not be.

During the growth phase, production is limited at first by the costs of new investment and by alternative opportunities for investment capital. As the field develops, production growth begins to slow down. The field is approaching "maturity" and the owners are not investing that much more into it. Maybe it is saturated in terms of reasonable places to put in new wells, or at least the cost of adding more equipment won't be paid back in the lifetime of the field.

Eventually production peaks, which seems to be largely a physical limitation. You just can't suck oil out faster at a reasonable cost. (It's worth noting that this may not be the  reason, it may be that you could suck oil out faster, but the cost of adding more equipment to do this would not be paid back in the relatively short remaining lifetime of the field - in that case, the owners in effect decide to let the field peak in order to maximize their profits.)

And then we're on the decline, which now seems to be purely physical. We're not adding or removing wells, but the oil is getting harder and harder to suck out. Every year we get less.

So here is the mystery again: the decline seems to be primarily a physical process based on the reluctance of oil to be pulled out of the rock. But the growth phase seems to be largely economics-based. The rate of production growth is limited by economic decisions about how much to invest in the field at each point in its lifetime. I don't see why these two phases should mirror each other.

In terms of Stuart's graph above, this translates into asking why the slope of the fitted line stays constant as it crosses the horizontal axis. Why does the rate of decrease of the production growth rate (a confusing concept: the second derivative of the log of production, which involves the third derivative of oil remaining!) remain the same post-peak as pre-peak?

One problem is that only a small portion of the line is below the axis. The U.S. is only slightly past its peak when we look at the whole history. It would be interesting to apply the analysis to a single field, one that peaked long ago, to see how well the right side of the production curve mirrored the left side.

well, the symmetry isn't guaranteed, but it is the most likely distribution of a series of stochastic measurements (the same logic as a sampling distribution and why it works, for example)...if there are biases against that stochasticity (my "exogenous events" that seem to be the theme of the day for example...) then the distribution will change accordingly.

Either way, the area under the curve is finite...but that's also why the reserve numbers are such a big deal.  In the US we have a good idea of how much petroleum we have left, so this all works quite well...we're just trying to fit this, so that we can generalize to other countries where we have less complete information.

Models don't always have to reflect the underlying mechanics. There exist many models in science and industry which essentially look at empirical data and notice certain patterns. For instance, Google the term 'experience curve', which relates the cost of a manufactured product to the cumulative mass production of the product. Such empirical models can be useful to a certain extent. The nagging question with empirical models is that one is never sure whether the situation of interest will turn out to be an exception to the model.

Models derived from the mechanics/physics of the phenomena of interest might be perceived as more legitimate, yet they are only as good as the number of factors they take into account. In physical models the nagging question is whether some mechanism was overlooked and not included in the model.

Another interesting problem is when a model can correctly predict outcomes, yet is still wrong. Take Maxwell's equations on light propagation. Here the equations were correct, yet Maxwell's concept of light propagating through an ether was wrong.

With regard to oil, if it can be shown that most oil fields follow a similar pattern (barring government collapse or war), then that should be convincing on its own. One need only assemble data on many oil fields.