The Derivation of "Logistic-shaped" Discovery

This is a guest post from WebHubbleTelescope. The post addresses the origins and relevance (or lack thereof) of the logistic equation as it is commonly used in projecting/modeling oil production forecasts. As far as I can see, this is the first time anyone has succeeded in deriving the Logistic oil model from first principles. I will follow this with a post on the Maximum Power Principle next week, which in my opinion may shed light on the logistic curve from the perspective of oil 'demand' (as opposed to supply).

Many people believe that the Logistic equation adequately models the Hubbert peak. This comes about for a few reasons:

  1. We can (often/occasionally) get an adequate heuristic fit to the shape of the production data by matching it to a logistic sigmoid curve.
  2. The logistic-growth formula dU/dt = U(U0-U) carries some sort of physical significance.
  3. The logistic has hung around for a long time, in modern terms, therefore it must have some practical value.

I see nothing wrong with the first reason; scientists and analysts have used heuristic curves to fit to empirical data for years and a simple expression provides a convenient shorthand for describing the shape of a data set.  In the case of the Hubbert peak, we get the familiar S-function for cumulative production, and a bell-shaped curve for yearly production -- both characteristics that describe the Hubbert peak quite nicely to first-order.


As for point #2, we usually see hand-wavy arguments that point to an exponential growth that causes the peak oil curve to rapidly increase and then levels off as a negative feedback term in the equation takes over. What I consider circular reasoning with respect to Hubbert Linearization supports the idea that a physical process must drive this effect -- perhaps something similar to the constrained growth arguments popularized by Verhulst:

Verhulst showed in 1846 that forces which tend to prevent a population growth grow in proportion to the ratio of the excess population to the total population. The non-linear differential equation describing the growth of a biological population which he deduced and studied is now named after him.

Unfortunately, I have never seen this idea derived for oil production, at least not to my liking. Most proofs simply assert that the relationship fits our intuition and then solve the equation, which yields the sigmoid curve (here or here):
U(t) = 1 / (1/U0 + 1/(A*exp(B*t)))
I have problems with these kinds of assertions for a number of reasons. First of all, the general form of the resulting expression above can result from all sorts of fundamental principles besides the non-linear differential equation that Verhulst first theorized. For one, Fermi-Dirac statistics show the exact same S-curve relation as described by the U(t) formula above, yet no respectable physicist would ever derive FD by using the dU/dt = U(U0-U) logistic-growth formula. Most physicists would simply look at the relationship and see a coincidental mathematical identity that doesn't help their understanding one iota.

Secondly, one can play the same kind of identity games with the Normal (gaussian) curve, which also gets used occasionally to describe the production peak. In the case of the gaussian, we can generate a similar non-linear differential equation dG/dt ~ -t*G which also "describes" the curve. But this similarly says nothing about how the gaussian comes about (the central limit theorem and the law of large numbers), instead it only shows how a mathematical identity arises from its parameterized curvature.  This becomes a tautology, driven more by circular reasoning than anything else.

The last point of the logistic having implicit practical value has the historical force of momentum. This may seem blasphemous, but just because Hubbert first used this formulation years ago, doesn't make it de facto correct. He may have used the formula because of its convenience and mathematical properties more than anything else. I have either tried to contradict the use of the Logistic or searched for a fundamental derivation for some time now, but since everyone has shown some degree of satisfaction with the logistic, I haven't had much success until now ...

The breakthrough I have come across uses the Dispersive Discovery model as motivation. This model doesn't predict production but I figure that since production arises from the original discovery profile according to the Shock Model, this should at least generate a first-principles understanding.

In its general form, keeping search growth constant, the dispersive part of the discovery model produces a cumulative function that looks like this:
D(x) = x * (1-exp(-k/x))
The instantaneous curve generated by the derivative looks like
dD(x)/dx = c * (1-exp(-k/x)*(1+k/x))
Adding a growth term for x gives a family of curves for the derivative. I generated this set of curves simply by applying growth terms of various powers (quadratic, cubic, etc.) in place of x. No bones about it, I could just as easily have applied a positive exponential growth term here, and the characteristic peaked curve would result, with the strength of the peak directly related to the acceleration of the exponential growth. I noted that in an earlier post:
As far as other criticisms, I suppose one could question the actual relevance of a power-law growth as a driving function. In fact the formulation described here supports other growth laws, including monotonically increasing exponential growth.
Overall, the curves have some similarity to the Logistic sigmoid curve and its derivative, traditionally used to model the Hubbert peak. Yet it doesn't match the sigmoid because the equations obviously don't match -- not surprising since my model differs in its details from the Logistic heuristics. However, and it starts to get really interesting now, I can add another level of dispersion to my model and see what happens to the result.
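The family of peaked curves described above can be reproduced with a short numerical sketch. It assumes k = 1 and a growth term x(t) = t^n with a unit prefactor; the function names are illustrative and not from the original post. It also takes c = 1 in the derivative, which is what differentiating D(x) as written gives.

```python
import math

def cumulative_discovery(x, k=1.0):
    """D(x) = x * (1 - exp(-k/x)) for a mean search depth x > 0."""
    return x * (1.0 - math.exp(-k / x))

def discovery_slope(x, k=1.0):
    """dD/dx = 1 - exp(-k/x) * (1 + k/x), i.e. the derivative with c = 1."""
    return 1.0 - math.exp(-k / x) * (1.0 + k / x)

def discovery_rate(t, n, k=1.0):
    """dD/dt via the chain rule, assuming the growth term x(t) = t**n."""
    x = t ** n
    return discovery_slope(x, k) * n * t ** (n - 1)

def peak_time(n, k=1.0, t_max=10.0, steps=10000):
    """Grid search for the time at which dD/dt is largest."""
    times = [t_max * (i + 1) / steps for i in range(steps)]
    return max(times, key=lambda t: discovery_rate(t, n, k))

if __name__ == "__main__":
    for n in (2, 3, 6):  # quadratic, cubic, and a steeper power
        t_pk = peak_time(n)
        print(f"n={n}: peak near t={t_pk:.2f}, rate={discovery_rate(t_pk, n):.3f}")
```

For small t the slope term is close to 1 and the rate rises like n*t^(n-1); for large t the slope term falls off like k^2/(2*x^2), so the rate decays and a peak appears for any n > 1.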

I originally intended for the dispersion to only apply to the variable search rates occurring over different geographic areas of the world. But I hinted that we could extend it to other stochastic variables:
We have much greater uncertainties in the stochastic variables in the oil discovery problem, ranging from the uncertainty in the spread of search volumes to the spread in the amount of people/corporations involved in the search itself.
So I originally started with a spread in search rates given as an uncertainty in the searched volume swept, and locked down the total volume as the constant k=L0. A graph of several parts of the integration shows that the uncertainties are reflected only in the growth rates and not in the sub-volumes, which shows up as a clamped asymptote below the cumulative asymptote.

I figured that adding uncertainty to this term would make the result messier than I would like to see at this expository level. But in retrospect, I should have taken the extra step, as it gives a very surprising result. That extra step involves a simple integration of the constant k=L0 term as a stochastic variable over a damped exponential probability density function (PDF) given by p(L)=exp(-L/L0)/L0. This adds stochastic uncertainty to the total volume searched or, more precisely, uncertainty to the fixed sub-volumes searched that, when aggregated, provide the total volume.

I extended the following derivation from the original dispersive discovery equation explained in my TOD post "Finding Needles in a Haystack" (read that post if you need motivation for the general derivation). The first set of equations derives the original dispersive discovery, which includes uncertainty in the search depth; the second set adds dispersion in the volume, building on the previous derivation.
In the next to last relation, the addition of the second dispersion term turns into a trivial analytical integration from L=0 to L=infinity. The result becomes the simple relation in the last line. Depending on the type of search growth, we come up with various kinds of cumulative discovery curves.
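For readers who want the two integrations written out, here is a sketch reconstructed from the formulas quoted elsewhere in this post, not a copy of the original equation images. It assumes the search depth, written \ell here (a label introduced for this sketch), follows an exponential PDF with mean x, and that the volume L follows p(L)=exp(-L/L0)/L0 as stated above.

```latex
\begin{align*}
% Assumption: search depth \ell is exponentially distributed with mean x
p(\ell) &= \frac{1}{x}\, e^{-\ell/x} \\[4pt]
% First dispersion: depths below L count in full, deeper searches are capped at L
D(x \mid L) &= \int_0^{L} \ell\, p(\ell)\, d\ell + L \int_{L}^{\infty} p(\ell)\, d\ell
             = x \left( 1 - e^{-L/x} \right) \\[4pt]
% Second dispersion: average over the exponential PDF of the volume L
D(x) &= \int_0^{\infty} x \left( 1 - e^{-L/x} \right) \frac{1}{L_0}\, e^{-L/L_0}\, dL \\
     &= x \left( 1 - \frac{1/L_0}{1/x + 1/L_0} \right)
      = \frac{1}{1/L_0 + 1/x}
\end{align*}
```

The second line reproduces D(x) = x * (1-exp(-k/x)) with k = L, and the last line is the simple relation referred to above.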

Note that the exponential term from the original dispersive discovery function disappears. This occurs because of dimensional analysis: the dispersed rate stochastic variable in the denominator has an exponential PDF and the dispersed volume in the numerator has an exponential PDF; these essentially cancel each other after each gets integrated over the stochastic range. In any case, when we insert an exponential growth term such as x = A*exp(B*t), the simple relationship that this gives looks exactly like the logistic sigmoid function:
D(t) = 1/(1/L0 + 1/(A*exp(B*t)))
That essentially describes the complete derivation of a discovery logistic curve in terms of exponential growth and dispersed parameters. By adding an additional stochastic element to the Dispersive Discovery model, the logistic has now transformed from a cheap heuristic into a model result. The fact that it builds on the first-principles of the Dispersive Discovery model gives us a deeper understanding of its origins. So whenever we see the logistic sigmoid used in a fit of the Hubbert curve we know that several preconditions  must exist:
  1. It models a discovery profile.
  2. The search rates are dispersed via an exponential PDF
  3. The searched volume is dispersed via an exponential PDF
  4. The growth rate follows a positive exponential.
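The four preconditions above can be checked numerically. The sketch below averages the single-dispersion curve over the exponential volume PDF with a plain midpoint rule, compares the result with the closed form 1/(1/L0 + 1/x), and then confirms that inserting x = A*exp(B*t) satisfies the logistic growth law in the form dD/dt = B*D*(1 - D/L0). The parameter values, the truncation of the infinite integral, and the function names are assumptions made for illustration.

```python
import math

def single_dispersion(x, volume):
    """Dispersive discovery for one fixed volume: x * (1 - exp(-volume/x))."""
    return x * (1.0 - math.exp(-volume / x))

def double_dispersion_numeric(x, L0, steps=20000, span=40.0):
    """Average single_dispersion over p(L) = exp(-L/L0)/L0 by the midpoint rule.

    Assumption of this sketch: the integral is truncated at span * L0,
    which neglects a tail of order exp(-span).
    """
    dL = span * L0 / steps
    total = 0.0
    for i in range(steps):
        L = (i + 0.5) * dL
        total += single_dispersion(x, L) * math.exp(-L / L0) / L0
    return total * dL

def double_dispersion_closed(x, L0):
    """Closed form of the doubly dispersed curve: 1/(1/L0 + 1/x)."""
    return 1.0 / (1.0 / L0 + 1.0 / x)

def logistic_discovery(t, L0, A, B):
    """Closed form with the exponential growth term x = A * exp(B * t)."""
    return double_dispersion_closed(A * math.exp(B * t), L0)

if __name__ == "__main__":
    L0, A, B = 2.0, 0.01, 0.5  # illustrative values only
    for x in (0.1, 1.0, 10.0):
        print(x, double_dispersion_numeric(x, L0), double_dispersion_closed(x, L0))
    t, h = 8.0, 1e-5
    D = logistic_discovery(t, L0, A, B)
    slope = (logistic_discovery(t + h, L0, A, B)
             - logistic_discovery(t - h, L0, A, B)) / (2.0 * h)
    print(slope, B * D * (1.0 - D / L0))  # the two values should agree closely
```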
This finding now precludes other meaningless explanations for the Logistic curve's origin, including birth-death models, predator-prey models, and other ad-hoc carrying capacity derivations that other fields of scientific study have traditionally incorporated into their temporal dynamics theory. None of that matters, as the Logistic -- in terms of oil discovery -- simply models the stochastic effects of randomly searching an uncertain volume given an exponentially increasing average search rate. As an aside, you have to remember that Verhulst did not have the benefit of modern probability theory and the use of stochastic processes in the early 1800's, and came up with a very deterministic view of his subject matter.  As a matter of fact, the theory and application of stochastic processes only became popularized to Western audiences in the mid 20th century (with classical English books on the subject by Feller and Doob appearing in the 1950's) and for someone like Hubbert to make the connection would in retrospect have seemed very prescient on his part.

In the end, intuitive understanding plays an important role in setting up the initial premise, and the math has served as a formal verification of my understanding. You have to shoot holes in the probability theory to counter the argument, which any good debunking needs to do. As a very intriguing corollary to this finding, the fact that we can use a Logistic to model discovery means that we cannot use only a Logistic to model production. I have no qualms with this turn of events, as production comes about as a result of applying the Oil Shock model to discoveries, and this essentially shifts the discovery curve to the right in the timeline while maintaining most of its basic shape. In spite of such a surprising model reduction to the sigmoid, we can continue to use the Dispersive Discovery in its more general form to understand a variety of parametric growth models, which means that we should remember that the Logistic manifests itself from a specific instantiation of dispersive discovery. But this specific derivation might just close the book on why the Logistic works at all. It also supports the unification between the Shock Model and the Logistic Model that Khebab investigated last year.

A different question to ask: Does the exponential-growth double dispersive discovery curve (the "logistic") work better than the power-law variation? Interestingly, the power-law discovery curve does not linearize in the manner of Hubbert Linearization. Instead it generates the following quasi-linearization, where n is the power in the power-law curve:
dU/dt / U = n/t * (1 - U/URR)
Note that the hyperbolic factor (leading 1/t term) creates a spike near the U=0 origin, quite in keeping with many of the empirical HL observations of oil production. I don't think anyone has effectively explained the hyperbolic divergence typically observed. Although not intended as a fit to the data, the following figure shows how power discovery modulates the linear curve to potentially provide a more realistic fit to the data. It also reinforces my conjecture that these mathematical identities add very little intuitive value to the derivation of the models -- they simply represent tautological equivalences to the fundamental equations.
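The quasi-linearization can be verified directly from the doubly dispersed curve. The sketch below assumes a power-law growth term x = k*t^n inserted into U = 1/(1/URR + 1/x), compares a finite-difference estimate of (dU/dt)/U with n/t * (1 - U/URR), and prints the values that an HL-style plot would show. The parameter values and function names are illustrative assumptions.

```python
def power_law_discovery(t, urr, k, n):
    """U(t) = 1/(1/URR + 1/x) with the power-law growth term x = k * t**n."""
    x = k * t ** n
    return 1.0 / (1.0 / urr + 1.0 / x)

def fractional_rate_numeric(t, urr, k, n, h=1e-6):
    """(dU/dt)/U estimated with a central difference."""
    up = power_law_discovery(t + h, urr, k, n)
    down = power_law_discovery(t - h, urr, k, n)
    return (up - down) / (2.0 * h) / power_law_discovery(t, urr, k, n)

def fractional_rate_formula(t, urr, k, n):
    """n/t * (1 - U/URR), the quasi-linearization quoted above."""
    u = power_law_discovery(t, urr, k, n)
    return n / t * (1.0 - u / urr)

if __name__ == "__main__":
    urr, k, n = 100.0, 1e-4, 6  # illustrative values only
    for t in (1.0, 5.0, 10.0, 20.0, 40.0):
        u = power_law_discovery(t, urr, k, n)
        print(f"U/URR={u / urr:.4f}  numeric={fractional_rate_numeric(t, urr, k, n):.6f}"
              f"  formula={fractional_rate_formula(t, urr, k, n):.6f}")
```

Near U = 0 the factor (1 - U/URR) is close to 1, so the ratio behaves like n/t and grows without bound as t approaches zero, which is the spike near the origin.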




As another corollary, given the result:
D(x) = 1/(1/L0 + 1/x)
we can verify another type of Hubbert Linearization. Consider that the parameter x describes a constant growth situation. If we plot cumulative discovered volume (D) against cumulative discoveries or depth (x), we should confirm the creaming curve heuristic. In other words, the factor L0 should remain invariant, allowing us to obtain a good estimate of ultimate volume by linear regression:
L0 = 1/(1/D - 1/x)
It looks like this might arguably fit some curves better than previously shown.
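As a minimal sketch of that invariance check, the code below builds synthetic, noise-free (x, D) pairs from D(x) = 1/(1/L0 + 1/x) and recovers L0 both point by point and from the mean of 1/D - 1/x. It assumes x and D are expressed in the same units, as the formula requires; the data and function names are made up for illustration and are not real discovery figures.

```python
def discovery(x, L0):
    """D(x) = 1/(1/L0 + 1/x)."""
    return 1.0 / (1.0 / L0 + 1.0 / x)

def pointwise_L0(x, d):
    """L0 = 1/(1/D - 1/x) from a single (x, D) observation."""
    return 1.0 / (1.0 / d - 1.0 / x)

def estimate_L0(xs, ds):
    """Average the intercept 1/D - 1/x over all points, then invert the mean."""
    intercepts = [1.0 / d - 1.0 / x for x, d in zip(xs, ds)]
    return len(intercepts) / sum(intercepts)

if __name__ == "__main__":
    true_L0 = 250.0  # hypothetical ultimate volume
    xs = [20.0, 50.0, 100.0, 200.0, 400.0, 800.0]
    ds = [discovery(x, true_L0) for x in xs]
    for x, d in zip(xs, ds):
        print(f"x={x:6.1f}  D={d:8.3f}  L0 estimate={pointwise_L0(x, d):.3f}")
    print("pooled estimate:", estimate_L0(xs, ds))
```

With real data the pointwise estimates would scatter. Points with x much smaller than L0 are the least reliable, because 1/D - 1/x is then a small difference of two large numbers.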


References

  1. http://mobjectivist.blogspot.com
  2. Finding Needles in a Haystack 
  3. Application of the Dispersive Discovery Model
  4. The Shock Model (A Review) : Part I
  5. The Shock Model : Part II

"As a very intriguing corollary to this finding, the fact that we can use a Logistic to model discovery means that we cannot use only a Logistic to model production."

Given the vast number of variables that we have to deal with, I have tried to go with the simplest quantitative modeling tool that appears to provide some plausible results. I think that a good way to evaluate the HL method is to generate some predicted production curves for regions that have peaked, using only production data through the peak date to generate the predicted curve--and then compare the predicted post-peak cumulative production to the actual post-peak cumulative production for a given region. This is what we (my idea, Khebab's hard work) did in the following article:

http://graphoilogy.blogspot.com/2007/06/in-defense-of-hubbert-linearizat...
In Defense of the Hubbert Linearization Method (June, 2007)

BTW, I should add a fairly self-evident point, to-wit, that the HL method can't "see" the production from immature and/or undeveloped basins. Of course, the problem is that there are fewer and fewer basins that are in this category, and then the question is how material they will be to a given region and to the world.

I think the procedure you reference in the other post is really providing a false sense of accuracy -- at least there are some implicit assumptions that are never explicitly addressed. The biggest influence on your outcome is:

How do you choose the points to which the model is fitted? In the post above, you say "...using only production data through the peak date to generate the predicted curve"; however, in the link you clearly state that only the "green" points are used to fit the model. Clearly, there are several points prior to the "green" points that are not included in the modeling process. My questions remain:

1. How do you choose the green points?
2. How much does the answer vary if you choose a different range of green points?
3. A true estimate of the variance of the curve could be gleaned if you randomly chose x points prior to the peak, doing this several hundred times and getting an empirical confidence interval. Have you done this?
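The resampling idea in question 3 can be sketched as follows. This is not Khebab's procedure; it is a minimal illustration on synthetic logistic data, where HL regresses P/Q on Q and reads URR off the Q-axis intercept. The noise level, subset size, number of trials, and all function names are assumptions.

```python
import math
import random

def logistic_q(t, urr, rate, t_peak):
    """Cumulative production Q(t) for a synthetic logistic profile."""
    return urr / (1.0 + math.exp(-rate * (t - t_peak)))

def logistic_p(t, urr, rate, t_peak):
    """Production rate P = dQ/dt = rate * Q * (1 - Q/URR)."""
    q = logistic_q(t, urr, rate, t_peak)
    return rate * q * (1.0 - q / urr)

def linear_fit(xs, ys):
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def hl_urr(qs, ps):
    """Hubbert Linearization: fit P/Q against Q, URR is where the line hits zero."""
    slope, intercept = linear_fit(qs, [p / q for p, q in zip(ps, qs)])
    return -intercept / slope

def subset_interval(qs, ps, subset_size, trials, rng, level=0.95):
    """Refit on random subsets of the points and return an empirical interval."""
    estimates = []
    for _ in range(trials):
        pick = rng.sample(range(len(qs)), subset_size)
        estimates.append(hl_urr([qs[i] for i in pick], [ps[i] for i in pick]))
    estimates.sort()
    lo = estimates[int((1.0 - level) / 2.0 * (trials - 1))]
    hi = estimates[int((1.0 + level) / 2.0 * (trials - 1))]
    return lo, hi

if __name__ == "__main__":
    rng = random.Random(0)
    urr, rate, t_peak = 100.0, 0.15, 40.0   # synthetic region, peak at t = 40
    years = list(range(10, 41))             # pre-peak points only
    qs = [logistic_q(t, urr, rate, t_peak) for t in years]
    # Assumed 5% multiplicative noise on production
    ps = [logistic_p(t, urr, rate, t_peak) * (1.0 + rng.gauss(0.0, 0.05)) for t in years]
    print("all pre-peak points:", hl_urr(qs, ps))
    print("95% subset interval:", subset_interval(qs, ps, 10, 500, rng))
```

The width of the resulting interval is the quantity of interest: it shows how strongly the URR estimate depends on which pre-peak points enter the fit.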

While I am a statistician, I have worked extensively with physicists, which the OP appears to be (the love for the power law gives it away a little :) ). The point is that the model needs to be chosen for a defensible reason rather than for quantitative convenience. In my opinion, this becomes more of a necessity as Peak Oil becomes more "mainstream" and people begin to investigate some of the claims. It becomes fairly easy to establish numerous counterexamples where the HL procedure is shown to be quite ineffective or exhibits a lack of robustness.

I'm actually a physicist, and I agree with your requirement that a defensible reason is more important than quantitative convenience, but there is, as a physicist, some wiggle room. It depends on the quality of the data and the stage of development of the theory. For example, in Verhulst's time, there was no data with which one could do a reliable study of the effect of starvation alone, as opposed to starvation and disease, or starvation and war, etc. So Verhulst chose to simply posit that population tended toward a saturation number that was a new parameter in the Malthus model. In the absence of any real data, this was little more than an intellectual placeholder for the idea that this model can't possibly be complete.

Then a century later Hubbert needs a simple formula for a time-dependent quantity that starts very small, grows to a peak, and then declines, ultimately to zero. He sees that Verhulst's equation meets his criteria and uses it.

The Gauss normal curve also meets these theoretical criteria. I tried Gauss curve when first learning about PO. It leads to very messy algebra. There was no theory that supported using Gauss in this situation. The central limit theorem applies to large numbers of statistically independent events. Since I didn't want to use Gauss because the algebra was messy, it was easy for me to convince myself that there were surely not a large number of independent events in this situation.

He, like a physicist, doesn't have to justify trying it. Using it only needs justification if it works, and then the justification is more a discussion of what work needs to be done to develop a proper theory. Among other things, one needs to develop a good procedure for selecting data.

Elsewhere in the discussion I've posted a comment about how troubling it is that Hubbert linearization requires that the Hubbert peak be symmetric.

I think that we may very well witness the post-peak decline in real life before we have an adequate theory of how to predict it. What we have now is good enough for economic hand waving, but as soon as decline is real there will be rapid changes that will lead to big forced changes in human behavior.

The Gauss Normal curve only kicks in when you apply the Shock Model to the discovery curve. The shock model places convolutions of slight production shifts corresponding to the fallow, construction, maturation, and extraction phases after the initial discovery (i.e. a sequence of statistically independent events). This trends the production curve to look more Gaussian.

You should look at this post http://mobjectivist.blogspot.com/2008/03/street-lamp-understanding-of-sh... to see how this all works in the context of the Oil Shock model. Convolutions of gaussians result in gaussians and all curves trend toward this property as a consequence of the CLT:

The only minor issue I have is this statement of yours:
"He, like a physicist, doesn't have to justify trying it. Using it only needs justification if it works, and then the justification is more a discussion of what work needs to be done to develop a proper theory. Among other things, one needs to develop a good procedure for selecting data."
Without the theory, this becomes the definition of a heuristic and it prevents us from making as fast a headway as possible. Can you imagine how slowly we would have advanced technologically if everything was based on heuristics instead of fundamental explainable laws such as Maxwell-Boltzmann and Fermi-Dirac statistics? If it wasn't for F-D in particular, we would still be wondering why a semiconductor transistor works at all!!

Otherwise I agree with everything you say and consider Verhulst's approach a deterministic trajectory and not the stochastic trajectory that we really should be using, ala the Dispersive Discovery model.

The key point is to determine if the data set shows a steady linear progression with a P/Q intercept generally, but not always, in the 5% to 10% range (there are some outliers, such as the North Sea, an exclusively offshore region with a rapid decline rate). That is how Khebab chose the green points.

We don't have that many large producing regions to study. We can say that our available case histories--Texas (total plot, pre-peak is noisy), the total Lower 48, Russia, Mexico, North Sea, Saudi Arabia etc.--broadly fit the HL model. These regions account for about half of the oil that has been produced to date worldwide.

Meanwhile, what I first warned about in January, 2006--based on a HL analysis of the top net oil exporters--is unfolding in front of our very eyes, an accelerating decline in net oil exports. In fact, based on the HL analysis of Russia, in January, 2006, I gave Russia another one to two years of rising production before they resumed their production decline, and while Saudi Arabia has shown a rebound in production, it is a near certainty that they will show three straight years of annual production below their 2005 annual rate, at about the same stage of depletion at which the prior swing producer, Texas, started declining (all based on HL).

Recent headline:

Declining Russian Oil Production Could Lead to $200 Oil and “Global Recession,” Says Deutsche Bank

You are exactly correct in your observations and the choice of the time interval for the fit is the weakness of this empirical approach.

"A true estimate of the variance of the curve could be gleaned if you randomly chose x points prior to the peak, doing this several hundred times and getting an empirical confidence interval. Have you done this?"

Yes, but I use a bootstrap technique instead in order to derive a confidence interval, which is very often quite large. For instance, for Saudi Arabia:

Thanks for the response, Khebab -- this is basically what I was asking. Do points prior to 50 Gb ever get used in the estimation? I guess I am just curious as to how long you have to wait to choose a "linear" portion of the profile. Also, how effective is this if the peak hasn't occurred yet?

For instance, what if you started estimating with the first two observations and then updated your estimates as new observations came in? You would get wildly varying answers. Then, eventually, you would have to decide where to quit using early points to capture the linear part of the profile. In the graph above, if the estimation was done using the points between 40 Gb and 70 Gb, we would have estimated the cumulative Gb to be around 85.

In the end, I think this is a decent way to model existing data, but may be poor in making predictions of any accuracy. I just think we have to be careful on how this is presented -- especially when addressing scientifically-minded non-believers.

"Do points prior to 50 Gb ever get used in the estimation"

Look at this mash-up of Khebab's SA data and the USA data from my post above. The data points show more fluctuation for SA below 50 Gb, but they both show that curious hyperbolic curvature indicated by the solid blue line.

As I said in the post, this has to do with the use of power-law discovery as opposed to the exponential-law; the latter gives a perfectly flat HL.

I usually discard the first points because low cumulative production values (Q) will boost small fluctuations in production (P). I usually take P/Q<10% as a cut-off value. Because of integration, the noise on P does not affect Q after a while and fluctuations in P are dampened as Q increases.

Your reddit link:

http://www.reddit.com/info/6p369/comments/ (science)

I'm glad to see that you are still active on that problem.

If I understand correctly you have improved your Dispersive model using a Bayesian approach where the prior on L is specified. You then derive a generalization of the logistic curve (s-curve).

I'm having trouble following the changes in your notation. Last time you gave:

D(t) = k*t^6*(1-exp(-Dd/(k*t^6)))

It would be helpful to provide a short table linking your model variables to real world quantities:
L0: average depth?
k= Dd: URR
x: the current search depth
etc.

The dispersive model is supposed to model the discovery curve + reserve growth. The way I see it, true reserve growth (i.e. free of political/economical influences and accounting artifacts) is:
1. improved recovery methods applied on fields over time
2. knowledge growth: knowledge of the fields increases with time (i.e. field delineation)
Can you explain what is your interpretation of reserve growth? It seems that in your model, reserve growth comes only from the increase in the search depth which is in fact new smaller discoveries at greater depth over time.

One issue is how to estimate the various parameters from real world datasets. In particular, discovery data is contaminated by backdated reserve growth that is difficult to remove without complete reserve growth history. Reserve growth should be a time-dependent diffusion of the initial discovery volumes. A quick and dirty solution is to first remove reserve growth using a heuristic reserve growth model (e.g. Arrington) and then convolve with the same time dependent reserve growth function which in the case of the Lower-48 gives:

The red curve is now the correct curve where reserve growth is dispersed in time and not instantaneous. IMO, the fit between your model and this new curve is quite remarkable:

Any chance you could graph this method for world production?

Here is a quick trial using the ASPO discovery curve and Russia reserve growth model (Verma et al.):

Here is the fit using the new dispersive model:

Khebab,

I'm a college student, and while I am going into Calculus 3 next semester, I find this stuff exceedingly difficult to understand. What books, classes, or websites might I look at to better understand this statistical modeling? It's tricky :D

Thanks,
Crews

The absolute classic and considered one of the great mathematical texts of the last century is "An Introduction to Probability Theory and Its Applications" by William Feller.

If you want to really get hooked on understanding how the physical world works, I would suggest taking a course on Statistical Mechanics.

Khebab

In my opinion the original discoveries plus a small fraction of the reserve growth is probably a good estimate for the amount of easy to extract oil left.

Just eyeballing the graph to integrate I get.

About 1.7 trillion barrels total.
Original = 1 trillion.

Assume a 20% growth in reserves is easy oil.

1700 * 0.20 = 340

This gives about 1340 billion barrels of "easy oil".

Let's give this rough estimate a 10% error term or range, and it's
1206 - 1340.

Given that we are, I think, close to 1100 GB extracted now, we could be about 80-90% depleted in "easy oil".
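The arithmetic above can be written out as a short check. It follows the comment's own steps, including applying the 20% to the 1.7 trillion barrel total; all inputs are the eyeballed figures from the comment, in billions of barrels, and the variable names are illustrative.

```python
total_gb = 1700.0       # eyeballed total from the graph
original_gb = 1000.0    # original discoveries
easy_fraction = 0.20    # assumed share counted as "easy oil" growth
extracted_gb = 1100.0   # rough cumulative extraction to date

easy_growth_gb = easy_fraction * total_gb       # 340
easy_high_gb = original_gb + easy_growth_gb     # 1340
easy_low_gb = 0.90 * easy_high_gb               # 10% lower bound, about 1206

depletion_low = extracted_gb / easy_high_gb     # about 0.82
depletion_high = extracted_gb / easy_low_gb     # about 0.91

print(f"easy oil range: {easy_low_gb:.0f} to {easy_high_gb:.0f} Gb")
print(f"depletion range: {depletion_low:.0%} to {depletion_high:.0%}")
```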

This simple little calculation should be enough to make you wonder if we are going to keep production close to the highest levels ever achieved for much longer. Even if you increase the easy oil levels, it's not hard to see that production rates will probably begin to fall off soon.

The fact that the easy vs. hard concept predicts a peak at around 70-78% of total URR, in line with WT's peak prediction of 60% of URR, is interesting. The two different approaches are not giving hugely different answers. In fact, in my opinion the logistic is telling us a lot about how much easy oil we have left. The easy oil approach does not tell us a lot about the peak itself, just when decline is certain; the peak could have been back at 1000, which given a 1700 total is 60% of URR, perfectly in line with the logistic.

What the easy oil approach says is that decline is probably certain by 70% of URR or 10% past what was probably peak production.

Time will tell of course but I don't hold high hopes for us getting the next 700GB or 1000GB or whatever number we claim to still have in reserves out of the ground at anywhere near the rate we extracted the first half.

First of all, as everyone realizes this blogspot and scoop blogger software is crap for doing any kind of mathematical markup. Therefore the equations become more ad hoc than I would like, and I resort to using snapshot gifs of markup from various technical equation processing SW apps.

So yes, L0 and Dd both refer to URR give or take a scale factor to convert from some earth volume to cumulative number of barrels. Everything here is dimensionally sound and the time advance of discovery volume follows the URR linearly.

Now here is how the reserve growth comes in. The dispersion in discovery rates gives the source of the reserve growth (not the dispersion in volumes necessarily). The high rates over certain parts of the volume provide the initial fast growth and the slower rates over other parts of the volume give the long tails in the out years. The whole set of rates accelerates over time in terms of mean, but the dispersion stays as the variance of the mean so the slow rates are always there (think dispersion of wavelets, Khebab). I think it is pretty obvious, but no one really understands how to backdate all the discovery curves to reflect this property properly as you indicate. On top of this, the power-law growth rates give much higher reserve growth than the exponential law growth, since the power-law family is stronger initially but weakens in comparison to the exponential as time increases. This turns the strong symmetry in the exponential dispersive/Logistic into the asymmetry of the power-law dispersive. The worst (or best in terms of reserve growth) is fractional power-law growth; this is a diffusion-limited growth that gives incredibly long reserve growth tails. And the diffusion is what you want to see -- my feeling is that the dispersive effects on top of strong technological acceleration outweigh the diffusional aspects on any one particular reservoir. In other words, the statistics rule on an aggregate of reservoirs and the "micro"-diffusion likely applies better to individual reservoirs.

I agree totally with your Arrington approach, but wish we did not have to do this, and suggest that someone place the reserve growth discoveries in the correct places on the timeline.

My problem is how to retrieve the reserve growth function from the available data itself and without access to complete reserve growth history.

Assuming that you have a complete discovery curve for a particular country (e.g. Lower 48) contaminated by backdated reserve growth, I was thinking about the following approach:
1. choose a suitable parametric form for the reserve growth factor function (RGF): RGF(t) = a*t^b
2. choose values for a and b.
3. remove backdated reserve growth from the original discovery curve.
4. simulate a reserve growth history from the RGF function and the new discovery curve from step 3.
5. add the simulated reserve growth history to the discovery curve from step 3.
6. fit the dispersive model to the new discovery curve.
7. apply the Shock Model.
8. compare the reserve history generated by the Shock Model with the available proven reserve history (after anomalous increases are removed).
9. go back to step 2 and iterate.
10. the best agreement in step 8 gives the most likely parameter values for a and b.
This approach is similar to what I've tried to do with Ghawar (http://www.theoildrum.com/node/2945). I'm also wondering how the (k,n) values for the dispersive values would compare to the (a,b) values.

I think the fundamental distinction between the (k,n) tuple and the (a,b) tuple is that (a,b) always starts at the initial discovery point for a particular region, but (k,n) predates all those points. It is essentially the difference between comparing a(t-t0)^b and k(t-0)^n. The t0 point is based on the discovery time, but t=0 is the time from the start of the search. So the care we must apply is to get the t0 bias correct. For example, if we use a later t0 value for the (k,n) tuple, the reserve growth function will look concave up (second derivative positive), whereas we know that reserve growth from the point of the particular discovery is concave down.

Otherwise I think the same principles apply and dispersion looks like a kind of diffusion. The big question that needs to be answered is how strong this search rate is after the initial discovery. Dispersion is us looking for "the stuff", while real diffusion is "the stuff" creeping toward us.

Nate,

OK I skimmed this a bit because my eyes started to glaze over at the math, but if I may paraphrase:

The best fit up to now was one particular equation and now you've discovered that a better fit might be another kind of equation instead.

OK so if you use the newly identified equation to model production, what differences (if any) do we have for total reserves and more importantly, for depletion?

I too would love to hear the "bottom line" for this work.

Can someone succinctly (perhaps WebHubbleTelescope) say what this means when all is said and done? Depletion rates are going to be faster? Peak will be higher? etc.

-André

The bottom line is that this is the first time anyone has ever tried and succeeded in deriving the Logistic oil model from first principles. If someone other than me or Khebab who seems to understand the math (i.e. to at least be able to reproduce the derivation) picks this up and applies it to some data sets, we can make some progress in enlightening the masses on how to understand the laws of constrained resources. As I have said elsewhere, this is the most basic bean-counting type of applied probability mathematics that I have ever encountered. Perhaps probability is in my blood, but I find it incredible that no one has ever approached it from this angle before. I am reminded of this saying: "It can't be right. If it is so simple, then why hasn't someone discovered this before?"

In my other area of expertise I have physics algorithms named after me, but I am just a hobbyist in the fossil fuel arena, so it will be up to others to decide what to do with this stuff. IMO, without a great blog like http://TheOilDrum.com in place, math derivations like this would still be undiscovered or invisible to the masses. I think the whole oil industry is so incestuous, ingrown, and cliquish that it has absolutely no avenues for innovative new ideas. This derivation of mine would get buried in no time. Just look at what the Bush administration does to rebellious scientists. Read this post:
http://www.huffingtonpost.com/carl-pope/let-them-hate-so-long-as_b_10440...

I have no doubt that the oil corporatocracy behaves the exact same way.

Thanks, WebHubbleTelescope.

It is indeed wonderful to have TOD to publish your work to a wide audience.

Let me say I have very little understanding of statistics beyond the undergraduate level. What I do have is a working knowledge of various methods to predict momentary and cumulative recoveries of product from various chemical processes. The main thing I found was that it was impossible to predict the ultimate recovery with accuracy from the first part of the curve. In fact, it was difficult to predict beyond the next inflection point. This was with a reactor of known volume. Considering the unknowns such as the volume of oil on earth, technology advances, and economics, it seems that at best statistics can only give answers in very constrained scenarios.

Each variable added then produces a new dimension that lessens the certainty of our prediction. As we advance down the timeline, our uncertainty grows. A year from now we can be reasonably certain; ten years out we have a glimmer of the possibilities; thirty years from now, fa-gid-about-it.

I don't understand. Have you looked at the historical world yearly discovery curve? It actually peaked in the early 1960's. It has been on a long steady but noisy decline since. The cumulative discovery reinforces this. Hubbert essentially predicted this decline long ago, even before the peak, but until now I don't think we had a good probability & statistics based derivation of how this dynamic works and the fundamental understanding that comes with it.

It would be the same thing if you didn't understand stoichiometric chemical rate equations, but just applied them blindly based on heuristics. The way stoichiometry works is clearly understood, and you gain a lot of intuition based on this understanding. That said, oil discovery looks nothing like stoichiometric rate equations, and you lose your intuition instantly if you believe that.

For one, Fermi-Dirac statistics show the exact same S-curve relation as described by the U(t) formula above, yet no respectable physicist would ever derive FD by using the dU/dt = U(U0-U) logistic-growth formula. Most physicists would simply look at the relationship and see a coincidental mathematical identity that doesn't help their understanding one iota.

Coincidental?
Maxwell-Boltzmann, Bose-Einstein, and Fermi-Dirac were derived from the kinetic theory of gases, an idealized state, not from fitting empirical data to curves.

http://en.wikipedia.org/wiki/Kinetic_theory
http://en.wikipedia.org/wiki/Fermi-Dirac_statistics

I kinda like your fancy footwork here but...
(P(x)=1/lambda*exp(-x/lambda)..huh? But Sum of P(x)=1?)
and L is a random variable?

As you drill deeper (or wider?) you find less oil....more oil?

I think that Verhulst's 'r' being fixed is a bit of a stretch but there are practical limits to the range of any r(growth rate).
It's been around for a long time because it's a mathematical model just as the time for water draining from a tank is a mathematical model.

Could your model be a little more physical?

P(x)=1/lambda*exp(-x/lambda)

is a PDF, so it's normal that Sum(P(x).dx)=1. The average must have some physical meaning: E[x]=Sum(x.P(x).dx)=lambda. I think it is an average drilling depth ("confidence depth") increasing with time (lambda=k.t^n).
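Both properties are easy to confirm numerically. The sketch below uses an arbitrary lambda = 2 and a hand-rolled trapezoid rule; the helper names are hypothetical, and the integration is truncated at 30 lambda, where the remaining tail is negligible.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoid-rule integral of samples y over grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2.0))

def exponential_pdf(x, lam):
    """P(x) = (1/lambda) * exp(-x/lambda) for x >= 0."""
    return np.exp(-x / lam) / lam

lam = 2.0                                 # arbitrary illustrative value
x = np.linspace(0.0, 30.0 * lam, 60001)   # truncate the infinite range at 30*lambda
p = exponential_pdf(x, lam)

total = trapezoid(p, x)      # integral of P(x) dx, should be close to 1
mean = trapezoid(x * p, x)   # integral of x P(x) dx, should be close to lambda
```

The individual values of P(x) can exceed 1 when lambda < 1; only the integral over x is constrained to equal 1.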

There is an oil window in the range from 7,000 to 15,000 feet where temperatures are hot enough to "crack" organic-rich sediments into oil molecules but not too hot so that we get natural gas. There should be a way to translate that fact into a proper pdf on L (a Gaussian maybe). There are some arguments discussed in a previous thread:

http://www.theoildrum.com/node/3287

In particular this chart from Hubbert is close to the Dispersive model:

Precisely, I call it a mean confidence depth. The exploration will proceed with an accelerating increase in search volume whether or not the prospectors realize they have exceeded the depth at which further discoveries will occur.

The constrained 7000' to 15000' window would suggest that dispersing the subvolumes is not as important as dispersing the rates, as the original dispersive discovery law argues. The Double Dispersive discovery model would place the mode (most common value) closer to zero depth, with a mean at (7000+15000)/2=11000' and only 5% deeper than 30,000'. The Singly Dispersive model essentially cuts it off at 11,000'. The reality is somewhere between the two models. Good observation.

A statement by the late L.F. Buz Ivanhoe from page 2 of the 2nd issue of the Hubbert Center Newsletter http://hubbert.mines.edu 97-1
"Hubbert wrote virtually nothing about details of the “decline side” of his Hubbert Curve, except to mention that the ultimate shape of the decline side would depend upon the facts and not on any assumptions or formulae. The decline side does not have to be symmetrical to the ascending side of the curve - it is just easier to draw it as such, but no rules apply. The ascending curve depends on the skill/luck of the explorationists while the descending side may fall off more rapidly due to the public’s acquired taste for petroleum products - or more slowly due to government controls to reduce consumption."

The decline side does not have to be symmetrical to the ascending side of the curve - it is just easier to draw it as such, but no rules apply.

This is quite important to our understanding of the place of the logistic equation in PO: The logistic equation peak IS symmetric. There is no adjustable parameter in it that allows asymmetry. The linear portion of a Hubbert linearization comes from the declining side of the peak. So we must assume that the production curve is symmetric if we want to use HL. Show me where my reasoning is faulty, please.
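The symmetry claim can be checked with a few lines. This is a sketch that assumes the standard three-parameter logistic U(t) = U0/(1+exp(-r(t-tp))), with production P = dU/dt = r U (1 - U/U0); the parameter values and function names are hypothetical.

```python
import numpy as np

def logistic_cumulative(t, u0, r, t_peak):
    """Cumulative production U(t) for an ideal logistic."""
    return u0 / (1.0 + np.exp(-r * (t - t_peak)))

def logistic_production(t, u0, r, t_peak):
    """Yearly production P = dU/dt = r * U * (1 - U/U0)."""
    u = logistic_cumulative(t, u0, r, t_peak)
    return r * u * (1.0 - u / u0)

u0, r, t_peak = 200.0, 0.08, 1970.0   # hypothetical values

# Symmetry: production the same number of years before and after the peak.
offsets = np.arange(1.0, 61.0)
rising = logistic_production(t_peak - offsets, u0, r, t_peak)
falling = logistic_production(t_peak + offsets, u0, r, t_peak)
is_symmetric = bool(np.allclose(rising, falling))

# Hubbert linearization: P/U plotted against U is a straight line.
years = np.arange(1900.0, 2041.0)
u = logistic_cumulative(years, u0, r, t_peak)
p = logistic_production(years, u0, r, t_peak)
slope, intercept = np.polyfit(u, p / u, 1)   # expect slope = -r/U0, intercept = r
```

For an ideal logistic the P/U versus U relation is linear over the whole curve, so the fitted line recovers r as its intercept and U0 as the point where it crosses zero. None of the three parameters skews the peak.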

No fault in your reasoning at all. The derivation above for dispersive/Logistic gives one parameter which is the virtual search growth rate function - an accelerating exponential in this case. (The dispersive factors in rate and volume essentially cancel each other out so they do not show up in the final formula.) If for some reason, the search rate abated as we near or pass the peak, the downslope would show longer tails as the search space takes longer to fully explore.