The Loglet Analysis

Most peakoilers on this site have been introduced to the logistic curve through M. King Hubbert's famous prediction of Lower-48 production. Fewer may know that curve-fitting techniques have also been applied extensively by people we might call cornucopians. Ironically, the logistic curve is also used as a forecasting tool for market share and technology substitution. For instance, a pioneer in logistic-based technological forecasting in the energy domain is Cesare Marchetti:
More recently, there is also the work of Jesse H. Ausubel and his team on Loglet analysis:
"Loglet analysis" refers to the decomposition of growth and diffusion into S-shaped logistic components, roughly analogous to wavelet analysis, popular for signal processing and compression. The term "loglet", coined at The Rockefeller University in 1994 joins "logistic" and "wavelet". Loglet analysis comprises two models: the first is the component logistic model, in which autonomous systems exhibit logistic growth. The second is the logistic substitution model, which models the effects of competitions within a market.
src: Perrin S. Meyer, Jason W. Yung and Jesse H. Ausubel, Logistic Growth and Substitution: The Mathematics of the Loglet Lab Software Package. Technological Forecasting and Social Change 61(3):247-271, 1999.
The Loglet analysis is interesting because it can potentially handle multi-peak production profiles, which are a common challenge for curve-fitting techniques. I won't go into too much detail on the Loglet transform; the mathematics is well explained in the aforementioned reference (there is also a pdf version here). The Loglet decomposition is an elegant mathematical framework that consists in fitting a sum of logistic curves. The decomposition is performed using successive Fisher-Pry decompositions (J.C. Fisher and R.H. Pry. A simple substitution model of technological change. Technological Forecasting and Social Change, 3:75-88, 1971.), which consist in plotting, versus time, the log of the ratio F/(1 - F), where F = Q(t)/URR is the cumulative production as a fraction of the URR. For a logistic curve this plot is a straight line:

log( (Q(t)/URR) / (1 - Q(t)/URR) ) = K (t - t_half)

An example of a Fisher-Pry representation is given in Fig. 1. The notation used for the logistic parameters differs from the notation usually encountered on TOD:
  • N(t) is the cumulative production (i.e. Q(t))
  • t_m is the peak date (i.e. t_half)
  • Dt is related to the logistic growth rate (K = log(81)/Dt) and corresponds to the time needed for the cumulative production to go from 10% to 90% of the total resource.
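To make the notation above concrete, here is a minimal sketch of a single logistic and its Fisher-Pry transform. The parameter values are made up for illustration (they are not taken from any fit in this post), and the function names are my own; the point is only that the transform turns the S-curve into the straight line K (t - t_m), and that Dt is the 10%-90% duration.

```python
import math

def logistic(t, urr, t_m, dt):
    """Cumulative production N(t) for one logistic with saturation urr,
    midpoint (peak date) t_m and 10%-90% duration dt."""
    k = math.log(81) / dt          # growth rate K derived from Dt
    return urr / (1.0 + math.exp(-k * (t - t_m)))

def fisher_pry(n, urr):
    """Fisher-Pry transform: log of F / (1 - F), with F = N / URR."""
    f = n / urr
    return math.log(f / (1.0 - f))

# Illustrative (made-up) parameters, not taken from the post's fit.
URR, T_M, DT = 100.0, 1970.0, 40.0
K = math.log(81) / DT

years = list(range(1940, 2001, 10))
fp = [fisher_pry(logistic(t, URR, T_M, DT), URR) for t in years]
for t, y in zip(years, fp):
    # The transformed value and the straight line K * (t - t_m) should agree.
    print(t, round(y, 3), round(K * (t - T_M), 3))

# Dates at which the cumulative production reaches 10% and 90% of URR.
t10 = T_M + math.log(0.1 / 0.9) / K
t90 = T_M + math.log(0.9 / 0.1) / K
print(t90 - t10)  # equals Dt up to rounding error
```

The factor log(81) appears because F/(1 - F) goes from 1/9 to 9 between the 10% and 90% points, a ratio of 81.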


Fig. 1-
The logistic growth of a bacterial colony plotted using the Fisher-Pry transform (bottom), which renders the logistic linear (from Meyer et al.).


Fortunately, an open-source software package written in Java, Loglet Lab, is freely available. The software is fairly simple to use and well documented (there is a tutorial here). I applied the Loglet analysis to the world production with an increasing number of logistic curves.

Number of Loglets   URR (Gb)   Peak date   Peak Production (mbpd)   RMS (Gb)
1                   1408       1990        71.3                     61.1
2                   1470       2002        63.8                     104.5
3                   1995       2005        75.0                     20.8
4                   2020       2007        78.2                     20.0
5                   2018       2011        78.0                     15.8
6                   2094       2011        83.1                     13.3
7                   2119       2012        84.8                     11.4
8                   2125       2012        83.9                     12.4
Table I. Results on the world oil production (all liquids excluding refinery gains) for different numbers of loglets. The RMS (Root Mean Square) error measures the quality of the fit.
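For readers unfamiliar with the RMS error, a minimal sketch follows. I am assuming here that the error is computed between observed and fitted cumulative production (the Gb unit in Table I suggests this, but the exact definition used is not stated in the post). The numbers are toy values made up for illustration.

```python
import math

def rms_error(observed, fitted):
    """Root mean square of the residuals between two equal-length series."""
    if len(observed) != len(fitted):
        raise ValueError("series must have the same length")
    squared = [(o - f) ** 2 for o, f in zip(observed, fitted)]
    return math.sqrt(sum(squared) / len(squared))

# Toy cumulative-production values in Gb (made up, not real data).
observed = [10.0, 25.0, 55.0, 100.0]
fitted = [12.0, 24.0, 52.0, 104.0]

# Residuals are -2, 1, 3, -4, so the result is sqrt(30 / 4).
print(rms_error(observed, fitted))
```

A lower value means the fitted curve stays closer to the data, which is how the rows of Table I are being compared.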


The best fit (lowest RMS error) was reached with 7 loglets; the corresponding production profile is shown in the figure below. Note how the different Loglets are concentrated around the different oil shocks.


Fig. 2-
Results of the Loglet analysis for 7 loglets applied to the world production (all liquids excluding refinery gains). The different loglets are the dotted red lines.


The table below gives the parameter values of the different Loglets (the dataset is also available on EditGrid):

URR (Gb)          1541.7   172.5    156.5    83.8     72.1     62       30.3
% of total URR    72.8     8.1      7.4      4        3.4      2.9      1.4
Dt (years)        57.9     26       71.8     15       18.7     77.2     141.9
K (%)             7.6      16.9     6.1      29.3     23.5     5.7      3.1
Peak date         2012.4   1972.3   1964.7   1975.9   1989.8   2010     2001.1


One Loglet dominates the production: it contains nearly 73% of the total URR and is due to peak in 2012 (see Fig. 3). The rest of the contributions come from the other 6 Loglets, whose sum peaked around 1975. I wonder if this component represents the early "easy oil" from the super-giant fields.
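Since the parameters are all in the table above, the production curve can be rebuilt by hand. The sketch below is my own reconstruction, not Loglet Lab output: it takes the derivative of each logistic, sums them, and looks for the peak with a simple grid search. The conversion assumes 365.25 days per year; the function names and the grid bounds are arbitrary choices.

```python
import math

# Loglet parameters from the table above: (URR in Gb, Dt in years, peak date).
LOGLETS = [
    (1541.7, 57.9, 2012.4),
    (172.5, 26.0, 1972.3),
    (156.5, 71.8, 1964.7),
    (83.8, 15.0, 1975.9),
    (72.1, 18.7, 1989.8),
    (62.0, 77.2, 2010.0),
    (30.3, 141.9, 2001.1),
]

def rate(t, urr, dt, t_m):
    """Annual production (Gb/year): time derivative of one logistic."""
    k = math.log(81) / dt
    e = math.exp(-k * (t - t_m))
    return urr * k * e / (1.0 + e) ** 2

def total_rate(t, loglets):
    """Sum of the production rates of all the given loglets at time t."""
    return sum(rate(t, *p) for p in loglets)

def peak(loglets, start=1900.0, stop=2100.0, step=0.1):
    """Grid search for the date and value of maximum production."""
    n = int(round((stop - start) / step))
    grid = [start + i * step for i in range(n + 1)]
    best = max(grid, key=lambda t: total_rate(t, loglets))
    return best, total_rate(best, loglets)

GB_PER_YEAR_TO_MBPD = 1000.0 / 365.25  # 1 Gb/year expressed in Mb/day

total_urr = sum(p[0] for p in LOGLETS)
year_all, rate_all = peak(LOGLETS)          # all 7 loglets together
year_minor, rate_minor = peak(LOGLETS[1:])  # the 6 smaller loglets merged

print("total URR (Gb):", round(total_urr, 1))
print("overall peak:", round(year_all, 1), round(rate_all * GB_PER_YEAR_TO_MBPD, 1), "mbpd")
print("merged minor loglets peak:", round(year_minor, 1))
```

The printed totals can be compared with the 7-loglet row of Table I and with the peak dates quoted in the paragraph above.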


Fig. 3-
Same as Fig. 2, but only the largest loglet is shown and the 6 others are merged into one contribution (~27% of the URR).


Compared to other curve-fitting results (Fig. 4 below), the Loglets give a better fit on the left side but are much more pessimistic about the production decline.


Fig. 4-
Different logistic-based predictions: Stuart Staniford (in green), Double Hubbert Linearization (in magenta) and Loglets (in blue). The spreadsheet is available here.


In summary, the Loglet analysis gives promising results and could be applied to other difficult cases (e.g. Russia). With this approach, the oil shocks and the different production regimes are well modeled. The Hubbert Linearization technique could be used instead of the Fisher-Pry transform, which I don't find very practical. The Loglet Lab software could also be used to model energy substitution scenarios (e.g. conventional oil replaced by synfuels, biofuels, etc.). I have noticed a few limitations in the software:
  • The Levenberg-Marquardt algorithm is used to perform the Loglet analysis and is dependent on the initialization.
  • There is no display of the resulting production curve, only the Loglets and the cumulative production are given.
  • Some statistics on the quality of the resulting fit are missing (e.g. RMS error).
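On the suggestion above that the Hubbert Linearization could stand in for the Fisher-Pry transform, here is a minimal sketch of the idea for a single logistic. For a logistic, P/Q = K (1 - Q/URR), so a straight-line fit of P/Q against Q gives K as the intercept and URR where the line crosses zero. The data are synthetic and the parameters made up; how this would extend to several overlapping loglets is left open, and nothing here reflects how Loglet Lab works internally.

```python
import math

def logistic_cumulative(t, urr, k, t_half):
    """Cumulative production Q(t) of a single logistic."""
    return urr / (1.0 + math.exp(-k * (t - t_half)))

def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def hubbert_linearization(cumulative, production):
    """Fit P/Q = K * (1 - Q/URR) and return the estimates (K, URR)."""
    ratios = [p / q for p, q in zip(production, cumulative)]
    a, b = linear_fit(cumulative, ratios)
    return a, -a / b   # intercept is K, zero crossing is URR

# Synthetic single-logistic data (made-up parameters, no noise).
URR, K, T_HALF = 2000.0, 0.06, 2005.0
years = range(1960, 2001)
q = [logistic_cumulative(t, URR, K, T_HALF) for t in years]
# Exact production rate of a logistic: P = K * Q * (1 - Q/URR).
p = [K * qi * (1.0 - qi / URR) for qi in q]

k_est, urr_est = hubbert_linearization(q, p)
print(k_est, urr_est)
```

With noise-free data the fit recovers the input parameters; with real production data the early points are usually very noisy, which is why the choice of fitting window matters.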
Loglet, wavelet, whee. Whatever it takes to put you in George Jetson's car, what do I have to do to get you to sign the papers today, listen to 'er purr.....
Can you explain in simple terms how to get from a loglet to a bell curve?
Well, the Loglet analysis is a component logistic model; you'll find a good explanation here. The best way to understand it is to look at the bi-logistic model:

As it turns out, many growth and diffusion processes are actually made up of several subprocesses. First, let us consider the case of a system which experiences growth in two discrete growth phases. Then, we will extend this to an arbitrary number of phases.

Systems with two growth phases follow what we call the "Bi-logistic" model [12]. In this model, growth is the sum of two discrete "wavelets", each of which is a three-parameter logistic.


Below is an example of a bi-logistic model:

The cumulative curve in panel A is the sum of the two logistic processes in panel B (you then take the derivative to get the usual Hubbert/bell curves).

The Loglet is an extension of the Bi-logistic case to a multi-logistic case.
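To answer the question more directly, here is a small sketch of a bi-logistic with two made-up growth phases. The cumulative curve is the sum of two S-curves, and the production is the sum of their derivatives, each of which is a bell curve. The numerical check at the end only confirms that the bell-curve formula really is the slope of the S-curve; the parameter values are arbitrary.

```python
import math

def logistic(t, kappa, dt, t_m):
    """S-curve: cumulative growth with saturation kappa, duration dt, midpoint t_m."""
    k = math.log(81) / dt
    return kappa / (1.0 + math.exp(-k * (t - t_m)))

def bell(t, kappa, dt, t_m):
    """Derivative of the logistic: a bell-shaped (Hubbert) curve peaking at t_m."""
    k = math.log(81) / dt
    e = math.exp(-k * (t - t_m))
    return kappa * k * e / (1.0 + e) ** 2

# Two made-up growth phases: (saturation, Dt, midpoint).
PHASES = [(60.0, 20.0, 1950.0), (40.0, 15.0, 1985.0)]

def cumulative(t):
    """Bi-logistic cumulative curve: sum of the two S-curves."""
    return sum(logistic(t, *p) for p in PHASES)

def production(t):
    """Two-humped production curve: sum of the two bell curves."""
    return sum(bell(t, *p) for p in PHASES)

# Compare the analytic slope with a central finite difference.
h = 1e-4
t = 1960.0
numeric = (cumulative(t + h) - cumulative(t - h)) / (2 * h)
print(numeric, production(t))
```

A loglet fit with more components works the same way, just with a longer list of phases.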

S curves and bell curves

Maybe this requires a bit of a leap of faith. The idea of diminishing returns (more effort, not so much reward) lends itself to S curves. However, not every S curve has a bell curve as its derivative. It has to be just right. A similar example is using a sheet of shiny metal as a reflector; it has to be an exact parabola to get a sharp focus.

An alternative theory is that the bell curve is really a triangle with fudgy corners. But I don't wanna go there. Given the uncertainties bell curves are OK.

Very nice.  I think this gives a more realistic description than the Hubbert curve.
Judging how realistic it is might better be done by running a few country curves through the method.  Hubbert linearization works pretty well on many of them that have passed peak.  Do loglets do as well?

Mark Folsom

The HL does pretty well on some but not so well on others. I'm planning to test this method on other countries in a future post.
It's as realistic as any other extension of curves without a proof of the underlying laws of nature: not very realistic. Isn't it strange that there are lots of smaller loglets in the left (past) part of the curve, but not a single one in the future? The loglet analysis misses all the future small loglet curves that have not started yet: arctic oil, widespread CO2 injection etc.

If you do a country analysis, try and make a prediction for the North Sea or Britain with the data up to 1990, 1995, etc. and check how the predicted values change.

So what effect does this have on your predictions for post-peak decline rate? I'm no mathemagician, but it looks like the slow squeeze maybe got a little less likely, huh?
First off, nice post Sam.

Second, the "slow squeeze" is Stuart's best view on the next decade or so based on his linearizations. Since his obviously useful Hubbert formulations are a good thing, we need only remember that these are a model of the way things work and not the law.

Fundamental irresolvable discrepancies exist between the "bottom-up" approach of Skrebowski (and CERA) and Stuart's analysis. But it's all short-term stuff. Stuart thinks the peak is likely now. Skrebowski thinks it's in 2010.

Who gives a damn? Big problem either way.

I think that it may not make much better predictions than Hubbert's mono-peak.

My criticism: we are making rough predictions based on partial information. While this method focuses enough precision on the logistic curves to distinguish individual fields (or wells?) in logistic history, it does not predict the placement or size of future bumps in the tail.

The original, monomodal method is a crude approximation to historic data, but also a crude prediction of undiscovered sources, basing its expectations on the historic end of the bell curve.

For example the historic data through, say, 1956 would probably not anticipate the bump from the recently trumpeted finds in the deep areas of the GoM. But in a hand-waving sort of way, Hubbert does this by expecting a tapering-off rather than a logistic cliff as known reserves are consumed.  


Always moving the future is.

I agree with you, the loglets on the left have taken away some of the area from the main loglet in order to model the different oil shocks. The result is that the main loglet has a steeper decline.
I'd like to add to that: better precision, and good on ya, Khebab.

This sort of analysis would be an excellent way of stating proven reserves, wouldn't it? If we don't do any more exploration or drilling, we have exactly this much in the ground right now. The multimodal analysis should give an excellent answer.

I would be interested in your thinking behind the idea the first curve is the largest fields. (I agree, but interested in hearing your logic).

First glance said to me "peak in early 70's, must be US". But the size is too large.

Well, I was thinking about the discovery pattern of oil fields:


(data taken from Simmons's book)

The largest fields are also the oldest and the most accessible with high flow wells.

Given that the first bump was politically driven, i.e. a manifestation of an external force rather than a feature of nature or geology, I find it hard to accept that it can be explained as being due to some natural decline in a subset of larger fields. It just doesn't make sense to me.
The 1973 and 1979 oil shocks were politically driven but not the prior strong production growth.
You left me completely confused... how does that address the problem? The decline in the 70's was politically driven, but you are claiming the decline was really due to the decline of some subset of the fields just because you can make a curve fit under the first bump. Seriously, that's not very convincing is it?
I think we are not understanding each other.

Stuart identified several time periods where the growth rate is stationary:

1891-1929         7.9%  
1929-1942         3.9%  
1942-1973         7.4%  
1973-1979         2.1%  
1979-1983         -4.0%  
1983-2004         1.5%  

Between each period, we have economic transitions, oil shocks, wars, etc. I agree with you that the transitions in 1973 and 1979 were politically driven (OPEC embargo and the Iran crisis respectively) and not a natural decline of some subset of fields. However, prior to these two shocks, the period between 1942 and 1973 enjoyed an exceptional growth rate (7.4%), which was possible mainly because of a few giant oil fields discovered in the 40s-60s (see graph about discoveries above) that did not necessarily require a mature oil infrastructure.

You could probably also make an argument that the growth would have been (and indeed has been) fairly constant in the mid 7%'s if the oil had been extracted with absolute efficiency. And then argue the exceptions are due to the extreme situations in the thirties (the economic depression) and the late seventies/early eighties (embargo/politics in Iran) which "interfered" with efficient development. Then from the nineties onwards you start approaching the peak and so you might expect growth in production to slow. So the model is intact. Would that make sense?
URR (Gb)    1541.7     172.5     156.5     83.8     72.1     62     30.3
% of total URR    72.8     8.1     7.4     4     3.4     2.9     1.4
Dt (years)    57.9     26     71.8     15     18.7     77.2     141.9
K (%)    7.6     16.9     6.1     29.3     23.5     5.7     3.1
Peak date    2012.4     1972.3     1964.7     1975.9     1989.8     2010     2001.1

Wow.  There's something really tempting about this....
Let's squeeze the data until they scream.  :)

1st column: 2012 bulk of mature OPEC/Mideast/Venezuela
2nd column: 1972 US Lower 48
3rd column: 1964 rest of world, not normally oil producers???? (not clear here)
4th column: 1975 Saudi Aramco pre-nationalization
5th column: 1989 USSR and Alaska?
6th column: 2010 technically advanced enhanced oil recovery offshore
7th column: 2001 North Sea (the long Dt says otherwise)

I also tried that exercise, but it's not easy to map all the transitions. Another approach is to exploit the loglet signatures in order to identify the different production regimes. By running k-means, I get 12 classes (or 11 transitions) between 1900 and 2005:

In red are the different production regimes identified by Stuart:

1891-1929         7.9%  
1929-1942         3.9%  
1942-1973         7.4%  
1973-1979         2.1%  
1979-1983         -4.0%  
1983-2004         1.5%  

What a fantastic fit.  It's also scary as hell, because it shows a 4% slope on the other side.
The magic of curve fitting is that if you use enough loglets, you can fit any data series. Still does not have any predictive value if you do not inject real experimental knowledge or geological data. Also remember that these loglets are mathematical constructs which may or may not represent countries or geological regions. Some of them will represent politics: OPEC oil shock etc.
Agreed, you may overfit your data. However, the algorithm is not converging well beyond 7 loglets (Table I).
How does your fit compare to a 7th order polynomial for the same data? If you add enough terms you can fit anything. Post peak I would suggest that you will find you need more little loglets to satisfy the wiggles that happen on the way down. I like the idea - but not the high order required for the fit.

 

LOL.

Actually, this is a whole lot prettier than a seventh order polynomial would have been. You can fit anything with a high enough order polynomial, but such a fit to sparse noisy data is apt to be analytically useless. Usually it oscillates extremely wildly and has little or no interpolative or predictive value. (Denial of this phenomenon creates dysfunctional superstitions among practitioners of numerical methods, but that's for some other time and place.) These analyses are a bit like  water balloons, squeeze them here and they bulge there; here, the main fit is nice but there's still the wide variety of tails. In the end one can only extract information that is actually present.

Then again, these things are only models - at best only aids to understanding - and they are far less exact than, say, Maxwell's equations.

You may be able to get a nice smooth fit to some points, but this is pretty useless unless it has predictive value and is better than some mindless Nth order polynomial (where N is the number of free parameters in the loglet fit). Like Nero below, I am skeptical. I would think that Stuart's stability analysis would be in order. Look at how an N order loglet fit does up to some time T, and then see how good the prediction is of the actual data from T until now. If you can do a better job with fewer free parameters than the usual methods, then more power to you!
I must be in a sour mood but of course you get a better fit if you have more parameters.  The logistic curve has no theoretical backing and adding arbitrary parameters to make a better fit just wrecks the only thing that the logistic curve has going for it: its simplicity.  
its simplicity
Some are criticizing its simplicity, I'm just trying to extend the original Hubbert approach.
It's nice to see the math that can clarify an issue like this. We still have to figure out how to power civilization after 2050.
Actually, post-2050 is easy: wind, solar, wave, biomass, plus other things not discovered, powering electric transportation.  

The US has 2+TW of wind, and the world gets its current usage via solar every 20 minutes.

It's the transition over the next 25 years that is the problem:  if oil disappeared tomorrow there would be no way to move to electric transportation quickly enough.

The key to how painful the transition will be is how quickly we convert to electric transportation.  The conversion has started (Tesla, Prius, Calcars), it's just not fast enough to prevent serious risk of pain.

How big a civilization must be powered after 2050?
To know the difficulty requires both better information of 2050 technology, which will have 40 years of high energy prices under its belt, plus information of population, which probably cannot be extrapolated based on the past 40 years' growth.
I suspect the next ten years data will make a mess out of your nice pretty bell curves. Of course I agree with Hubbert, just not to the point that I feel the data points must fit a nice symmetric pattern on the way down.
Great post, Khebab. Once again your mathematics are so far beyond me that I feel like one of Jonathan Swift's Yahoos. Thanks for doing the work and putting it up for criticism and review!
Every time I see an analysis like this loglet approach, with a potentially infinite number of variables, and no physical basis behind it, I feel better about the oil shock model which uses a few intuitive stochastic parameters backed up by solid physical reasoning.

But as always, Khebab is doing the hard digging and greatly appreciated

both your point, WHT, and grondeau's points (I'm still laughing at his 7th order comment--I must be a geek) above are fair.

Still, I come from the school that the more models you run, the better job you do critiquing the assumptions of other models.

In the end, all we have is incomplete information, and we are forced to guess.  I like having as many (prima facie valid) tests of those guesses at my disposal as humanly possible.

The best model I have is my gut -- it has rarely failed me.

your gut, Dave, may be valid, but it has reliability and replicability problems...
That's why you made me a Senior Contributor, PG, remember? Can't be replicated, of course, but...

I like to back up the intuition with the data ... and that analysis -- being fair, of course -- always seems to work out, ain't that something?  

Happy motoring!

Dave, is there any correlation between gut size and predictive reliability?