Natural Spirit of Good Company: May 2010

(Maybe this weblog is still dead. The current unnatural stirring of its <body> is basically an elaboration of a comment I left on http://bishophill.squarespace.com/blog/2010/5/16/david-mackay-at-oxford.html, about David Mackay's talk about Mackay's freely-downloadable book Sustainable Energy --- without the hot air. I am well aware that it is insufficiently edited, and if more than three people read it, maybe I will feel bad about that. Or maybe not. And I refuse to feel bad about Blogger's endearing HTML-plus-homebrew-randomness treatment of blank lines, of paragraph tags, and of the interaction between the two, which might anyway have changed in the last year since I never was able to find it documented anywhere, so if this looks weird, then yup, I forgot my workaround, but I do remember I was never able to make my workaround less than clunky, and part of the fun was that the Preview always had different pathology than the published version, so I stand innocent whatever presentation hell I have sent my text into. And as for the logical hell it may seem to've issued from, I am vast, as vast as a swarm of subordinate clauses chewing on prepositional phrases and bleeding distracting observations, and sometimes losing the point, and indeed sometimes vaster than that. I contain vast sentences, and generally it takes time and care for me to shorten them, and it's a really nice day outside, and now the merest hundreds of lines of draft prose into this article I find it harder than before to care so much that someone is wrong on the Internet, and anyway this must be submitted as a blog post before I can submit a blog comment linking to it, and blog comments even more than blog posts should be timely, so it follows logically that my sentences contain multitudes, so deal.)

Mackay's other downloadable book, Information Theory, Inference, and Learning Algorithms, is impressive. I downloaded that book five years ago in order to study several sections of it. Hence my particular interest in seeing what Mackay had to say now about AGW.

Having just skimmed the first 15 pages or so of Sustainable Energy, I think Mackay is guilty of flaky preaching to the choir about the underlying AGW problem. He spends multiple pages in a lovingly detailed victory dance over the counterargument that CO2 concentration hasn't risen. He spends much less time on the counterarguments about the rather more important question of CO2 sensitivity --- disposing of them by saying it's complicated and then uncritically endorsing IPCCish results as a scientific consensus and thus a reasonable estimate of CO2 sensitivity. I don't see any nonpartisan way to justify that. As far as I can tell, disputes about the existence of a rise of CO2 level are completely marginal compared to disputes about temperature sensitivity. (CO2 level disputes seem to be a distant fourth behind at least three other disputes, about 1. temperature sensitivity, 2. temperature measurements, and 3. credibility of current-generation climate models.) Mackay has plenty of expertise in the fundamentals of statistical reasoning, and it would be nice if he'd write even a third as much about back-of-the-envelope cross-checks of his confidence in the IPCC temperature sensitivity as he spent on such cross-checks of CO2 level rise.

In particular, it'd be interesting to know how Mackay can justify appealing to a scientific consensus that circles the wagons around the original Mann hockey stick articles, and around the IPCC process which made those articles the flagship of sufficiently strong evidence and sufficiently sound analysis to justify punting previous ideas about large preindustrial climate fluctuations. Mackay has done a book's worth of research on quantitative sanity checks related to the AGW controversy, enough work to have published pages of cross-checks addressing a quaternary controversy. And before that, he published quite a good book on (more or less) the fundamentals of statistical inference. Thus, a lack of interest in quantitative sanity checks on the consensus about the central CO2 controversy seems out of place.

Perhaps Mackay subscribes to the view that the preindustrial temperature evidence is unimportant because the modellers have statistically sound demonstrations of sufficient ability to make quantitative predictions from first principles and recent measurements? Or that the statistical problems of the original hockey stick aren't important because later studies to defend its conclusions were done with fundamentally sound statistics, honestly accounting for what seems to have been a strong political temptation to, e.g., give rather heavier statistical weight to trees in a small dataset giving palatable results than to trees in larger datasets with less palatable results? I don't know how Mackay can justify deferring to the IPCC estimate of CO2 sensitivity without subscribing to one of those two possible views. However, I also don't know how he can easily subscribe to either. So it's sort of fascinating, but in a rather sad creepy way, when he finesses this by dropping from previous pages of physicist-speaking-to-physicist analysis --- performing sanity checks on the fundamentals by numerate back-of-the-envelope/elevator-pitch analysis --- to breezy chatter about "complex, twitchy beasts" and "Bad Things."

What kinds of cross-checks am I dreaming of here? As cross-checks of arguments for smallness of preindustrial climate changes, I nominate five examples. 1. How statistically reasonable is it to handle what is in effect a multisensor fusion problem by giving zero weight to our scattered incomplete hard temperature data (historical lake/river freezing times in Europe and various of its colonies, e.g.), calculating our result purely in terms of more indirect proxies (because of their compensating advantages like longer time series)? 2. Roughly how sensitive might the post-Mann IPCC-camp temperature results be to outright cherry-picking and/or softer irregularities like giving heavier weight to a tree in an 8-tree dataset than to a tree in a larger dataset? 3. Given the level of local variation we observe in climate in naively-comparable sites today (e.g., comparable in altitude and latitude), roughly how often do we expect to see purely-local fluctuations of the level of the LIA/MWP observations? 4. How numerically reasonable was the Wegman network-analysis critique, and to what extent does it apply to the various generations of IPCC-favored analyses? 5. How information-theoretically reasonable is it to be pointedly uninterested (e.g., a long-standing pattern of not publishing raw data and details about its collection, and of not energetically remeasuring and rechecking the proxies as the passage of time adds new tree rings or mudlayerwiggles or whatever; and the recently-controversial masterstroke of not graphing them either, in the famous "hide the decline" trick) in key details of at least the most heavily weighted proxies? Is the observed level of interest consistent with the hypothesis of a technical community honestly reaching a scientific consensus about a statistical reconstruction?
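To make cross-check #2 a little more concrete, here is a toy calculation of how much the choice of weighting alone can move an average. It is only a sketch under loud assumptions: the anomaly values are invented, the site names and the functions `per_tree_mean`, `per_site_mean`, and `weight_ratio` are hypothetical, and nothing here reproduces any published reconstruction method. It merely compares "every tree counts equally" with "every site counts equally", the latter being one way a tree in an 8-tree dataset ends up weighted more heavily than a tree in a larger one.

```python
def per_tree_mean(sites):
    """Average over all trees, with every tree counting equally."""
    total = sum(sum(trees) for trees in sites.values())
    count = sum(len(trees) for trees in sites.values())
    return total / count


def per_site_mean(sites):
    """Average of the site averages, so a tree in a small site counts for more."""
    site_means = [sum(trees) / len(trees) for trees in sites.values()]
    return sum(site_means) / len(site_means)


def weight_ratio(sites, small, large):
    """How many times more one tree in `small` counts than one tree in `large`
    when sites (rather than trees) are weighted equally."""
    return len(sites[large]) / len(sites[small])


# Invented anomalies, purely for illustration (not real proxy data):
# a small site with a strong signal and a large site with none.
sites = {
    "small_site": [2.0] * 8,
    "large_site": [0.0] * 100,
}

if __name__ == "__main__":
    print("per-tree mean:", per_tree_mean(sites))
    print("per-site mean:", per_site_mean(sites))
    print("weight of a small-site tree relative to a large-site tree:",
          weight_ratio(sites, "small_site", "large_site"))
```

With these made-up numbers the two averages differ by more than a factor of six, which is the sort of sensitivity I'd like to see quantified for the real datasets by someone who knows them.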

(I don't claim that each of those cross-checks would torpedo the IPCC position below the waterline. I do think that #1, #2, and #5 are serious criticisms. I also think that #2 and #5 are sufficiently common criticisms that addressing either or both instead of "CO2 concentration is not rising" would be much less like beating a strawman. I have mixed feelings about #4; quick-and-crude quantification of fundamentally messy things like social relationships doesn't appeal to me, but on the other hand, claims of "consensus" and "independent studies" are fundamentally equally crude quantifications also. Thus, to the extent that it's worth addressing such a crude simplification of a messy system, Wegman's calculation seems like a natural enough way for a statistician to try to do it. And I'm quite curious about #3, and I don't know why I have never run across a reference to such a calculation. I'm unlikely to do such a calculation myself, and less likely to write it up, since I think it would take me at least two weeks to get sufficiently up to speed on the data sources to get a result I'd be unembarrassed to put on a webpage. But many dozens of people are already very familiar with the data, and many of them might be able to do it in a weekend, and many of them write at least dozens of pages a year on similar subjects.)

It's harder to dream up direct quantitative cross-checks for the validity of IPCC modeller consensus. None of the AGW forecast data I've ever heard of seem to be friendly to such back-of-the-envelope checks on any reasonable timescale. Thus my nominations for sanity checks here are not calculations, but questions "why [is it hard to find such cross-checks]?" and "what [the hell are we thinking then]?" 1. If modellers have the situation under control, why are they unable to (or unmotivated to?) pick easily-measurable numbers where their models make clear, interesting near-term predictions? 2. If they're not doing this, what is the strongest line of argument that they are reasoning clearly about having the situation under control, as opposed to, say, peddling overfitted nonsense?

To elaborate on point #1 here, it's a common situation in modelling that there are economically-important questions that we care about (e.g., how often a new kind of telephone exchange will have to refuse/drop calls because of congestion) which are expensive to measure directly. (First build the expensive piece of equipment, then wire it up to a bunch of customers to use them as guinea pigs...) Of course ultimately you *do* test this kind of thing on the poor guinea pigs, whether you like it or not, but it's usually worth doing a lot of work before you get to that stage. If you have a model which purports to demonstrate a surprising result about behavior in full-scale customers-running-live mode, it's good to have evidence in support of that model before you actually test the thing on customers. (Note that "surprising" doesn't need to be terribly surprising, either: if you want to scale up a supermarket by 80% relative to the largest supermarket your chain has built so far, and claim that various size-dependent properties will be accurately (+/-15%, say) predicted by linear extrapolation from the sizes of existing stores, that might not be surprising in informal terms, but in this context it's surprising enough that you'd like some justification before you build it and let the guinea pigs in.) So if you have a model which is good enough to make "surprising" predictions in the expensive large, in my experience, it tends also to be good enough to make comparable predictions in the small.

In my experience in chemistry, a question which might matter economically is the value of a binding constant under some exotic conditions that will be very difficult (time-consuming, expensive...) to set up. If it's too difficult to set up unless the model is correct, how then can your model prove its worth now, so that you know it's correct to do the expensive thing? The model doesn't prove its worth by stubbornly claiming that it really can calculate the one number that we ultimately care about, you damnable denier, but by successfully predicting surprisingly accurate results for dozens or hundreds of other numbers in related problems where measurement is much more practical. E.g., it might make a boatload of predictions about spectroscopic changes of the bound molecule in related conditions. In CO2 climate sensitivity questions I don't know enough to guess what should be simultaneously easy to predict, easy to measure, and surprisingly significantly different from naive extrapolation, but roughly the kind of thing I'd expect is Mackay writing "a famous early example was the Tarsasku model B 1997 'beach bunny' curve predicting the change in the power spectrum of coastal/inland nocturnal barometric fluctuations, which was clearly vindicated by 2002; Zer's microfoundations review article of 2003 gives 14 such predictions which met his 98% confidence level, and today we have approximately 300."

(Maybe that power spectrum example sounds unrealistically complicated? or unreasonably simpleminded? I don't mind simple predictions at all --- e.g., differential tropospheric warming? excellent in its simplicity. But my impression from other modelling is that in pursuit of results which are easy to predict, surprising, and easy to measure sufficiently accurately --- and from what I've seen of the troposphere controversy, tropospheric warming measurement accuracy is at best marginal --- one tends naturally to end up with esoteric predictions. Commonly, in fact, one ends up with very esoteric predictions, and I'd cheerfully accept predictions of the north/south hemisphere deviation of the El-Nino-corrected coastal/inland nocturnal barometric fluctuations, or results hairier than that, as long as they're precisely defined in terms of measurements which are routinely made to sufficient accuracy. Compared to the hairiness of tunneling down through all the layers of equipment and calculation to the actual physical reality of spectroscopic experiments (like 2D NMR, or various nonlinear superfast laser stuff), that seems almost tame. But I'm not impressed with hindcasting small noisy datasets with enormous computer programs, and I'm not impressed with predictions which even today, after all the years which have passed since the science was settled, are hard to sharply distinguish from the supernaive rival hypothesis of "zero trend, not even no-feedback lukewarming, just the usual reddish noise drift" over the past decade, and which won't be clearly distinguishable from historical trend extrapolation for many years yet.)
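For anyone wondering why "reddish noise drift" is such an annoying rival hypothesis, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the noise is a plain AR(1) process, the persistence `phi=0.9` and the 120-point length (think monthly values over a decade) are numbers I picked, not estimates from any temperature record, and the function names are hypothetical. It generates many zero-trend series, fits a least-squares line to each, and compares how widely the fitted slopes scatter for white noise and for red noise of the same overall variance.

```python
import math
import random
import statistics


def ar1_series(n, phi, marginal_sd, rng):
    """Zero-trend AR(1) noise: x[t] = phi * x[t-1] + innovation.

    phi = 0 gives white noise; phi near 1 gives strongly "red" noise.
    The innovation size is scaled so the series has standard deviation
    marginal_sd whatever phi is, and the first point is drawn from that
    stationary distribution.
    """
    innovation_sd = marginal_sd * math.sqrt(1.0 - phi * phi)
    x = rng.gauss(0.0, marginal_sd)
    series = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, innovation_sd)
        series.append(x)
    return series


def ols_slope(ys):
    """Least-squares slope of ys against the time index 0..n-1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def slope_spread(n, phi, marginal_sd, trials, seed):
    """Standard deviation of fitted slopes over many trend-free series."""
    rng = random.Random(seed)
    slopes = [ols_slope(ar1_series(n, phi, marginal_sd, rng))
              for _ in range(trials)]
    return statistics.pstdev(slopes)


if __name__ == "__main__":
    white = slope_spread(n=120, phi=0.0, marginal_sd=1.0, trials=400, seed=1)
    red = slope_spread(n=120, phi=0.9, marginal_sd=1.0, trials=400, seed=1)
    print(f"white-noise slope spread: {white:.5f}")
    print(f"red-noise slope spread:   {red:.5f}")
```

Under these assumptions I'd expect the red-noise spread to come out several times the white-noise one: a series with no trend at all can look like it has one, which is why a forecast needs to be sharp to stand out from drift.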

And to elaborate on point #2, I'm venting, but fundamentally it's a serious question. And its seriousness doesn't depend on worries about left/right/enviro political subtexts, about professional clannishness, or about professional or financial incentives to reach particular kinds of policy conclusions. Without any of those incentives, modellers can very easily be spontaneously guilty of overfitted nonsense; it seems to be very human to fall in love with the modelling approach one has chosen, and to believe its results with more confidence than one should. Relatedly, there is a strong human tendency to resent being suspected of merely falling in love with an unrealistic model, and to react by pumping out results which demonstrate that the model really can predict surprising stuff accurately, even if the experimentalists (did we mention that when attending institutions where people who qualify are able to become theorists, they "chose" to go into experiment? need we say more?) haven't yet been able to measure relevant quantities to sufficient accuracy to confirm our accurate surprising prediction for the result that people are economically motivated to care about. The modellers are human; if they aren't pumping out those results, I judge that it's alarmingly likely to be because their models simply aren't valid enough to produce forecasts sharply distinguishable from weak models like linear extrapolation.
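One conventional way to put a number on "sharply distinguishable from linear extrapolation" is a skill score against that baseline: one minus the ratio of the model's mean squared error to the baseline's. The sketch below is a generic illustration, not anything a modelling group is known to use in this form; the function names and all the numbers in the usage lines are hypothetical and made up.

```python
def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def linear_extrapolation(history, steps):
    """Extend the least-squares line through `history` by `steps` points."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + k) for k in range(steps)]


def skill_score(model, baseline, obs):
    """1 is a perfect forecast, 0 is no better than the baseline,
    and negative values are worse than the baseline."""
    return 1.0 - mse(model, obs) / mse(baseline, obs)


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    history = [0.0, 1.0, 2.0, 3.0]
    observed = [4.5, 5.5]        # what was later measured
    model_forecast = [4.4, 5.6]  # what the fancy model said in advance
    baseline = linear_extrapolation(history, steps=len(observed))
    print("baseline forecast:", baseline)
    print("skill vs. linear extrapolation:",
          skill_score(model_forecast, baseline, observed))
```

A model whose score hovers near zero on quantities that can be measured soon and accurately has not shown that it knows anything the straight line doesn't.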

(It looks to me as though preindustrial temperature fluctuations were large compared to deviations from the post-1800 warming trend. I know no strong reason to believe that non-CO2-driven fluctuations have calmed down since 1800. Thus, I have independent reason to suspect that modellers are peddling overfitted nonsense. But even if I didn't have that reason, perhaps in some alternative universe where written history only began in 1815, I'd still consider the lack of focus on sharp, specialized test predictions somewhere between "inexplicable" and "damning.")

(BRAAAAAIIINS!)

Tuesday, May 18, 2010

On Hacker News on New Scientist on Living in Denial

Sunday, May 16, 2010

On David J C Mackay on AGW
