ClimateGate

12 December 2009

Background

Datasets of regular temperature readings taken with reasonably standardised equipment go back to around the mid-to-late 19th century. There are differences in equipment technology and calibration, but these can be accounted for. These datasets are not seriously challenged, and they show a statistically significant increase in global temperatures.

The problem is that climate change research needs to consider the increase over the last ~150 years in the context of cycles spanning centuries, if not millennia. The magnitude of long time-scale (i.e. prehistoric) changes relative to the measured 20th century increase matters because it indicates whether the measured global warming (notionally caused by CO2 emissions) is actually significant in the (very) long term.

Getting prehistoric temperature readings

This is where reconstructed historical temperature records come in. In this case the temperatures are derived from tree rings. Deriving temperature estimates from such proxies has its own assumptions and complications, but those are a side issue here.

And the problem?

It seems that tree ring temperature reconstructions only go up to around 1960, although the actual status of post-1960s data is unclear. Because of the window size (40 years) of the low-pass smoothing filter applied to the data, the ends of the series have to be padded. I'm not a statistician, but my guesstimate is that the smoothed data would only reach circa 1940 without the padding.
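
To illustrate the end effect, here is a minimal sketch assuming a simple centred 40-year moving average (the actual filter used may well have been more sophisticated); the data are random stand-ins, not real reconstructions. A series ending in 1960 loses roughly 20 years at each end when no padding is applied:

    import numpy as np

    def smooth_centred(series, window=40):
        # Centred moving average; NaN wherever the window would run
        # past either end of the data, i.e. with no padding at all.
        half = window // 2
        out = np.full(series.shape, np.nan)
        for i in range(half, len(series) - half):
            out[i] = series[i - half:i + half].mean()
        return out

    years = np.arange(1850, 1961)   # hypothetical reconstruction span
    temps = np.random.default_rng(0).normal(size=years.size)  # stand-in anomalies

    smoothed = smooth_centred(temps)
    valid = ~np.isnan(smoothed)
    print(years[valid][0], years[valid][-1])   # 1870 1940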

Since this data series ends on a high, conventional padding followed by low-pass smoothing would result in the end of the output graph tending towards the (lower) long-running average rather than following any "recent" trend. To get round this, it seems the tree ring data was instead padded with instrumental data, the low-pass filter applied, and the resulting graph then truncated at 1960.
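
The effect of the padding choice can be sketched as follows. This is illustrative only: it assumes padding with the long-run mean as the conventional baseline, and an artificial rising series standing in for the instrumental record.

    import numpy as np

    rng = np.random.default_rng(1)
    window, half = 40, 20

    # Hypothetical proxy series ending on a high (a stand-in for the tree ring data).
    proxy = np.linspace(-0.2, 0.5, 111) + rng.normal(scale=0.1, size=111)

    def tail_smooth(series, pad):
        # Append `pad`, smooth with a centred moving average, then keep only
        # values up to the last real data point -- the tail of the smoothed
        # curve is therefore shaped by whatever was used as padding.
        padded = np.concatenate([series, pad])
        out = np.array([padded[i - half:i + half].mean()
                        for i in range(half, len(padded) - half)])
        return out[:len(series) - half]

    # Conventional choice: pad with the long-run mean; the smoothed tail
    # relaxes towards the average.
    mean_padded = tail_smooth(proxy, np.full(half, proxy.mean()))

    # Pad with a (hypothetical) rising instrumental series instead; the
    # smoothed tail keeps following the recent upward trend.
    instrumental = proxy[-1] + 0.02 * np.arange(1, half + 1)
    instr_padded = tail_smooth(proxy, instrumental)

    print(round(mean_padded[-1], 3), round(instr_padded[-1], 3))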

Data integrity

An interesting read is the Harry Readme file from the CRU data, which documents someone trawling through the archive of CRU datasets and processing programs. Poor software engineering is not really a sin, even at research level, as efficiency and robustness to corrupt input are not the issues they are with released software. These people are not trained computer scientists. However, the whole dataset looks like someone assembled it for their research, then simply archived the lot rather than making any effort to document what is where. Again, this is not a sin in itself, as the resulting publication should contain sufficient detail on the post-processing of raw data to allow replication.

The problem here is that this dataset is also the primary source of raw data, and there seems to have been no properly documented segregation of the original station data, the unified raw data, and the post-processed data used for whatever research publication(s). There is also a load of programs suffering from bitrot in there. This means that the 4-year clean-up also had to work out what was where, in addition to sorting out inconsistencies in the original data. Large datasets like this will inevitably contain errors, and there are statistical methods to get round them, but the last thing that is needed is more uncertainty resulting from assumptions made during the clean-up.
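
As a flavour of what such statistical methods can look like, here is one simple, hypothetical quality-control rule that flags readings far from a station's median; the actual CRU procedures are not documented here, so this is purely illustrative. Flagged values would be excluded or down-weighted, not silently edited:

    import numpy as np

    def flag_outliers(readings, k=4.0):
        # Flag readings more than k robust standard deviations from the
        # station median, using the median absolute deviation (MAD) so a
        # single wild value cannot skew the threshold.
        readings = np.asarray(readings, dtype=float)
        median = np.nanmedian(readings)
        mad = np.nanmedian(np.abs(readings - median))
        robust_sd = 1.4826 * mad   # MAD-to-std factor for normal data
        return np.abs(readings - median) > k * robust_sd

    station = [9.8, 10.1, 9.9, 10.3, 98.0, 10.0]   # 98.0 looks like a keying error
    print(flag_outliers(station))   # [False False False False  True False]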

CRU attitude

Much has been said of the attitude of the CRU towards the anti-climate-change lobby, but that lobby has itself done things that are not exactly kosher. The most vocal on the subject have been those who lambast the case for climate change regardless, and among the more nationalistic parts of the political right wing, anti-environmentalism is currently a major policy point.

The case for climate change

Although the case for climate change happening is not undermined, a lot of questions are raised. The biggest is whether the magnitude of climate change has been systematically overstated, particularly for political purposes (e.g. high fuel duty, nuclear power).