It's commonplace to hear claims, from people who don't like the results, that temperature data has been manipulated. As Raymond Pierrehumbert wrote, Paul Ryan claimed in a 2009 op-ed:
"The CRU e-mail scandal reveals a perversion of the scientific method, where data were manipulated to support a predetermined conclusion. The e-mail scandal has not only forced the resignation of a number of discredited scientists, but it also marks a major step back on the need to preserve the integrity of the scientific community. While interests on both sides of the issue will debate the relevance of the manipulated or otherwise omitted data, these revelations undermine confidence in the scientific data driving the climate change debates."
Ryan, who in short order has demonstrated a truth-telling problem with even the small stuff, offers no evidence for such a claim. Fake skeptics like Steve Goddard claim it routinely, again with no proof or evidence ever offered. It's scurrilous and extremely low.
But there are ways you can test for fraudulent data. One of the simplest is Benford's Law, which specifies the expected distribution of the leading digits in many real-world datasets. It's particularly applicable to large datasets that span several orders of magnitude. As Wikipedia explains:
Benford's law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 about 30% of the time, and larger digits occur as the leading digit with lower and lower frequency, to the point where 9 as a first digit occurs less than 5% of the time. This distribution of first digits is the same as the widths of gridlines on the logarithmic scale. Benford's law also gives the expected distribution for digits beyond the first, which approach a uniform distribution as the digit place goes to the right.
This result has been found to apply to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature). It tends to be most accurate when values are distributed across multiple orders of magnitude.
...There is a generalization of the law to numbers expressed in other bases (for example, base 16), and also a generalization to second digits and later digits.
In a base "b" number system, the leading digit "d" should occur with probability
$$P_b(d) = \log_b\left(1 + \frac{1}{d}\right)$$
where to evaluate the base-b logarithm you can use $\log_b(x) = \ln(x)/\ln(b)$.
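The formula is easy to evaluate for any base. The following is a minimal Python sketch of it; the function names `benford_prob` and `benford_table` are made up for this illustration and are not from any library.

```python
import math


def benford_prob(d: int, base: int = 10) -> float:
    """Benford probability that the leading digit is d in the given base.

    Uses log_b(x) = ln(x) / ln(b), so P_b(d) = ln(1 + 1/d) / ln(b).
    """
    if not 1 <= d < base:
        raise ValueError("leading digit must satisfy 1 <= d < base")
    return math.log(1 + 1 / d) / math.log(base)


def benford_table(base: int = 10) -> dict:
    """Map each possible leading digit 1..base-1 to its Benford probability."""
    return {d: benford_prob(d, base) for d in range(1, base)}


if __name__ == "__main__":
    # Base 3 has only two possible leading digits, 1 and 2.
    for d, p in benford_table(3).items():
        print(f"P_3({d}) = {p:.1%}")
    # prints:
    # P_3(1) = 63.1%
    # P_3(2) = 36.9%
```

The probabilities for a given base always sum to 1, because the logarithms telescope to log_b(b).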
A while back, after I heard about Benford's Law on Radiolab, I applied it to the monthly GISS global temperature anomaly. I multiplied each value by 100 to get an integer, and converted it to base 3 so the numbers spanned a few orders of magnitude. (In base 3, an order of magnitude is a factor of 3.) I then found the distribution of the leading digits 1 and 2:
incidence of leading digit being 1 = 62.0%
incidence of leading digit being 2 = 38.0%
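The theoretical values are $P_3(1) = \log_3 2 \approx 63.1\%$ and $P_3(2) = \log_3(3/2) \approx 36.9\%$, so the GISS figures are within about 1 percentage point of Benford's prediction.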
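The digit count described above can be sketched in a few lines of Python, which also sidesteps the base-conversion problem in a spreadsheet. This is only an illustration under stated assumptions: the function names are hypothetical, the sample values are made up (they are not GISS data), and the handling of negative and zero anomalies (drop the sign, skip zeros) is a guess, since the post does not say how those were treated.

```python
from collections import Counter


def leading_digit(n: int, base: int) -> int:
    """Leading digit of a positive integer n when written in the given base."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    while n >= base:
        n //= base  # drop the lowest digit until only the leading one is left
    return n


def leading_digit_freqs(anomalies_c, base: int = 3, scale: int = 100) -> dict:
    """Fraction of values having each leading digit in the given base.

    Assumptions (not stated in the post): the sign of each anomaly is
    dropped, and values that round to zero are skipped.
    """
    ints = (abs(round(x * scale)) for x in anomalies_c)
    counts = Counter(leading_digit(n, base) for n in ints if n != 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nonzero values to analyze")
    return {d: counts[d] / total for d in range(1, base)}


if __name__ == "__main__":
    # Made-up anomalies in degrees C, for illustration only.
    sample = [0.12, -0.05, 0.33, 0.08, 0.00, 0.27]
    print(leading_digit_freqs(sample))  # {1: 0.8, 2: 0.2}
```

Repeated integer division avoids the floating-point edge cases that a logarithm-based leading-digit formula can hit at exact powers of the base.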
To test a possible manipulation, I cooked up a simple warming trend: a pure linear trend of +0.01 C per month, so the data read 0.01, 0.02, 0.03, .... That gave:
incidence of leading digit being 1 = 69.1%
incidence of leading digit being 2 = 30.9%
which are much further from the expected distribution than the GISS values, making the made-up series look suspicious.
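The synthetic-trend check can be reproduced along these lines. It is a rough sketch, not the original spreadsheet: multiplying 0.01, 0.02, 0.03, ... by 100 gives the integers 1, 2, 3, ..., and `n_months` is a hypothetical parameter because the post does not state how long the series was. For a pure linear series the fractions depend on where the series stops, so the numbers will vary with that choice.

```python
def leading_digit(n: int, base: int) -> int:
    """Leading digit of a positive integer n when written in the given base."""
    while n >= base:
        n //= base
    return n


def trend_leading_digit_freqs(n_months: int, base: int = 3) -> dict:
    """Leading-digit fractions for the integer series 1, 2, ..., n_months.

    This is the +0.01 C/month trend after multiplying by 100.
    n_months is an assumed series length, not a value from the post.
    """
    if n_months < 1:
        raise ValueError("n_months must be at least 1")
    counts = {d: 0 for d in range(1, base)}
    for n in range(1, n_months + 1):
        counts[leading_digit(n, base)] += 1
    return {d: c / n_months for d, c in counts.items()}


if __name__ == "__main__":
    # Try a few assumed lengths to see how the fractions move around.
    for months in (8, 17, 600, 1200, 1600):
        freqs = trend_leading_digit_freqs(months)
        print(months, {d: f"{p:.1%}" for d, p in freqs.items()})
```

With 8 points, for example, the base-3 numbers 1 through 22 split evenly between leading digits 1 and 2, while stopping at 17 gives 13 of 17 values a leading 1.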
I didn't get more sophisticated than this, because I didn't find an efficient way to convert to an arbitrary base using Excel, and I don't believe data is being manipulated anyway. The results are all too consistent between groups, there's never been the slightest hint of any manipulation, and the people making the accusations have never offered any proof or evidence. They are usually so dishonest about everything else that I didn't see the point in going further. This is more proof than they've ever given.
So, for whatever it's worth, here it is: a simple test that detects no fraud.