Tuesday, April 24, 2012

Uncertainties in Ocean Heating Results

After some fair questions about measurement uncertainties for the Levitus et al GRL paper-in-press on ocean warming, I've asked some scientists for more information. The paper has Supplementary Material that will be included when it appears in final published form. I have a copy, and needless to say the authors take great care to explain how the uncertainties are calculated.

This WUWT post from Willis Eschenbach misses a crucial point. He converts heat changes back to temperature changes (dT = dQ/mc), and writes:
Here’s the problem I have with this graph. It claims that we know the temperature of the top two kilometres (1.2 miles) of the ocean in 1955-60 with an error of plus or minus one and a half hundredths of a degree C.... So I don’t know where they got their error numbers … but I’m going on record to say that they have greatly underestimated the errors in their calculations.
The basic point is that the statistical uncertainty of an average can be much less than that of any single temperature measurement.

If you measure the temperatures T1 and T2 of two different objects, each to an uncertainty of ΔT, what is the uncertainty ΔA in their average A? The typical way this is done is explained in most freshman physics labs, such as this. So the average temperature of the two objects will be (T1 + T2)/2, and the uncertainty in the average will be

ΔA = ΔT/sqrt(2)

which is less than ΔT. For N measurements the denominator becomes sqrt(N), so the uncertainty is much less. This might seem counterintuitive at first, but it's akin to the statistics of coin-flipping -- over the long haul, the expected average is 50% heads, with a spread (standard deviation) in the fraction of heads that goes like 1/sqrt(number of flips).
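As a rough illustration of the sqrt(N) rule, here is a minimal Python sketch. It assumes the measurement errors are independent and Gaussian with the same uncertainty ΔT; the function names, the seed, and the number of trials are hypothetical choices made for this example, not anything taken from the Levitus et al analysis.

```python
import math
import random
import statistics


def uncertainty_of_mean(delta_t: float, n: int) -> float:
    """Uncertainty of the average of n independent measurements,
    each with the same uncertainty delta_t (the sqrt(N) rule)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return delta_t / math.sqrt(n)


def simulated_spread_of_mean(delta_t: float, n: int,
                             trials: int = 2000, seed: int = 0) -> float:
    """Empirical standard deviation of the average of n independent
    Gaussian errors, estimated by repeating the experiment many times."""
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(0.0, delta_t) for _ in range(n)) / n
        for _ in range(trials)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    for n in (2, 100):
        print(n, uncertainty_of_mean(1.0, n), simulated_spread_of_mean(1.0, n))
```

The simulated spread should land close to the formula's value only because the simulated errors are drawn independently.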

This is essentially what Levitus et al do, and it's completely legitimate. Experimental scientists treat uncertainties religiously, and estimating them is often a major part of the analysis, even more so than simply getting a result.

I don't know yet what the uncertainty of an ARGO buoy sensor is, but (to this point) I suspect it's significantly less than 1°C. That comment asks:
There is an amusing example in which they say if you measure the length of an object to the nearest millimetre often enough, you ought to be able to resolve individual atoms. Since atoms are 10^-8 mm you need about 10^16 measurements. What do you think? Is it possible in principle?
But this confuses two completely different concepts -- an individual measurement, and averaging. A measurement of a length is a completely classical measurement, with no notion of "atoms" or discreteness. So there's nothing wrong with a huge number of measurements resulting in an uncertainty less than an atomic length in a statistical sense. (Casinos, of course, rely on this to make their money.) But that says nothing about any particular measurement, only about the average of the measurements. So you're not "resolving atoms," which would necessarily happen in a particular measurement.

Just because the average height of U.S. men is 5 ft 9 in doesn't mean all U.S. men have a height of 5 ft 9 in -- only the "average man," which is an abstract thing, not a thing that exists in the same sense that any of the men exist. (Viz. the average is a mathematical object. Men aren't.)


Piltdown said...

Thanks. I appreciate that people are willing to think about this and try to answer my query. Unfortunately, many won't - a polarised debate stays polarised.

Eschenbach's post is interesting, although light on detail. A reduction in error from 1 to 0.015 suggests a sample size around 4,500 if the measurements are perfectly independent. That's not an impossible number, but I'm a little surprised that the network down to 2 km deep was so extensive in the 1950s. I was curious as to what the background to that network of sensors was.
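The sample-size estimate above can be checked with a couple of lines of Python. This is only a sketch of the arithmetic, assuming perfectly independent errors and the sqrt(N) rule; the function name is made up for the example.

```python
def required_sample_size(single_error: float, target_error: float) -> float:
    """Number of independent measurements needed to shrink single_error
    down to target_error, assuming the error falls like 1/sqrt(N)."""
    return (single_error / target_error) ** 2


n = required_sample_size(1.0, 0.015)
print(round(n))  # about 4,444, i.e. "around 4,500"
```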

But the bigger issue is the independence assumption. As you say, it is typically taught in freshman physics that the uncertainty in an average goes down in proportion to sqrt(N). Unfortunately, that's not quite true in general. The bit that usually gets mentioned but then glossed over is that this only applies if the measurement errors are statistically independent. Partly because approximate independence is common, and partly because it makes the math much easier, it is usual to take it for granted and only consider this case.

But real-life science is messier than the textbooks.

The temperature measurements are samples of a distribution of values that varies systematically from place to place and from time to time. Samples that are not perfectly uniform over all locations and times will have some component of error in common. And their spread will be far wider than the error in a single measurement. You are hoping not only that the measurement errors in all the measurements are representative of the typical error, but that the temperatures at the times and locations you pick are representative of the global average. Neither assumption seems likely to be perfectly satisfied. Either can introduce residual correlations between measurements that slow and eventually stop convergence of the average.

The example of measuring the length of an object is slightly simpler in that you are always measuring the same object, and so measuring the same length. But quantisation errors are generally correlated. If you round 35.00136 mm to the nearest mm, you will always get the same answer. Averaging 35, 35, 35, ... for as long as you like will shrink the error no further. (And even if you get the occasional 34 or 36, it's probably for other, more psychological, reasons.) The finer information is simply not there. So you cannot resolve atoms by eye, even with unlimited time and patience.
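A minimal Python sketch of that rounding example, assuming an idealised observer who always rounds the same true length to the nearest millimetre (the function name and the number of readings are hypothetical):

```python
def average_of_rounded_readings(true_length_mm: float, n_readings: int) -> float:
    """Average of n_readings, each being the true length rounded to the
    nearest millimetre. Every reading carries the same rounding error."""
    readings = [round(true_length_mm) for _ in range(n_readings)]
    return sum(readings) / len(readings)


true_length_mm = 35.00136
average = average_of_rounded_readings(true_length_mm, 10_000)
residual_error = abs(average - true_length_mm)
print(average, residual_error)
```

However many readings are taken, the average stays at 35 and the residual error stays at about 0.00136 mm, because the rounding errors are perfectly correlated rather than independent.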

Piltdown said...

In case you are interested, the variance of an average is the average of the elements in the covariance matrix for the joint distribution of the measurements. If you take N measurements, you get an NxN matrix with large values V down the main diagonal, equal to the variance of the individual measurements, and usually small values u off the diagonal. If you assume the off-diagonal elements are all zero, then the average is VN/N^2 = V/N, and you take the square root to convert variance to uncertainty. (Standard deviation, to be strict.) If u is not zero, you instead get ((V-u)N + u N^2)/N^2, which tends towards u rather than zero as N grows.
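Here is a small Python sketch of that calculation, using the simplified matrix described above (V on the diagonal, a single common value u everywhere else). The function names and the example values V = 1 and u = 0.01 are assumptions made purely for illustration.

```python
def variance_of_mean_from_matrix(V: float, u: float, n: int) -> float:
    """Variance of the average of n measurements: the mean of all
    n*n entries of a covariance matrix with V on the diagonal and
    u off the diagonal."""
    cov = [[V if i == j else u for j in range(n)] for i in range(n)]
    return sum(sum(row) for row in cov) / n ** 2


def variance_of_mean_closed_form(V: float, u: float, n: int) -> float:
    """Closed form of the same quantity: ((V - u) N + u N^2) / N^2."""
    return ((V - u) * n + u * n ** 2) / n ** 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        independent = variance_of_mean_closed_form(1.0, 0.0, n)
        correlated = variance_of_mean_closed_form(1.0, 0.01, n)
        print(n, independent, correlated)
```

With u = 0 the variance keeps falling as V/N, while with even a small positive u it levels off near u, which is the floor described in the comment.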

Levitus has presumably considered this and shown that the measurements taken in 1950 can safely be considered to have independent errors. Since I find that surprising, there is presumably something interesting to learn here. How did they do it? Where is my own intuition going wrong?

Dividing 4,500 sample points evenly over 335 million km^2 gives one sample in every 75,000 km^2 or so. They would be more than 270 km apart. And in the turbulent mixed layer, at least, temperatures can change significantly in a matter of hours or days. Does it include those parts of the ocean below the polar ice? Does it include measurements evenly spread over day and night? Or did they perhaps have a lot more than 4,500 measurements?
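The spacing estimate is simple arithmetic, sketched here in Python. It assumes the sample points sit on an even square grid, which is an idealisation for this example; the variable names are hypothetical.

```python
import math

ocean_area_km2 = 335e6   # ocean surface area used in the comment
n_samples = 4500         # sample count implied by the sqrt(N) estimate

# Area covered by each sample if the points are spread evenly.
area_per_sample_km2 = ocean_area_km2 / n_samples

# Side of a square with that area, i.e. the distance between neighbours.
spacing_km = math.sqrt(area_per_sample_km2)

print(round(area_per_sample_km2), round(spacing_km))
```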

If this is right, it seems to me a triumph of scientific precision worthy of a lot more praise than it is getting. We can measure the temperature of hundreds of millions of cubic kilometres of unevenly heated water to an accuracy of nearly a hundredth of a degree? In 1950?!

Don't you think that's remarkable?