## Friday, January 14, 2011

### Temperature Anomalies and Graphing Data

#### Globally Averaged Temperature Anomaly

One statistic that is used to understand climate is the annual globally averaged temperature anomaly.  It is not the only measure of global warming; there are a great many others, but it is one that the media tend to focus on because it is a convenient way of explaining what is happening to surface temperatures as a function of time.

The data here are taken from Global Land-Ocean Temperature Index (C) (Anomaly with Base: 1951-1980), which includes data from 1882-2007.  There are updated numbers available for more recent years, but I am using these data to respond to an argument made by a friend.

The data come from the National Aeronautics and Space Administration's (NASA's) Goddard Institute for Space Studies (GISS).  The base period for calculating the anomaly used by GISS is 1951-1980.  This choice is arbitrary.

The Hadley Centre, for example, prefer to use the base period 1961-1990. The data are presented relative to the average of the base period.  That means that a temperature anomaly of 0.2 is 0.2 K hotter than the average temperature for the base period.
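
As a minimal sketch of the definition above, the anomaly for each year is that year's temperature minus the mean over the base period. The temperatures and the short base periods below are made-up illustrative values, not GISS or Hadley Centre data, and the function names are hypothetical.

```python
def base_period_mean(temps_by_year, start, end):
    """Mean temperature over the base period [start, end], inclusive."""
    values = [t for year, t in temps_by_year.items() if start <= year <= end]
    return sum(values) / len(values)


def anomalies(temps_by_year, start, end):
    """Temperature minus the base-period mean, for every year."""
    base = base_period_mean(temps_by_year, start, end)
    return {year: t - base for year, t in temps_by_year.items()}


# Made-up illustrative temperatures in kelvins (NOT real measurements).
temps = {2001: 287.0, 2002: 287.2, 2003: 287.1, 2004: 287.3, 2005: 287.4}

anom_a = anomalies(temps, 2001, 2003)  # one choice of base period
anom_b = anomalies(temps, 2003, 2005)  # a different choice of base period

# Changing the base period shifts every anomaly by the same constant,
# so year-to-year differences are unchanged.
shift = anom_a[2001] - anom_b[2001]
```

This is why the choice of base period is arbitrary: it moves the zero line of the plot without changing the shape of the curve.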

#### Polemics or Science?

A friend of mine (who shall remain anonymous unless he chooses to out himself) recently posted a plot on Facebook.  He plotted the temperature on the Kelvin scale vs. year from the same source I used above.

The kelvin (K) is a unit of a thermodynamic temperature scale, the Kelvin scale. The third law of thermodynamics states that it is impossible to reach absolute zero in a finite number of steps, or alternatively that the entropy of a perfect crystal at absolute zero is zero.

This law provides a natural zero-point for temperature. The Kelvin scale is simply the Celsius scale adjusted so that zero is absolute zero (-273.15 °C).  Because it is an absolute scale, the unit is referred to as the kelvin, rather than degrees Kelvin.

If one were to assume that the mean temperature of the earth from 1951 to 1980 were 287 K, one would obtain the results my friend plotted.

I found three discrepancies in the numbers that he provided me.  They are not really pertinent to the argument, but I include my changes for completeness. For 1944, he had 287.07 K instead of 287.19 K; for 1973, he had 286.86 K instead of 287.14 K; and for 1977, he had 286.88 K instead of 287.12 K.  All other values agreed with my calculation.  I used my calculated values throughout.
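
The conversion from anomaly to absolute temperature is just the addition of the assumed base-period mean. Here is a minimal sketch, assuming the 287 K mean mentioned above; the three anomalies are the ones implied by my corrected values (for example, 287.19 K - 287 K = 0.19 K), and the function name is hypothetical.

```python
ASSUMED_BASE_MEAN_K = 287.0  # assumed 1951-1980 mean, as in the text


def anomaly_to_kelvin(anomaly_k):
    """Convert a temperature anomaly (K) to an absolute temperature (K)."""
    return round(ASSUMED_BASE_MEAN_K + anomaly_k, 2)


# Anomalies implied by the three corrected values quoted above.
anomalies_k = {1944: 0.19, 1973: 0.14, 1977: 0.12}

for year, anomaly in anomalies_k.items():
    print(year, anomaly_to_kelvin(anomaly), "K")
```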

The argument seems to be that on a Kelvin scale, the warming trend does not look very dramatic.  My friend implies that the use of temperature anomalies is based upon polemics rather than science, and that if only the data were plotted in kelvins, it would be clear that a trend of 0.16 K/decade is not something to worry about.
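
The trend itself does not depend on whether one works with anomalies or absolute temperatures, because adding a constant offset leaves a slope unchanged. The sketch below illustrates this with a made-up series constructed to rise at exactly 0.16 K/decade; it is not the GISS record, and the 287 K offset is the assumption discussed above.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Made-up anomalies (K) rising by exactly 0.016 K per year; not real data.
years = list(range(1980, 1990))
anomaly_k = [0.016 * (year - 1980) for year in years]
absolute_k = [287.0 + a for a in anomaly_k]  # 287 K offset assumed in the text

slope_anomaly = least_squares_slope(years, anomaly_k) * 10    # K/decade
slope_absolute = least_squares_slope(years, absolute_k) * 10  # K/decade
```

Both slopes come out the same, up to floating-point rounding, so the number 0.16 K/decade means the same thing in either presentation.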

Before addressing this argument directly, I would like to use an analogy.

#### An Analogy: The Dow Jones Industrial Average

Suppose that I want to track how the Dow Jones Industrial Average (DJIA) changes over a period of ten calendar days from Jan. 3, 2011 to Jan. 13, 2011.  Note that there are only data points for days when the stock market is open, but that should not hamper the analogy.

One way that we could track these data would be to plot the gains or losses relative to that first day as a function of date.

Note that I have scaled this graph using the data limits so that the reader can gain information from this graph. Of course this graph does not inform us of the actual value of the DJIA on a given day; so it might be nice to plot the value as a function of date.

Note again that I chose to plot the data using the data limits.  Despite the fact that the DJIA is in the 11,000 range, I have preserved the information in the graph by using the data limits.  If for some reason I wanted to obscure that information, I could plot it differently.

Note that these are the same data that I plotted above, but that now I have obscured the information.  The only information that this graph gives us is that the DJIA remained between 11,000 and 12,000 during the time period in question.

There is no reason to present this information in a graph.  I am obscuring any trend that might be in the data.

Note that it makes no difference whatsoever if I choose to use the price or the gains/losses.  It is the data range that obscures the information, not the choice of units.  I could do the same thing with gains and losses.

This plot obscures the information as well.  About the only information to be gained is that the DJIA did not fluctuate more than +/- 100 points during the period in question.  If I had used the data limits, I could surmise that the DJIA fluctuated from losses of about 32 points to gains of about 84 points relative to the first day of the period in question.
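
One way to quantify how much a choice of axis obscures the data is the fraction of the vertical axis that the data actually span. The sketch below uses the approximate gains and losses quoted above; the wide axis range of -1000 to +1000 points is a hypothetical choice for illustration, not necessarily the range of the plot described here.

```python
def fraction_of_axis_used(data_min, data_max, axis_min, axis_max):
    """Fraction of the vertical axis that the data actually span."""
    return (data_max - data_min) / (axis_max - axis_min)


# Gains and losses relative to the first day, from the text:
# roughly -32 to +84 points.
data_min, data_max = -32.0, 84.0

# Axis set to the data limits: the data fill the whole plot.
tight = fraction_of_axis_used(data_min, data_max, data_min, data_max)

# Hypothetical wide axis of -1000 to +1000 points (an assumed range).
wide = fraction_of_axis_used(data_min, data_max, -1000.0, 1000.0)

print(f"data limits: {tight:.1%} of the axis")
print(f"wide axis:   {wide:.1%} of the axis")
```

With the assumed wide axis, a swing of 116 points occupies 116/2000, or under 6%, of the plot height, which is why the trend becomes hard to read.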

Nowhere have I changed the data.  It is all the same data plotted differently.  If I am going to bother to plot data, I should use a scale that preserves the maximum amount of information.  It does not matter what units I use.  In fact, I can plot the data all on the same graph.

Here I have plotted gains and losses against the left axis and the price against the right axis.  Note that the data are superimposed upon each other because both are plotted on a scale related to the data limits.

#### Examples

Note that major news sources plot index prices in a manner related to the data limits.  There is no reason to plot the information if one does not preserve it by using appropriate data limits.

#### Thermodynamic Temperature vs Temperature Anomaly

Of course the difference between my friend's plot and the plot of GISS temperature anomalies has nothing to do with the fact that my friend chose to report data as an absolute temperature.  Here are the same data plotted as absolute temperature in kelvins, but with appropriate data limits.

I can also plot the temperature anomaly in a way that obscures the data.

It is absurd to plot the information this way as the plot does not display the information content of the data, but it is possible to do.  Note that whether one chooses an appropriate data range or not has nothing to do with the units chosen to display the data.

I can display the data in Celsius.

I can display the data in Fahrenheit.

Heck, I can display the data in Rankine.

It does not matter what units I choose; it is the same data.  With any units I can choose to plot it according to the data limits, or I can obscure the information content by plotting it on a larger scale.  This fact is true for any data set whatsoever!
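
For reference, the unit changes used here are simple linear conversions. A minimal sketch follows, with hypothetical function names and the 287 K base-period mean assumed earlier as the sample value.

```python
def kelvin_to_celsius(k):
    """Celsius has the same degree size as the kelvin, shifted by 273.15."""
    return k - 273.15


def kelvin_to_rankine(k):
    """Rankine is an absolute scale with Fahrenheit-sized degrees."""
    return k * 9 / 5


def kelvin_to_fahrenheit(k):
    """Fahrenheit is the Rankine scale shifted by 459.67."""
    return kelvin_to_rankine(k) - 459.67


temperature_k = 287.0  # base-period mean assumed earlier in the text

print(f"{temperature_k:.2f} K")
print(f"{kelvin_to_celsius(temperature_k):.2f} deg C")
print(f"{kelvin_to_fahrenheit(temperature_k):.2f} deg F")
print(f"{kelvin_to_rankine(temperature_k):.2f} deg R")
```

Because every conversion is a shift, a rescaling, or both, a change of units relabels the axis without altering the shape of the curve.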

I can plot the absolute temperature and the temperature anomaly on the same scale.

The plots lie on top of each other if plotted using the data limits.  It's the same data; so it should not be a surprise.

I can plot temperature in Kelvin, Celsius, Fahrenheit, and Rankine along with the temperature anomaly in Kelvin all on the same plot.
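
The reason the curves coincide can be checked numerically: scaling each series to its own data limits maps it onto the same relative positions, whatever the units. The sketch below uses made-up anomalies (not GISS data), the 287 K offset assumed earlier, and a hypothetical helper name.

```python
def normalize_to_data_limits(values):
    """Map values linearly so the minimum is 0 and the maximum is 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


# Made-up anomalies in kelvins (illustrative only, not GISS data).
anomaly_k = [-0.10, 0.05, 0.00, 0.20, 0.40]

absolute_k = [287.0 + a for a in anomaly_k]  # 287 K offset assumed in the text
celsius = [k - 273.15 for k in absolute_k]
fahrenheit = [k * 9 / 5 - 459.67 for k in absolute_k]
rankine = [k * 9 / 5 for k in absolute_k]

reference = normalize_to_data_limits(anomaly_k)
for series in (absolute_k, celsius, fahrenheit, rankine):
    positions = normalize_to_data_limits(series)
    # Each series lands at the same relative height on its own axis,
    # up to floating-point rounding.
    assert all(abs(p - r) < 1e-9 for p, r in zip(positions, reference))
```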

#### So Why Use Temperature Anomaly?

There is no polemical reason to use temperature anomaly.  If I were only interested in polemics, the effect is just as obvious using the absolute temperature scale with proper data limits.

If that is the case, why bother to use anomaly at all?  Perhaps, one should consider the arguments of the scientists who make such a choice. GISS explain as follows:

> Our analysis concerns only temperature anomalies, not absolute temperature. Temperature anomalies are computed relative to the base period 1951-1980. The reason to work with anomalies, rather than absolute temperature is that absolute temperature varies markedly in short distances, while monthly or annual temperature anomalies are representative of a much larger region. Indeed, we have shown (Hansen and Lebedeff, 1987) that temperature anomalies are strongly correlated out to distances of the order of 1000 km. For a more detailed discussion, see The Elusive Absolute Surface Air Temperature.

Consider the complexity of the problem of assigning a surface air temperature to the earth.  There is a well-known proverb that goes:

> A man who has a thermometer knows what the temperature is.  A man who has two thermometers isn't sure.

Temperature varies locally and globally, not only by latitude and longitude, but also by altitude.  There is no agreement about how to measure the surface air temperature (SAT) for a given location, but there is general agreement about how to measure an anomaly. From The Elusive Absolute Surface Air Temperature:

> Q. What exactly do we mean by SAT?
>
> A. I doubt that there is a general agreement how to answer this question. Even at the same location, the temperature near the ground may be very different from the temperature 5 ft above the ground and different again from 10 ft or 50 ft above the ground. Particularly in the presence of vegetation (say in a rain forest), the temperature above the vegetation may be very different from the temperature below the top of the vegetation. A reasonable suggestion might be to use the average temperature of the first 50 ft of air either above ground or above the top of the vegetation. To measure SAT we have to agree on what it is and, as far as I know, no such standard has been suggested or generally adopted. Even if the 50 ft standard were adopted, I cannot imagine that a weather station would build a 50 ft stack of thermometers to be able to find the true SAT at its location.

Over short distances, weather stations may measure very different temperatures for a given day, but temperature anomalies often agree for much larger distances.
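
The point about nearby stations can be sketched with made-up numbers: two hypothetical stations, one in a valley and one on a hillside, read several degrees apart, yet each is warmer than its own baseline by the same amount. None of these values are real measurements.

```python
# Made-up annual mean temperatures (deg C) for two hypothetical nearby
# stations. These are NOT real measurements.
valley = {"baseline": [12.0, 12.2, 11.8], "this_year": 12.6}
hillside = {"baseline": [7.1, 7.3, 6.9], "this_year": 7.7}


def anomaly(station):
    """This year's temperature minus the station's own baseline mean."""
    baseline_mean = sum(station["baseline"]) / len(station["baseline"])
    return station["this_year"] - baseline_mean


absolute_gap = valley["this_year"] - hillside["this_year"]
anomaly_gap = anomaly(valley) - anomaly(hillside)

print(f"absolute temperatures differ by {absolute_gap:.1f} deg C")
print(f"anomalies differ by {abs(anomaly_gap):.1f} deg C")
```

Averaging the two absolute readings would depend heavily on where the thermometers happen to sit; averaging the two anomalies would not.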

The Hadley Centre does provide one way to calculate surface air temperatures from their data, but the calculation comes with a lot of caveats.

> Q. How are the daily mean temperatures calculated? And in case of there being several ways, how can you be sure that those ways are equivalent?
>
> A. Because we use temperature anomalies from a station climatology, it doesn't matter how the average temperature is calculated as long as it is always done in the same way (the differences will cancel in the climatology and the monthly values). For UK data we still use the average of the Max and Min temperatures. This gives us homogenous long-term series from a station. Some other countries calculate the average in other ways. So long as they don't change the method of calculation, the results will be consistent. If the calculation method is changed we apply corrections to the reported values. In some instances it is possible that the method was changed, but no record was made. The uncertainties associated with such inhomogeneities are discussed in Brohan et al. 2006.
>
> Q. Why do you use anomalies?
>
> A. Anomalies vary slowly from one place to another - if it is warmer than average in London, it is likely to be warmer than average in Paris too - but actual temperatures can vary greatly from one weather station to its nearest neighbour. The average anomaly for, say, Europe is likely to be representative of a large area, the average absolute temperature will be representative of only a very limited one.
>
> Q. How do you obtain a global annual average temperature from the monthly data?
>
> A. First the monthly anomalies in each grid box are averaged together to give an annual average anomaly for that grid box. The area-weighted averages of these annual average grid-box anomalies are then calculated for the northern hemisphere and for the southern hemisphere. The global average temperature is the arithmetic mean of the northern hemisphere average and the southern hemisphere average. The last step avoids biasing the global average to the more densely observed northern hemisphere. There are, of course, other ways to calculate the global average and each will give a slightly different answer.
>
> Q. The HadCRUT3 data are expressed as anomalies, but I want actual temperatures.
>
> A. HadCRUT3 is an anomalies dataset, and all the uncertainties apply to the anomalies. If you are interested in year-to-year changes it's best to use the anomalies if you can. So before you start using the actuals, think hard to check you can't use the anomalies instead. We can make actuals - we merge the SST climatology from the HadSST2 dataset and the land climatology from CRU high resolution dataset (New et al 2002 - see http://www.cru.uea.ac.uk/cru/data/hrg.htm), to make a climatology for HadCRUT3 and add this to the anomalies dataset. But this has two problems: it adds an additional source of uncertainties which we don't allow for, so our uncertainty analysis is no longer valid; also land surface actuals vary over short distances because of large changes in altitude: so the actual range of temperatures in a 5 degree grid box can be large, and the mean value is not always useful. The absolute global-average annual temperature and the absolute hemisphere-average annual temperatures for 1961-1990 were calculated by Jones et al. (1999). They are:
>
> - Globe 61-90 average = 14.0°C
> - Northern Hemisphere 61-90 average = 14.6°C
> - Southern Hemisphere 61-90 average = 13.4°C

It is possible to use such conventions and then to present a mean temperature as a function of time. If the data were presented that way, those who wished to attack the data would find other reasons for doing so. They'd make the very same arguments that climate scientists make for why surface air temperatures are elusive.
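
The averaging steps that the Hadley Centre describes above (monthly to annual per grid box, area-weighted hemisphere means, then the mean of the two hemispheres) can be sketched as follows. This is only an illustration under stated assumptions, not the Hadley Centre's code: the cosine-of-latitude area weight, the function names, and the anomaly values are all my own choices for the example.

```python
import math


def annual_mean(monthly_anomalies):
    """Step 1: average the monthly anomalies of one grid box."""
    return sum(monthly_anomalies) / len(monthly_anomalies)


def hemisphere_mean(boxes):
    """Step 2: area-weighted mean of annual grid-box anomalies.

    Each box is (latitude_deg, monthly_anomalies). Weighting by the cosine
    of the box's central latitude is an assumption of this sketch.
    """
    weights = [math.cos(math.radians(lat)) for lat, _ in boxes]
    annuals = [annual_mean(months) for _, months in boxes]
    return sum(w * a for w, a in zip(weights, annuals)) / sum(weights)


def global_mean(north_boxes, south_boxes):
    """Step 3: arithmetic mean of the two hemisphere averages."""
    return (hemisphere_mean(north_boxes) + hemisphere_mean(south_boxes)) / 2


# Made-up anomalies (K): many observed boxes in the north, few in the south.
north = [(lat, [0.4] * 12) for lat in (12.5, 32.5, 47.5, 52.5, 62.5)]
south = [(-32.5, [0.2] * 12)]

result = global_mean(north, south)
print(f"global mean anomaly: {result:.2f} K")
```

In this made-up case the two hemispheres count equally, whereas a plain average over all six boxes would lean toward the more densely observed north, which is the bias the last step is meant to avoid.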

To argue that data are presented as anomalies for polemical purposes is absurd. As I have shown, there is no polemical benefit to doing so.

Ultimately, as long as one is consistent with conventions and careful in applying uncertainties, one could establish and use an SAT convention and report temperature changes over time. There may in fact be polemical advantages to such an approach, but there are some drawbacks in making the data generally usable.

Climate scientists have established a convention of using temperature anomalies for scientific purposes. I cannot see any scientific objections to this convention vs. any other convention. If someone has a better convention, that person should propose it as a standard; maybe it will be accepted, but that standard will not change the data, nor will it change the way that the data look on a plot.


Erik Jacobs said...

Thanks, Rich. I don't mind you using my name, as long as it is clear that I am not making any claims about the underlying science, but merely about the presentation of that science. You made that clear, so, though I might quibble about a few details, I have no substantive problem with the way you characterized my argument. That said, I think that it will be very difficult to double-post and/or cross-post back and forth from FACEBOOK, so I will respond to your blog entry on FACEBOOK rather than here.

Rich said...

Thanks, Erik.
