Advice for time-series analysis

CRU TS 2.0 and time-series analysis: advice for users

Contents

Introduction

Q1: Is it legitimate to use CRU TS 2.0 to 'detect anthropogenic climate change' (IPCC language)?

Q2: Is it legitimate to use CRU TS 2.0 to measure regional climate change?

Relevant features of CRU TS 2.0

Precipitation: a special case

Hints for time-series analysis

Further information

Introduction
The Climatic Research Unit is keen to provide the best possible climatic data to users. Therefore we have developed regular latitude/longitude grids of climate data that are suitable for a wide range of purposes. We do not impose any restrictions on how these data are used, except for the obvious requirement that the data should be correctly attributed to our publications.

However, it has become clear to us that some misunderstandings have arisen concerning how these data sets should and should not be used. Doubtless many users have themselves experienced these misunderstandings when supplying information from their own area of expertise to researchers from other disciplines! We are making this statement in an effort to clarify our position. We emphasise that we are not attempting to restrict users, and that users themselves must bear full responsibility for how they use these data-sets.

Please also note - a frequent request! - that if you choose to use these data, you must abide by the conditions of use that we do impose. In particular, whenever you present work using these data (whether orally or in a paper), you must:

specify which CRU data-set you are using, by giving the dataset name (e.g. CRU TS 2.0);

cite the reference specified.

The particular concern that we address here is the extent to which it is legitimate to use CRU TS 2.0 (the 0.5 degree observed grids) to examine climate change. There are two distinct issues:
1. Is it legitimate to use CRU TS 2.0 to 'detect anthropogenic climate change' (IPCC language)?
2. Is it legitimate to use CRU TS 2.0 to measure regional climate change?

Question One
Q1. Is it legitimate to use CRU TS 2.0 to 'detect anthropogenic climate change' (IPCC language)?
A1. No.

CRU TS 2.0 is specifically not designed for climate change detection or attribution in the classic IPCC sense. The classic IPCC detection issue deals with the distinctly anthropogenic climate changes we are already experiencing. Therefore it is necessary, for IPCC detection to work, to remove all influences of urban development or land use change on the station data.

In contrast, the primary purpose for which CRU TS 2.0 has been constructed is to permit environmental modellers to incorporate into their models as accurate a representation as possible of month-to-month climate variations, as experienced in the recent past. Therefore influences from urban development or land use change remain an integral part of the data-set. We emphasise that we use all available climate data.

If you want to examine the detection of anthropogenic climate change, we recommend that you use the Jones temperature data-set. This is on a coarser (5 degree) grid, but it is optimised for the reliable detection of anthropogenic trends. For precipitation trends, use the Hulme data-set (5 degree grid or 2.5 x 3.75 degree grid). There are few alternatives to Hulme in the first half of the 20th century. For later periods, use the Xie and Arkin data-set if you need to include the oceans; for the last 25 years you could also use the GPCC data-set.

Question Two
Q2. Is it legitimate to use CRU TS 2.0 to measure regional climate change?
A2. Sometimes. There is no single yes/no answer to this question.

The reason why we cannot say 'yes' is that CRU TS 2.0 does not give a homogeneous representation of climate change at every individual grid-box. This is not a fault, but an inevitable consequence of the design. CRU TS 2.0 is our best estimate of the spatial pattern of climate at each moment in time; it is complete in space and time. In this sense, CRU TS 2.0 is 'space-optimised' rather than 'time-optimised'. Because the emphasis is placed on obtaining best estimates of the instantaneous spatial patterns, inhomogeneities may be present in the time-series of an individual grid-box.

If any significant inhomogeneities are present in the grid-box of interest, any time-series analysis on that grid-box will give inaccurate results. In some cases, the inhomogeneities may be sufficient to deter the user from using the grids for time-series analysis. However, in many cases the time-series analysis may still be possible. The individual user must decide, but we give as much supplementary information as possible to permit a wise decision. This information is given via this web-page. It consists of the following:

Relevant features of CRU TS 2.0. An explanation of why CRU TS 2.0 does not give a homogeneous representation of climate change at every individual grid-box.

Hints for time-series analysis. Guidance for the user, to help decide whether to use CRU TS 2.0 for their time-series analysis. This guidance rests upon the explanation above!

Relevant features of CRU TS 2.0
There are two relevant features that are critical:

The grids are based on raw station data. If in July 1907 there is a grid-point in central Africa that is greater than 1200km from the nearest station with temperature measurements for July 1907, that grid-point for July 1907 will be given an imposed value. The imposed value will be the average of all July temperatures at that grid-point from 1961-90. We call this feature a 'relaxation to the climatology', and it is described more fully in connection with the station files. This feature applies mostly early in the 20th century, outside the 'developed world', and for less-well-reported variables; for these cases less raw station data are available.

This feature was included to ensure that the data-set is complete in both space and time. This feature is based on the assumption that if there is no time-specific information available, the best estimate for that moment in time is a long-term average. The term 'best estimate' is important: CRU TS 2.0 is our best estimate of the spatial pattern of climate at each moment in time.

Although this is a valuable feature, it may be problematic when examining changes at a grid-box, or for a region. The effects of this feature can be found by inspecting the time-series at the level of the grid-box to see whether there are periods when each January (or each February, or ...) has the same value. Obviously, if a grid-box includes this feature, then calculating the least-squares regression line for 1901-2000 is meaningless and misleading!
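The inspection described above can be partly automated. The following is a minimal sketch, not part of the CRU software: it assumes you have already extracted the values for one calendar month (e.g. every January) at one grid-box into a plain sequence. The function name `repeated_value_runs`, the `min_run` threshold and the sample values are all hypothetical.

```python
import numpy as np

def repeated_value_runs(values, min_run=3, tol=1e-9):
    """Find runs of (near-)identical consecutive values in a 1-D series.

    `values` holds one calendar month across successive years.
    Returns a list of (start_index, end_index_inclusive) pairs.
    """
    values = np.asarray(values, dtype=float)
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or abs(values[i] - values[start]) > tol:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = i
    return runs

# Hypothetical January temperatures (deg C) for ten successive years.
januaries = [24.1, 24.1, 24.1, 24.1, 23.6, 24.9, 24.3, 23.8, 24.1, 24.5]
print(repeated_value_runs(januaries))  # [(0, 3)]
```

A run flagged in this way is only a candidate for relaxation to the climatology; the station files remain the authoritative check.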

Each monthly grid is an interpolation based on the set of stations available at that moment in time. From one month to the next, the network of available stations will change. This is because the availability of data from a particular station tends to fluctuate over time.

Again, this interpolation method was adopted to give a best estimate of the spatial pattern of climate at each moment in time. However, it does mean that the changes over time at an individual grid-box will not be due solely to genuine changes in climate, but also to fluctuations in the network of stations. The effect of such fluctuations is minimised by interpolating station anomalies rather than station absolute values, but cannot be entirely removed.

Where the station network is dense, the effect of any individual station entering or leaving the network will be minimal. However, where the station network is sparse, the presence or absence of a single station may have a significant effect on the time-series at a nearby grid-box.

The effect on a time-series analysis might be thought of in terms of the proportion of the variability and trend at a grid-box that is contributed by this feature. This effect can be reduced by increasing the scale of aggregation (i.e. the number of grid-boxes that are being averaged into a region). The reduction is achieved because the station network is being made more dense relative to the number of regions. However, even increasing the scale of aggregation cannot entirely eliminate this effect.
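One way to increase the scale of aggregation is to average a block of grid-boxes into a regional series. The sketch below is illustrative only. It assumes the grid has already been read into an array of shape (times, latitudes, longitudes) with NaN for boxes without data, and it weights each box by the cosine of its latitude, since the area of a latitude/longitude box shrinks towards the poles. The function name `regional_mean` and the sample numbers are hypothetical.

```python
import numpy as np

def regional_mean(grid, lats):
    """Area-weighted mean over a block of grid-boxes at each time step.

    grid: array (n_times, n_lat, n_lon), NaN where a box has no value.
    lats: grid-box centre latitudes in degrees, shape (n_lat,).
    """
    grid = np.asarray(grid, dtype=float)
    weights = np.cos(np.deg2rad(np.asarray(lats, dtype=float)))
    w = np.broadcast_to(weights[None, :, None], grid.shape)
    valid = ~np.isnan(grid)
    # Boxes without data contribute neither to the sum nor to the weights.
    num = np.where(valid, grid * w, 0.0).sum(axis=(1, 2))
    den = np.where(valid, w, 0.0).sum(axis=(1, 2))
    return num / den

# Hypothetical block: one time step, two latitude rows, two longitudes.
block = np.array([[[1.0, 1.0], [3.0, 3.0]]])
print(regional_mean(block, lats=[0.0, 60.0]))
```

Averaging in this way reduces, but does not remove, the contribution of station-network changes to the regional series.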

Precipitation: a special case
It was noted above (Features) that a sequence of Januarys might have the same value (the 1961-90 mean), as a result of a lack of station data. For precipitation, a similar feature might be found for a different reason in areas that satisfy the following conditions:

the area commonly receives only a small amount of precipitation in the given calendar month;

the area commonly receives any precipitation in the form of localised showers;

the area has only a sparse network of rain gauges.

In this case, a sequence of identical values for a given calendar month might arise from interpolating absolute anomalies, rather than proportional anomalies. An example of this is the sequence of Februarys from CRU TS 2.0 at grid-box 331, 211 (in West Africa). In this grid-box, the 1961-90 mean precipitation in February is 22 mm, but there are frequent values of 2 mm recorded in the time-series.

So why are there repeated values of 2 mm? These repeated values represent years in which no February precipitation was observed in the sparse network of rain gauges in this area. Because of the method of using absolute anomalies, this is recorded as various negative anomalies at the individual stations, which are then interpolated onto the half-degree grid. When interpolated, these negative anomalies may equate to -20 mm for February for this grid-box (331,211). Since the February climatology for 1961-90 for this region (which was calculated differently, in CRU CL 1.0) was 22 mm, the zero station precipitation equates to a grid-box precipitation of 2 mm. To follow the given example, in 1961-1970 only 1965 and 1968 have grid-box values for February that differ from 2 mm. An inspection of the underlying databases reveals that in the three closest stations, it was only in 1965 (2 stations) and 1968 (all 3 stations) that any precipitation was recorded.
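The arithmetic behind the 2 mm value can be sketched as follows. Only the 22 mm climatology and the -20 mm interpolated anomaly come from the text above. The three station normals are hypothetical, and a plain mean is used as a stand-in for the actual spatial interpolation, which is more elaborate.

```python
def absolute_anomaly(observed_mm, station_normal_mm):
    """Absolute anomaly: observed value minus the station's own normal."""
    return observed_mm - station_normal_mm

# Hypothetical February normals (mm) at the three nearest stations.
station_normals_mm = [18.0, 20.0, 22.0]
# A February in which no gauge recorded any precipitation.
observed_mm = [0.0, 0.0, 0.0]

anomalies = [absolute_anomaly(o, n)
             for o, n in zip(observed_mm, station_normals_mm)]

# Stand-in for the interpolation onto the grid: an unweighted mean (assumption).
interpolated_anomaly_mm = sum(anomalies) / len(anomalies)

grid_climatology_mm = 22.0  # February 1961-90 climatology, from the text
gridbox_mm = grid_climatology_mm + interpolated_anomaly_mm

print(anomalies)                # [-18.0, -20.0, -22.0]
print(interpolated_anomaly_mm)  # -20.0
print(gridbox_mm)               # 2.0
```

Every February in which all gauges are dry yields the same set of station anomalies, and hence the same grid-box value, which is why 2 mm recurs.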

This may seem odd. If there is no recorded precipitation at local stations, how can there be 2 mm of precipitation in the grid-box? The answer is that station precipitation and grid-box precipitation are different quantities, and may vary quite widely in areas with sparse and localised precipitation. A large area may have only a few rainstorms in any individual month, which contribute to the average precipitation in the grid-box, but remain undetected by the few rain gauges in the region. Therefore this grid-box estimate of 2 mm is not inconsistent [deliberate double negative] with the observed station information.

A different method would be to use proportional anomalies, rather than absolute anomalies. Under this approach the zero precipitation would be interpolated onto the grid as ratios (actual:climatology) of zero, and the grid-box precipitation would show as zero. For densely observed areas, or areas where precipitation generally occurs over large areas, this approach might be superior for some purposes. However, for this area of western Africa in February the result would be to make the amount of precipitation at the grid-box as a whole appear to fluctuate wildly from year to year, which is probably unrepresentative of the real world area.
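For comparison, the proportional approach can be sketched in the same spirit. This is an illustration under stated assumptions rather than a description of any CRU procedure: the station normals and observations are hypothetical, and an unweighted mean of the ratios stands in for the spatial interpolation. Only the 22 mm grid-box climatology is taken from the text.

```python
def gridbox_from_ratios(observed_mm, station_normals_mm, grid_climatology_mm):
    """Grid-box value from proportional anomalies (actual:climatology ratios)."""
    ratios = [o / n for o, n in zip(observed_mm, station_normals_mm)]
    # Stand-in for the interpolation onto the grid: an unweighted mean (assumption).
    mean_ratio = sum(ratios) / len(ratios)
    return grid_climatology_mm * mean_ratio

normals = [18.0, 20.0, 22.0]  # hypothetical February station normals (mm)

# A February in which every gauge is dry: all ratios are zero.
dry_year = gridbox_from_ratios([0.0, 0.0, 0.0], normals, 22.0)
# A February in which a single gauge catches a localised shower (hypothetical).
shower_year = gridbox_from_ratios([0.0, 40.0, 0.0], normals, 22.0)

print(dry_year)  # 0.0
print(shower_year)
```

Under these assumptions the grid-box value swings between zero and a substantial total depending on whether one gauge happens to catch a shower, which illustrates the year-to-year fluctuation described above.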

So the bottom line is that the data is correct as it stands, in that it is the result of accurately carrying out the method adopted, and that it accurately reflects the real-world experience, as far as we can tell. Whether it is useful to the user in that form is a different matter, and depends on the user. The user should carefully consider whether to use this area-averaged product, or whether to use station information. As far as possible, the spatial scale chosen should match the spatial scale of the information with which the climate data is to be compared.

Hints for time-series analysis
If you are interested in sub-grid-scale areas or in small data-sparse regions, you would do better to examine the station data itself rather than the grids. This will give you three new challenges:

Obtain the station data. This is not easy. The station data behind the grids are not in the public domain, not least because we only receive much of the data under non-disclosure agreements. The best place to start your search is NCDC.

Validate the station data. The station time-series you obtain is not necessarily accurate or homogeneous. There is a substantial literature on this subject. The best place to start your search is Peterson et al (1998).

Interpret the station data. Doubtless the station will be 1, 10, 100, or 1000km from the place of interest to you. How does the climate at the station relate to the climate at the place of interest? Here there is no substitute for a basic understanding of climatology.

If you have decided to use the grids, then the larger the scale of aggregation that you choose, the more reliable your results will be.

Visually inspect the time-series for any grid-boxes in which you are interested. If any periods show relaxation to the climatology, eliminate any such periods from your time-series analysis.

Download the station files, which indicate the number of stations within range of each grid-box at each moment in time. Compare the station file with the data file for the grid-box of interest.

Are there abrupt and substantial changes in the number of stations within range? If so, pay particular attention to the data time-series at these points in time.

Are there abrupt and substantial changes in the climate mean, or in the amount of year-to-year variability? (Look at this for January separately from February, March, ...) Changes in the variability, in particular, may be related to simultaneous changes in the number of stations within range.
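The last two checks can be sketched as below. This is a rough aid to visual inspection, not a homogeneity test. It assumes you have already extracted, for one grid-box and one calendar month, a sequence of station counts from the station file and the matching sequence of data values. The function names, the `min_jump` threshold and the sample numbers are hypothetical.

```python
import numpy as np

def abrupt_count_changes(counts, min_jump=3):
    """Indices where the station count differs from the previous
    time step by at least `min_jump` stations."""
    counts = np.asarray(counts, dtype=int)
    jumps = np.abs(np.diff(counts))
    return [int(i) + 1 for i in np.flatnonzero(jumps >= min_jump)]

def variability_ratio(values, split):
    """Sample standard deviation from `split` onwards, divided by
    the sample standard deviation before `split`."""
    values = np.asarray(values, dtype=float)
    after = np.std(values[split:], ddof=1)
    before = np.std(values[:split], ddof=1)
    return float(after / before)

# Hypothetical station counts within range for eight successive Januarys.
counts = [5, 5, 5, 1, 1, 1, 1, 1]
print(abrupt_count_changes(counts))  # [3]
```

A ratio from `variability_ratio` that is far from 1 at a flagged index suggests that the change in variability coincides with the change in the station network, and that the series deserves closer scrutiny at that point.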

Further information
For further information, you are advised to carefully study the series of publications on these high-resolution climate grids. These publications are detailed in the table of data-sets.

For a discussion of how not to use these data-sets, read the comment on Hay et al (Nature, 2002) by Patz et al (Nature, 2002).