Frequently Asked Questions

1. Are these data really free?
2. Can I share this data-set with colleagues in my institution?
3. Why has the file format changed, compared to previous versions?
4. Can the observed data-sets be used for time-series analysis?
5. Can you give me climate variable (parameter) X, which you do not list?

6. How do you make the scenarios from all the different data files in the scenario data-sets?
7. How are the temperature variables related to each other?
8. (a) Why aren't there any variations (in March) from year to year at grid-box X? (modified 23.06.03)
8. (b) Why is there a substantial change in year-to-year variability at grid-box X?
9. (a) Can you supply me with the same data-set using a smaller or larger number of files?
9. (b) Can you supply me with just a sub-section of the same data-set?
9. (c) Can you supply me with the same data-set on a different grid?
10. Can you supply me with the same data-set in a format suitable for my software?

11. What are the basic raw data used for the observed time-series?
12. How do you calculate the observed grids?
13. Why do you use a two-stage procedure to calculate the observed grids?
14. What are the basic raw data used for the scenarios of the future?
15. Should I use these scenario data-sets, or raw GCM outputs?

16. Why won't my browser let me download these files?
17. Can you send me a copy of paper X, which you mention?
18. Can you supply me with software (code) for reading the data?
19. Can I reconstruct the original observed data from your data-set?
20. Can you supply me with the 1961-1990 baseline for the data-set you have already sent me?

21. What are the details of your grid? Can I have just a grid, without the data?
22. Which stations did you use to construct the observed grids, and can I have them?
23. Where are the land-sea boundaries on your grid?
24. What is the elevation of each grid-box?
25. Can I re-interpolate your data onto a higher resolution grid?

26. There appears to be an increase in variability between CRU TS 1.0 and CRU TS 2.0. Why?


1. Are these data really free?
Yes - to all academic institutions. If you find the data-set useful, then you may be able to help us to improve and update the data-sets. There are two principal ways in which you might do this:

Contact me if you would like to help.

2. Can I share this data-set with colleagues in my institution?
Yes - I recognise that these are large files and that there is no point in people from the same institution downloading them repeatedly. However, I do still ask anyone who uses the data-set to send me their details, as if they were applying for access too. The details requested and the reasons are given on the request page. This is a condition of use.

3. Why has the file format changed, compared to previous versions?
The latest files were produced for a particular 'customer' (an EU consortium) who were keen to minimise data volumes. The time-based grids that I provided to them are more efficient than the space-based grids, because they eliminate the ocean grid points. The file format is described here.

To access the scenario data-sets (TYN SC 1.0 and 2.0) you must unpack the raw files. You may do the unpacking either with the supplied Fortran software (requires Unix and possibly some programming), or else by hand, following the instructions. See the unpacking information. Unless you unpack as specified, the data you deduce will be nonsense.

4. Can the observed data-sets be used for time-series analysis?
See the page on time-series analysis.

5. Can you give me climate variable (parameter) X, which you do not list?
No. I only supply the data described on the data-set pages. However, it is possible to derive estimates of some secondary variables from the supplied variables. You may use these empirical relationships if you wish. Please note that the values you calculate are not direct observations of climate, but are estimates based on empirical relationships with grids of observations. In particular, each of these secondary variables requires at least two different primary variables, each of which is interpolated separately. Therefore there is no guarantee that different variables at the same grid-box are physically consistent with each other, and it is possible that your calculations of the secondary variable may have values that are not physically possible. Because of the underlying basis of the data-sets, the empirical relationships will be much better for some locations and periods than for others. You are advised to check the quality of the calculated values before using them in any important work. You are responsible for your own use of these empirical relationships!

6. How do you make the scenarios from all the different data files in the scenario data-sets?
Either by using the Fortran software supplied, or by following the instructions supplied. See the unpacking page for more information. You MUST unpack the data-set one way or the other; without that unpacking procedure the data you deduce will be nonsense. I do not provide any software support.

7. How are the temperature variables related to each other?
On a daily time-scale:
(1) DTR = Tmax - Tmin
(2) Tmean = Tmin + (DTR/2)
The monthly data are the monthly means of the daily values.
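
The two relationships can be sketched in a few lines of Python. This is an illustration only: the function names and the sample values are hypothetical, and nothing here is part of the supplied software.

```python
def diurnal_temperature_range(tmax, tmin):
    """Equation (1): DTR = Tmax - Tmin."""
    return tmax - tmin


def mean_temperature(tmin, dtr):
    """Equation (2): Tmean = Tmin + (DTR / 2)."""
    return tmin + dtr / 2.0


# Hypothetical daily values in degrees Celsius.
tmax, tmin = 18.0, 6.0
dtr = diurnal_temperature_range(tmax, tmin)
tmean = mean_temperature(tmin, dtr)
print(dtr, tmean)
```

Substituting (1) into (2) shows that Tmean is simply the midpoint of Tmax and Tmin.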

8. (a) Why aren't there any variations (in March) from year to year at grid-box X? (modified 23.06.03)
There are two possible reasons, one general, and one specific to precipitation.

Firstly, in the observed data-sets, for locations and periods when insufficient data were available to calculate a value, the mean from that grid-box in 1961-90 was imposed. This may result in no year-to-year variations. The same feature may be seen in the scenarios for the future that re-use the observed time-series. For a fuller explanation of this feature, see the page on time-series analysis. There are also station files available, which indicate the number of stations within range of each grid-box at each moment in time.

Secondly, this may occur for precipitation in areas that satisfy the following conditions:

These occurrences are the result of interpolating using absolute anomalies. See the 'precipitation special case' section in the page on time-series analysis for more information.

8. (b) Why is there a substantial change in year-to-year variability at grid-box X?
The number of stations contributing to a grid-box varies over time. Generally speaking, the greater the number of stations, the smaller the interannual variability. If there is an abrupt change in the number of stations contributing information (for example, all the stations in a particular country may stop reporting because of war), there may be an abrupt change in variability as well. Further information is available from the page on time-series analysis. There are also station files available, which indicate the number of stations within range of each grid-box at each moment in time.

9. Can you supply me with ... ?
(a) Can you supply me with the same data-set using a smaller or larger number of files?
(b) Can you supply me with just a sub-section of the same data-set?
(c) Can you supply me with the same data-set on a different grid?

No. The data-set is supplied in its existing form only. I don't have time to reconstruct the data-set upon request. However:

10. Can you supply me with the same data-set in a format suitable for my software?
No. The data-set is only supplied in its existing format. I don't have time to reconstruct the data-set upon request. However:

11. What are the basic raw data used for the observed time-series?
The observed grids are based on extensive databases of monthly measurements of climate at individual stations. These databases are the work of a number of individuals in the Climatic Research Unit over many years, and rank among the best in the world. These databases are not in the public domain, not least because of the restrictions on any redistribution of data that are imposed upon us by those who supply us with station data. Station coverage is denser over the more populated parts of the world, particularly the United States, southern Canada, Europe and Japan. Coverage is sparsest over the interior of the South American and African continents. These databases have been subjected to homogeneity checks, but not to the same extent as the databases underlying the coarser grids of Jones or Hulme. [based partly on the Phil Jones FAQ]

No other sources of data are used to construct the climate grids. There is no satellite information included. There is no remote sensing information included. The observed grids are based exclusively on meteorological measurements from individual stations.

12. How do you calculate the observed grids?
We use a two-stage procedure:

  1. We calculate the anomaly time-series for each station, relative to the 1961-90 mean (25% missing values are typically permitted in the calculation of the mean), and grid the anomalies at 0.5 degrees.
  2. We add the gridded anomalies to the well-established 1961-90 climatology grid.
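
The two stages can be sketched for a single station and a single grid-box as follows. This is a minimal illustration under stated assumptions, not the code used to build the data-sets: the function names, the dictionary layout and the sample values are hypothetical, and the spatial gridding of the anomalies (the interpolation itself) is deliberately left out.

```python
def baseline_mean(values_by_year, start=1961, end=1990, max_missing=0.25):
    """Mean over the baseline years, or None if too many values are missing.

    values_by_year maps year -> value, with None marking a missing value.
    """
    years = range(start, end + 1)
    present = [values_by_year[y] for y in years
               if values_by_year.get(y) is not None]
    missing_fraction = 1.0 - len(present) / len(years)
    if missing_fraction > max_missing:
        return None
    return sum(present) / len(present)


def anomalies(values_by_year, baseline):
    """Stage 1: value minus baseline mean, for every year with data."""
    return {y: v - baseline for y, v in values_by_year.items() if v is not None}


def absolute_value(gridded_anomaly, climatology):
    """Stage 2: add a gridded anomaly to the climatology value of a grid-box."""
    return climatology + gridded_anomaly


# Hypothetical March temperatures (degrees Celsius) at one station.
station = {year: 10.0 for year in range(1961, 1991)}
station[1965] = None   # one missing baseline year, well within the 25% limit
station[1995] = 11.5

baseline = baseline_mean(station)
anoms = anomalies(station, baseline)
# Pretend the 1995 anomaly has been gridded unchanged onto a grid-box whose
# 1961-90 climatology is 8.0 degrees Celsius.
print(absolute_value(anoms[1995], 8.0))
```

The point of the design is that only the anomaly (here +1.5) is carried from the station to the grid; the absolute level comes from the climatology.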

13. Why do you use a two-stage procedure to calculate the observed grids?
Regional reporting variations arise because stations are at different elevations, and different countries calculate monthly statistics using different methods and formulae. To avoid biases that could result from these variations, monthly values are reduced to anomalies relative to the period with best coverage (1961-90). The anomalies tend to be much less susceptible to the regional reporting variations, and therefore are more spatially consistent. By using station anomalies to obtain the month-to-month and year-to-year variations, we obtain more robust results. These results are then combined with the well-observed mean climatology from 1961-90 to obtain high-quality estimates of the temporal variations in absolute values. [based partly on the Phil Jones FAQ]

14. What are the basic raw data used for the scenarios of the future?
The year-to-year and month-to-month variations are based on the observed record from the 20th century. The long-term changes are based on the outputs from state-of-the-art global climate models (GCMs).

15. Should I use these scenario data-sets, or raw GCM outputs?
This depends entirely on your purpose. The most important advantage from using raw GCM outputs is the guaranteed physical consistency between variables. However, for many purposes, it is better to use these scenario data-sets:

The net effect of these advantages is that it becomes much easier to conduct systematic investigations into the future of the environmental system being modelled.

16. Why won't my browser let me download these files?
I don't know. This problem usually arises with Microsoft Internet Explorer, so try a better browser, for example Mozilla or Opera.

If the problem only arises with a compressed file, such as an elevations file, it may be because you are trying to open the file in a web browser, rather than save it.

17. Can you send me a copy of paper X, which you mention?
No. This is because I don't have the money in my grants to pay for the many hundreds of offprints that would be required, and because I don't have the time to individually address and send them. However, I recognise that having access to the relevant publications, particularly in the peer-reviewed literature, is important for any scientific work. Therefore I make electronic versions of my publications available wherever possible. My publications page has details of all my publications. If a publication is not available through that page, I am not able to supply you with a copy of it.

18. Can you supply me with software (code) for reading the data?
No. The data-sets are supplied in the form of ASCII files, which are clearly formatted and documented. If you wish to use a particular software package (or code) to read and manipulate the ASCII files, you are welcome to do so, but I cannot provide you with any assistance.

19. Can I reconstruct the original observed data from your data-set?
No. This is not because I don't want you to do it, but because it is impossible. The grids I supply are the result of an interpolation procedure based on a network of stations that varies over time. Therefore it is impossible to deduce the original station observations from these data-sets.

20. Can you supply me with the 1961-1990 baseline for the data-set you have already sent me?
Yes! No additional files are needed! Simply average together the 30 values for 1961-1990 from the files you already have.
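
As a rough sketch of that averaging, assuming you have already read the values for one grid-box into a dictionary keyed by year (the layout, the function name and the synthetic data are all hypothetical):

```python
def baseline_1961_1990(monthly_values):
    """Average each calendar month over the 30 years 1961-1990.

    monthly_values maps year -> list of 12 monthly values for one grid-box.
    Returns a list of 12 baseline means (January to December).
    """
    years = range(1961, 1991)  # the 30 baseline years
    return [sum(monthly_values[y][m] for y in years) / len(years)
            for m in range(12)]


# Synthetic data for 1901-2000: month index, plus 1 in even-numbered years.
data = {y: [float(m + (1 if y % 2 == 0 else 0)) for m in range(12)]
        for y in range(1901, 2001)}
baseline = baseline_1961_1990(data)
print(baseline[0], baseline[11])
```

Years outside 1961-1990 are present in the input but are ignored by the average.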

21. What are the details of your grid? Can I have just a grid, without the data?
The data that I supply are all on regular latitude-longitude grids. Therefore it is very simple to deduce yourself the details of the grid, including individual grid-box locations, from the information supplied on the website. All you need is:

22. Which stations did you use to construct the observed grids, and can I have them?
This information is not publicly available. The underlying station databases are not in the public domain, not least because we only receive much of the data under non-disclosure agreements. What we are able to release is information on how many stations are available to contribute to each grid-box. This information is made available with the observed data-set itself.

23. Where are the land-sea boundaries on your grid?
You can deduce this information in either of two ways:

24. What is the elevation of each grid-box?
Inspect the relevant elevations file, which is downloadable via the file format page.

25. Can I re-interpolate your data onto a higher resolution grid?
At one level the answer is 'yes', because we don't place any constraints on how the data is used. However, the regridding may turn out - in scientific terms - to be useless at best and nonsense at worst. If there were a simple way to represent monthly climate on (say) a 1 kilometre grid, we would have done it ourselves, because lots of people would like 1 kilometre grids! We haven't done it because we can't justify it scientifically.

Sometimes a researcher wants a high spatial resolution because there is a small study site for which climate information is required. In this case, we recommend that the researcher obtains a detailed station record from as close a station as possible. This advice reflects the principle that the scale of the climate data should reflect the scale of the application. There is no substitute for using station-scale data to represent station-scale climate. For station data, try NCDC. I do not provide any station data.

If a grid is required for a large area at a higher spatial resolution than we supply, it is legitimate to reinterpolate our grids at a higher spatial resolution (say 1km), provided that the data is only smoothed. There should not be any elevation-sensitive controls on the smoothing; it should just be a simple interpolation. However, one should recognise that there is no more information in the reinterpolated 1km grid than in the original 0.5 degree grid. The same data is merely being presented at a higher spatial resolution.
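
One possible form of such a simple interpolation is bilinear interpolation, sketched below in plain Python. This is an assumption about what 'simple' might mean, not a recommended method: the function names are hypothetical, the grid values are treated as points on a flat, evenly spaced mesh, and no account is taken of elevation or of grid-box geometry.

```python
def bilinear(grid, row, col):
    """Interpolate grid (a list of rows) at fractional position (row, col)."""
    r0 = min(int(row), len(grid) - 2)
    c0 = min(int(col), len(grid[0]) - 2)
    fr, fc = row - r0, col - c0
    top = grid[r0][c0] * (1 - fc) + grid[r0][c0 + 1] * fc
    bottom = grid[r0 + 1][c0] * (1 - fc) + grid[r0 + 1][c0 + 1] * fc
    return top * (1 - fr) + bottom * fr


def refine(grid, factor):
    """Resample grid so that each original interval is split into `factor` steps."""
    n_rows = (len(grid) - 1) * factor + 1
    n_cols = (len(grid[0]) - 1) * factor + 1
    return [[bilinear(grid, i / factor, j / factor) for j in range(n_cols)]
            for i in range(n_rows)]


# Hypothetical 2 x 2 coarse grid, refined to 3 x 3.
coarse = [[0.0, 2.0],
          [4.0, 6.0]]
fine = refine(coarse, 2)
for fine_row in fine:
    print(fine_row)
```

Every value on the fine grid lies between the surrounding coarse values, which is one way of seeing that the finer grid contains no new information.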

If a grid is required for a small area at a higher spatial resolution than we supply, it is legitimate to use our grids to construct a product with added information. One might obtain as many stations as possible for the area concerned, obtain the 10 minute global climatology (CRU CL 2.0), and calculate - by combining the grid and station information - a mean climatology for 1961-90 for a grid covering that limited area. One must recognise that this will be a first order estimate, with a low precision (accurate to within a few degrees). One might then calculate the anomalies (versus 1961-90) on the 0.5 degree grid (1901-2000=CRU TS 2.0; 2001-2100=TYN SC 2.0), and smooth these (without any elevation-dependence, just interpolating) onto the 1km grid to obtain a time-series for 1901-2100. The reasoning here is that the main elevation-dependent effort should be put into making the climatology as accurate as possible. The month-to-month variations occur on much larger spatial scales, and so the information implicit in the 0.5 degree grid can be assumed to apply to the 1km grid, at least at the level of genuine precision available for the 1961-90 climatology (i.e. accurate to within a few degrees).
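
The final combination step described above can be sketched as follows. To keep the example short it works along a one-dimensional transect rather than a two-dimensional grid, and it assumes the fine-scale climatology has already been constructed; the function names and all values are hypothetical.

```python
def interpolate_1d(coarse, factor):
    """Linearly interpolate a list of coarse values onto a finer spacing."""
    fine = []
    for i in range(len(coarse) - 1):
        for k in range(factor):
            t = k / factor
            fine.append(coarse[i] * (1 - t) + coarse[i + 1] * t)
    fine.append(coarse[-1])
    return fine


def fine_time_step(fine_climatology, coarse_anomaly, factor):
    """Fine-grid absolute values = fine climatology + smoothed coarse anomaly."""
    smooth = interpolate_1d(coarse_anomaly, factor)
    return [c + a for c, a in zip(fine_climatology, smooth)]


# Two coarse grid-boxes with anomalies of +1.0 and +2.0 degrees, and a
# three-point fine climatology in which the middle point is a cold summit.
coarse_anomaly = [1.0, 2.0]
fine_climatology = [10.0, 4.0, 9.0]
result = fine_time_step(fine_climatology, coarse_anomaly, 2)
print(result)
```

All of the elevation-dependent detail (the cold middle point) comes from the climatology; the anomaly field contributes only a smooth, large-scale adjustment.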

26. There appears to be an increase in variability between CRU TS 1.0 and CRU TS 2.0. Why?
Without examining the grid-boxes concerned, this is impossible to answer definitively. However, the following remarks may be of some help.

Any such increase in variability is often noted in a less-well-observed region of the world. Small additional numbers of stations becoming available in the underlying database (on which the interpolations are based) may result in a substantial increase in apparent variability.

For example, in CRU TS 1.0 the only time-series information might be provided by a station at a distance of 200 km. Because of the large distance involved, this station might only allow a weak inter-annual variability in precipitation in the region of interest. However, if an extra station at a distance of only 20km was added to the database before CRU TS 2.0 was calculated, the apparent inter-annual variability might become much stronger.

If this explanation seems relevant, the information on time-series analysis might also be of interest. That page addresses trends, but the information on how the interpolation is done is relevant here.