RELEASE NOTES FOR CRU TS v4.01: 15 September 2017

The CRU TS dataset was developed and has been subsequently updated,
improved and maintained with support from a number of funders,
principally the UK's Natural Environment Research Council (NERC) and the
US Department of Energy. Long-term support is currently provided by the
UK National Centre for Atmospheric Science (NCAS), a NERC collaborative
centre. 

The 4.01 release of the CRU TS dataset covers the period 1901-2016, and
supercedes the 4.00 release and all earlier releases. The reference to
use (see caveat after) is:

Harris, I., Jones, P.D., Osborn, T.J. and Lister, D.H. (2014), Updated
high-resolution grids of monthly climatic observations – the CRU TS3.10
Dataset. Int. J. Climatol., 34: 623–642. doi: 10.1002/joc.3711

The above reference does not, of course, describe the new process being
used for v4.xx releases. This document should be used as a guide to that
process, until a new reference is available.

This release covers the same temporal and spatial extents as v3.25,
and uses the same source databases. Users may therefore satisfy
themselves as to the differences between these two versions.

It is the intention that CRU TS will move to this process exclusively
from the next release (1901-2017) in Summer 2018.

There is currently no published reference for this version; work is in
progress to accomplish this. In the meantime the description below will
need to suffice.


1. The 4.xx Process, and how it differs from the previous process (3.xx).

The main process change in version 4 is the move to Angular Distance
Weighting (ADW) for gridding the monthly anomalies. Compared to the
previous approach, which used IDL routines TRIANGULATE and TRIGRID to
effect triangulated linear interpolation, ADW allows us total control
over how station observations are selected for gridding, and complete
traceability for every datum in the output files. For secondary
variables, this means that observed and synthesised data values are used
in the same way in the gridding process.

ADW was previously used for the production of CRU TS 2.0:

New, M., M. Hulme, and P. Jones, 2000: Representing Twentieth-Century
Space–Time Climate Variability. Part II: Development of 1901–96 Monthly
Grids of Terrestrial Surface Climate. J. Climate, 13, 2217–2238,
http://dx.doi.org/10.1175/1520-0442(2000)013<2217:RTCSTC>2.0.CO;2

The configuration for ADW in version 4 is largely based on that
described in the above paper. Some aspects of that configuration are:

• Between 1 and 8 stations contributing to a gridcell at any time step
• The exponent in the distance weighting calculation is 4
• Observations take precedence over synthetic data (where both present)
• Synthetic data stations that lie within 45° of a 'real' data station
  are not used
• Gridcells with no stations in range are set to 0 anomaly (ie the
  climatology)

All the supporting code for CRU TS is now in Fortran, improving
maintenance, speed and portability when compared to the previous mixture
of IDL and Fortran.

There have been changes to the synthetic variable generators. The most
significant of these is that variables are now synthesised for discrete
stations, whereas before they were produced as a lower-resolution (2.5°)
grid. This enables them to be used by the ADW gridding process.
Additionally, synthetic WET is now produced as an absolute number of wet
days, (it was previously converted to an anomaly). It then passes
through the same processes as regular observations and is thus
anomalised using its own 1961-90 normals. This approach may be extended
to other synthetic variables in the future.


2. Differences in output files

For now, the approach of issuing NetCDF and ASCII files in parallel will
remain, as will the publications of decadal files as well as full-length
versions. However, as stated above, this is likely to be the last
parallel release.

For version 4, station counts have changed:

• In 3.xx it was effectively impossible to know which stations had
  contributed to each datum, so two approximations were published: the
  number of stations reporting within the CDD ('.stn'), and the number
  of reporting stations in each gridcell ('.st0'). 

• In 4.00 a count of observations, (including synthetic observations
  where used), is produced by the gridder. These counts are published as
  ASCII '.stn' files, and are also embedded into the NetCDF files as a
  second variable ('stn'). It is anticipated that this will encourage
  users to use this additional information to better understand the
  dataset they are working with.


3. Differences between 3.25 and 4.01

The difference in approach affects coverage. The interpolation process
in 3.xx runs could not prevent the triangulation exceeding the defined
Correlation Decay Distance (CDD) for the variable being gridded (it
merely ensured that 'dummy' stations were inserted in unobserved
regions). In 4.00, gridcells outside the CDD of any observations (actual
or synthetic) are set to the climatology, so discs of influence with
sharp edges can appear in plots of sparsely-observed regions. This
lacks aesthetic qualities but is more justifiable; it also
highlights areas where more observations are needed. The same is true
for numerous small islands, which will have lost cover either partly or
completely.

In terms of comparisons, the extent to which 4.01 agrees with 3.25
varies with station density, unsurprisingly. Comparison plots of country
averages have been made and are included in the release.

The longstanding issue of discontinuous interpolation along the dateline
in Eastern Siberia is resolved with the new process.


4. Differences between 4.00 and 4.01

In addition to updating the dataset with 2016 data, several specific
areas have been addressed:

• A bug in the way some CLIMAT 'WET' updates were processed was detected
  and resolved; this has resulted in small changes for many areas from
  2000 onwards.

• In an effort to fill gaps in the observed record, particularly in the
  first half of the C20th, observations from GHCN-M were selectively
  added where they offered new coverage. This has resulted in some new
  and noticeable cover for TMP, TMN and TMX, and thus DTR, VAP, FRS
  (where applicable) and PET, in various parts of the world. These
  include small islands, the Amazon Basin, and areas of central Africa.
  It is interesting to see that in many cases, differences are not as
  pronounced as for 3.25 vs 3.24.01 (see the 3.25 Release Notes). This
  is likely due to distant stations (outside the CDD) being replaced
  with nearer ones for 3.25, wheras 4.00 and 4.01 would not have used
  stations outside the CDD in the first place.
  Of course, there are still gaps to be filled, especially for PRE.


Please contact me with any observations or questions. This is still a
new dataset, and I want it to evolve.

Ian Harris
NCAS-Climate
Climatic Research Unit
School of Environmental Sciences
University of East Anglia
Norwich
NR4 7TJ