RELEASE NOTES FOR CRU TS v4.03: 15 May 2019

The CRU TS dataset was developed and has been subsequently updated, improved and maintained with support from a number of funders, principally the UK's Natural Environment Research Council (NERC) and the US Department of Energy. Long-term support is currently provided by the UK National Centre for Atmospheric Science (NCAS), a NERC collaborative centre.

The 4.03 release of the CRU TS dataset covers the period 1901-2018. This is the fourth release of the new interpolation algorithm, which is unchanged since the previous release (v4.02).

There is currently no published reference for this version; work is in progress to accomplish this. In the meantime the description below will need to suffice.

1. The current process.

The main process change in v4 was the move to Angular Distance Weighting (ADW) for gridding the monthly anomalies. Compared to the v3 approach, which used IDL routines TRIANGULATE and TRIGRID to effect triangulated linear interpolation, ADW allows us total control over how station observations are selected for gridding, and complete traceability for every datum in the output files. For secondary variables, this also means that observed and synthesised data values are used in the same way in the gridding process.

The initial versions of v4 (v4.00 and v4.01) used a particular weighting curve to determine the contribution of each interpolant station to the target cell:

    exp(-1*d/cdd)^4

(where d is the distance of the interpolant station from the target cell, and cdd is the Correlation Decay Distance for the variable).

However, this results in a discontinuous field when observations are sparse; this can be seen in Northern Russia, North Africa and the Amazon Basin, amongst other regions. The process introduced in v4.02 addresses both this problem and the additional fact that the original weighting was not derived at this time, but for an early version (v.20).

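
As a rough illustration of the v4.00/v4.01 weighting curve described above, the following Python sketch evaluates exp(-1*d/cdd)^4 at a few distances. It is not the CRU TS code (which is Fortran): the function name and the 1200 km CDD are assumptions chosen only for illustration.

```python
import math


def exp_weight(d_km, cdd_km, m=4):
    """Return exp(-d/cdd)**m for a station d_km away from the target cell.

    d_km   : distance of the interpolant station from the target cell
    cdd_km : Correlation Decay Distance for the variable
    m      : power applied to the exponential (4 in v4.00 and v4.01)
    """
    if d_km < 0 or cdd_km <= 0:
        raise ValueError("distance must be >= 0 and cdd must be > 0")
    return math.exp(-1.0 * d_km / cdd_km) ** m


# Hypothetical CDD, used here only to show the shape of the curve.
ILLUSTRATIVE_CDD_KM = 1200.0

for d in (0.0, 300.0, 600.0, 900.0, 1200.0):
    print(f"d = {d:6.0f} km  weight = {exp_weight(d, ILLUSTRATIVE_CDD_KM):.4f}")
```

Under these assumptions the weight is 1 at the station itself and has already fallen to exp(-4), under 2%, at d = cdd, so a lone distant station contributes very little.
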
Cross-validation was performed for all station observations that could be reconstructed, for a variety of weight curves. The weighting was used not just to combine the interpolants, but to allow the resulting values to relax 'gracefully' (rather than abruptly) to the surrounding climatology (0 in anomaly space) when observations are sparse: this used the distance of the nearest interpolant station as the assumed source of the ADW-determined value. The weight functions explored in this way were:

    exp(-1*d/cdd)^m    (m=1..8)
    1-sin(d/cdd)^m     (m=1..8)

Note that the sin function expects an argument in radians, so the distance fraction was converted before use.

Cross-validation was performed by attempting to reconstruct each observation where at least one other reporting station was 'in range' to allow interpolation to take place. This produced sixteen errors per observation, one for each weight function (found by subtracting the target value from each interpolation attempt). This process was run for the full 1901-2017 timespan, for PRE and TMP.

The results showed two things: that two functions were preferred over the other fourteen; and that generally, all functions performed within a narrow range. The two functions that performed best were the exponential function (m=8) and the sin function (m=1). These were approximately equal for PRE, but sin^1 performed better for TMP. For this reason, sin^1 was examined further.

A curve of the form 1 - sin(d), where d ranges from 0 to 1 radian, falls reasonably steeply from 1 (at d = 0) before tailing out to 0 at d = 1 radian. This means that the discontinuities referred to above are still in evidence. For this reason, a function offering a more gradual decline, sin^4, was investigated and compared to sin^1. The additional MAE for each cell examined (calculated over time) was generally clustered near to 0, indicating that sin^4 would be an acceptable choice from the perspective of cross-validation, as well as going some way to addressing the discontinuities.
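
The two families of weight function, and the relaxation towards climatology, can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the CRU TS implementation: the notes do not give the exact radian conversion, so the distance fraction d/cdd is assumed here to be scaled so that d = cdd maps to pi/2 (making the sin weight reach 0 at the CDD), and the relaxation is assumed to be a simple multiplication of the ADW anomaly by the weight of the nearest station. All function names are hypothetical.

```python
import math


def exp_family(frac, m):
    """exp(-d/cdd)**m, where frac = d/cdd."""
    return math.exp(-1.0 * frac) ** m


def sin_family(frac, m):
    """1 - sin(x)**m, where frac = d/cdd.

    Assumption: frac in [0, 1] is scaled to x in [0, pi/2] radians,
    so the weight is 1 at the station and 0 at d = cdd.
    """
    return 1.0 - math.sin(frac * math.pi / 2.0) ** m


def relax_to_climatology(adw_anomaly, nearest_frac, m=4):
    """Scale an ADW anomaly towards 0 (the climatology in anomaly space).

    Assumption: the weight of the nearest interpolant station is used
    as a plain multiplier on the gridded anomaly.
    """
    frac = min(max(nearest_frac, 0.0), 1.0)  # clamp to the valid range
    return sin_family(frac, m) * adw_anomaly


# Compare the steeper sin^1 curve with the more gradual sin^4 curve.
for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"d/cdd = {frac:.2f}  sin^1 = {sin_family(frac, 1):.3f}  "
          f"sin^4 = {sin_family(frac, 4):.3f}")
```

With this scaling, sin^4 stays close to 1 for nearby stations and only falls away as d approaches the CDD, whereas sin^1 drops immediately.
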
This 'balancing act' between cross-validation skill and spatial continuity is faced by all interpolations in sparsely-observed regions.

The current configuration for ADW is little changed from earlier versions of v4:

• Between 1 and 8 stations contributing to a gridcell at any time step
• The power for the sin in the distance weighting calculation is 4
• Values relax towards the underlying climatology as distance from source increases
• Observations take precedence over synthetic data (where both present)
• Synthetic data stations that lie within 45° of a 'real' data station are not used
• Gridcells with no stations in range are set to 0 anomaly (i.e. the climatology)

All the supporting code for CRU TS is still in Fortran, improving maintenance, speed and portability compared with the previous mixture of IDL and Fortran.

As in earlier v4 releases, synthetic variables are synthesised for discrete stations, whereas in v3.xx they were produced as a lower-resolution (2.5°) grid. This enables them to be used by the ADW gridding process. Additionally, synthetic WET is now produced as an absolute number of wet days (it was previously converted to an anomaly). It then passes through the same processes as regular observations and is thus anomalised using its own 1961-90 normals. This approach may be extended to other synthetic variables in the future.

2. Output files

For now, the approach of issuing NetCDF and ASCII files in parallel will remain, as will the publication of decadal files as well as full-length versions. However, decadal files may not be archived when superseded, in order to make best use of space.

For 4.00, station counts have changed. In 3.xx it was effectively impossible to know which stations had contributed to each datum, so two approximations were published: the number of stations reporting within the CDD ('.stn'), and the number of reporting stations in each gridcell ('.st0'). In 4.00 a count of observations (including synthetic observations where used) is produced by the gridder.
These counts are published as ASCII '.stn' files, and are also embedded into the NetCDF files as a second variable ('stn'). It is anticipated that this will encourage users to use this additional information to better understand the dataset they are working with.

3. Differences between v3 and v4

The move to v4.xx has affected coverage. The interpolation process in 3.xx runs could not prevent the triangulation exceeding the defined Correlation Decay Distance (CDD) for the variable being gridded (it merely ensured that 'dummy' stations were inserted in unobserved regions). The discontinuities observed in v4.00 and v4.01 have been largely addressed (see above). Some small islands have still lost cover either partly or completely: this can only be addressed by acquiring observations closer to them.

In terms of comparisons, the extent to which v4 agrees with v3 unsurprisingly varies with station density. Comparison plots of country averages have been made and are included in the v4.02 release, comparing 4.02 with 4.01 and 3.26.

The longstanding issue of discontinuous interpolation along the dateline in Eastern Siberia is resolved with v4.

Please contact me with any observations or questions. I am particularly keen to hear reactions to the new weighting function.

4. The future of v3.xx releases

The 3.xx 'product line' terminated with 3.26. That version remains available.

As always, please contact BADC in the first instance if you have any questions, observations or suggestions. If, however, you wish to contact CRU directly about these datasets, please contact me at i.harris@uea.ac.uk, as mail to other members of the Unit will be passed through to me anyway.

Ian Harris
NCAS-Climate
Climatic Research Unit
School of Environmental Sciences
University of East Anglia
Norwich NR4 7TJ
i.harris@uea.ac.uk