Format of CRU TS Observation files

CRU TS observation files are monolithic, with one file per variable.

The files are text files; they can be opened with any sufficiently capable
text editor.

Each station record consists of one header line, one normals line, then
one line of monthly data for each year of that station's timespan. Records
are therefore of variable length, but the start and end years in the header
line allow the length of each record to be determined. Station records are
concatenated one after the other.

Observations are stored as integers; this is a legacy format designed to
save storage space. In most cases the actual value is obtained by dividing
by 10; the exceptions are WET days and FRS days, which must be divided
by 100.
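
The scaling rule above can be sketched in Python as follows. This is a
minimal illustration, not part of the CRU software: the function name and
the variable labels "wet" and "frs" are assumptions, and -9999 is the
missing-value code seen in the example record.

```python
MISSING = -9999  # missing-value code, as seen in the example record

def scale_observation(raw, variable="pre"):
    """Convert a stored integer to its actual value, or None if missing.

    The labels "wet" and "frs" are hypothetical names for the wet-day and
    frost-day variables; every other variable is divided by 10.
    """
    if raw == MISSING:
        return None
    divisor = 100.0 if variable.lower() in ("wet", "frs") else 10.0
    return raw / divisor

print(scale_observation(686))          # precipitation, divided by 10
print(scale_observation(250, "wet"))   # wet days, divided by 100
print(scale_observation(-9999))        # missing value
```

Checking for the missing value before dividing avoids turning -9999 into a
plausible-looking number such as -999.9.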

Here is a typical station record for precipitation, showing the headers
plus the first few years of observations:

0305900  5750   -420    4 INVERNESS            UK            1781 1994
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
1781-9999-9999-9999-9999-9999-9999-9999  686  348  135-9999-9999
1782  559  439  665  765  930  559  315 1676  800  813  787   84
1783  914  432  686  216  203  787  584  889  762  635  241  279

The header line breaks down like this (format in square brackets):
0305900  5750   -420    4 INVERNESS            UK            1781 1994
_______ [i7] WMO code, in this case 03 059, padded with trailing zeros
        _____ [i5] Latitude, in degrees x100 (this is 57.5°N)
              ______ [i6] Longitude, in degrees x100 (this is 4.2°W)
                     ____ [i4] Altitude, in m.
                          ____________________ [a20] Station Name
                                               _____________ [a13] Country Name
                                                             ____ [i4] Start Year
                                                                  ____ [i4] End Year

Each field in the header is separated by a single space, so in Fortran
terms the format is '(i7,1x,i5,1x,i6,1x,i4,1x,a20,1x,a13,2(1x,i4))'.
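
For readers not using Fortran, the same layout can be read by slicing on
fixed column positions. The Python sketch below is one possible approach;
the function name and dictionary keys are illustrative assumptions, and the
column offsets are derived from the format string above.

```python
def parse_header(line):
    """Split a header line on the columns implied by
    '(i7,1x,i5,1x,i6,1x,i4,1x,a20,1x,a13,2(1x,i4))'."""
    return {
        "wmo_code": line[0:7],              # i7, kept as text to preserve leading zeros
        "lat": int(line[8:13]) / 100.0,     # i5, degrees x100
        "lon": int(line[14:20]) / 100.0,    # i6, degrees x100
        "alt_m": int(line[21:25]),          # i4, metres
        "name": line[26:46].strip(),        # a20
        "country": line[47:60].strip(),     # a13
        "start_year": int(line[61:65]),     # i4
        "end_year": int(line[66:70]),       # i4
    }

# The INVERNESS header from the example record; ljust() pads the two text
# fields to their a20 and a13 widths.
HEADER = ("0305900  5750   -420    4 " + "INVERNESS".ljust(20) + " "
          + "UK".ljust(13) + " 1781 1994")

station = parse_header(HEADER)
print(station["name"], station["lat"], station["lon"])
```

Slicing by position is used instead of splitting on whitespace because
station and country names may themselves contain spaces.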

The normals line looks like this:
6190-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999
____ [i4] Always 6190, the normals period for CRU TS
    _____ [i5] January normal, here the missing value (-9999) is substituted
         _____ [i5] February normal, as above
              (etc)

Fields in the normals line, and subsequent data lines, are not separated
by spaces. Again, this is a legacy format to save storage space. So, in
Fortran the format is '(i4,12i5)'.
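
Because the fields are not separated, a simple whitespace split will not
work: a minus sign runs straight into the preceding field. A Python sketch
of a fixed-width split for '(i4,12i5)' lines is shown below; the function
name is an assumption made for illustration.

```python
def parse_values_line(line):
    """Split an '(i4,12i5)' line into (leading integer, list of 12 integers)."""
    label = int(line[0:4])                       # 6190 on a normals line, else the year
    values = [int(line[4 + 5 * m:9 + 5 * m])     # twelve 5-character fields
              for m in range(12)]
    return label, values

# The 1781 data line from the example record.
LINE = "1781-9999-9999-9999-9999-9999-9999-9999  686  348  135-9999-9999"

label, values = parse_values_line(LINE)
print(label)         # the year
print(values[7:10])  # August, September, October
```

Python's int() ignores the leading spaces in each field, so no extra
stripping is needed.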

Note that the normals line is mainly unpopulated, as normals are generally
constructed from the observations at run time.

Finally, the first data line:
1781-9999-9999-9999-9999-9999-9999-9999  686  348  135-9999-9999
____ [i4] The year of observation
    _____ [i5] January observation, here the missing value (-9999) is substituted
         _____ [i5] February observation, as above
              (etc)
                                       _____ [i5] August observation, 68.6mm
                                            _____ [i5] September observation, 34.8mm
                                                 (etc)

As before, the format is '(i4,12i5)'.

Reading the files:
Start of loop
  -> read header (finish at EOF)
  -> read start + end years from header, calculate 'n. years' (end - start + 1)
  -> read normals line
  -> read 'n. years' lines of observations
End of loop
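
The loop above might be written in Python along the following lines. This
is a sketch under stated assumptions, not the CRU reading code: the
function names and dictionary keys are illustrative, the year count is
assumed to be inclusive (end - start + 1), and the sample header has its
end year changed to 1783 so that the three example data lines form a
complete record.

```python
import io

def split_line(line):
    """Split an '(i4,12i5)' line into its leading integer and 12 values."""
    return int(line[0:4]), [int(line[4 + 5 * m:9 + 5 * m]) for m in range(12)]

def read_records(stream):
    """Yield one dict per station record from an open text stream."""
    while True:
        header = stream.readline()
        if not header.strip():              # EOF (or trailing blank line): finish
            return
        start_year, end_year = int(header[61:65]), int(header[66:70])
        n_years = end_year - start_year + 1  # assumed inclusive of both years
        _, normals = split_line(stream.readline())
        data = {}
        for _ in range(n_years):
            year, values = split_line(stream.readline())
            data[year] = values
        yield {"header": header.rstrip("\n"), "normals": normals, "data": data}

# Shortened sample record: end year set to 1783 for illustration only.
HEADER = ("0305900  5750   -420    4 " + "INVERNESS".ljust(20) + " "
          + "UK".ljust(13) + " 1781 1783")
SAMPLE = "\n".join([
    HEADER,
    "6190" + "-9999" * 12,
    "1781-9999-9999-9999-9999-9999-9999-9999  686  348  135-9999-9999",
    "1782  559  439  665  765  930  559  315 1676  800  813  787   84",
    "1783  914  432  686  216  203  787  584  889  762  635  241  279",
]) + "\n"

# Two concatenated copies stand in for a file holding two station records.
records = list(read_records(io.StringIO(SAMPLE * 2)))
print(len(records), sorted(records[0]["data"]))
```

With a real file, the stream would come from open() on the observation
file in place of io.StringIO.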

As usual, please direct any questions or bug reports to:
Ian Harris
Climatic Research Unit
School of Environmental Sciences
University of East Anglia
UK NR2 4HG
i.harris@uea.ac.uk