Source Hierarchy

From Intl Surface Temp Initiative

Before the merge program is run, a hierarchy needs to be established to give preference to certain sources / elements. Some characteristics that may give a source a higher preference are:

  • Use of TMAX / TMIN instead of TAVG, recognizing that TMAX and TMIN biases in the record tend to be distinct.
  • More stations
  • Longer period of record
  • Data closer to original source (as raw as possible)

This can be defined differently by different organizations (e.g. the creation of GHCN-M might use a different preference of sources than another dataset). Nonetheless, a hierarchy needs to be established for version 1.0.0 of the Databank. Another idea is to create an ensemble of results by randomly selecting the source order 100 times and running the merge process each time.
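The ensemble idea can be sketched as follows. This is a minimal illustration, not the Databank's merge code: the function name `ensemble_orderings`, the fixed seed, and the short source list are assumptions made for the example, and the merge step itself is not shown.

```python
import random


def ensemble_orderings(sources, n_members=100, seed=0):
    """Return n_members random permutations of the source list.

    Hypothetical helper: each permutation is one candidate merge order.
    A fixed seed is assumed so that the ensemble can be reproduced.
    """
    rng = random.Random(seed)
    orderings = []
    for _ in range(n_members):
        order = list(sources)   # copy so the input list is left untouched
        rng.shuffle(order)      # in-place random permutation
        orderings.append(order)
    return orderings


# A few source names from the list on this page, for illustration only.
sources = ["ghcnd-raw", "mexico", "vietnam", "usforts"]
members = ensemble_orderings(sources)

# Each member would then be handed to the merge process (not shown here).
print(len(members))
```

Each ensemble member contains every source exactly once; only the order differs, so differences between merged results would reflect sensitivity to the hierarchy alone.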

Straw Man Proposal of Hierarchy

The following information was written by Peter Thorne and gives an overview of the current hierarchy being considered for the merge. It is not final, and discussion is encouraged.

  • It is proposed that the priority follow the nine over-arching classes given below. The information necessary to assign each source deck to a given classification (1-9) should be readily available from the stage 2 metadata flags:
    • 1. Daily databank stage 3 (GHCN-D raw) (alternative: GHCN-D QC’ed) – this provides a backbone of max/min values that is analyzed regularly and curated carefully on an ongoing basis with regular updates and ‘stable’ resource support.
    • 2. Data sources which contain max / min data, have had no QC / homogenization applied and have known provenance
    • 3. Data sources which contain max / min data, have had no QC / homogenization applied with poorly known provenance
    • 4. Data sources that have no QC / homogenization applied but only available as Tavg and have known provenance
    • 5. Data sources that have no QC / homogenization applied but only available as Tavg and with poorly known provenance
    • 6. Data sources with QC applied and max/min data
    • 7. Data sources with QC applied and Tavg data only
    • 8. Data sources with homogenization that have max/min data
    • 9. Data sources with homogenization that have Tavg data only

I wonder about prioritizing a Tavg source above almost any source that has max/min data, i.e. should items 6) and 8) come immediately after 3)? It would dilute the "purity" of the merged time series, but would contain more information about the extremes. How do others feel? Steven Worley, 14 Feb. 2012.
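As a rough sketch, the nine proposed classes could be assigned from per-deck flags like this. The `SourceDeck` fields and the `priority_class` function are hypothetical names invented for the example; the real stage 2 metadata flags may be named and encoded differently. The deck names are placeholders, not real sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDeck:
    """Hypothetical summary of the metadata flags for one source deck."""
    name: str
    is_ghcnd: bool          # True only for the daily databank stage 3 deck
    has_maxmin: bool        # True if max/min data exist, False if Tavg only
    processing: str         # assumed values: "raw", "qc", "homogenized"
    known_provenance: bool  # only used to split the "raw" classes


def priority_class(deck):
    """Map a deck to the proposed class 1-9 (1 is merged first)."""
    if deck.is_ghcnd:
        return 1
    if deck.processing == "raw":
        if deck.has_maxmin:
            return 2 if deck.known_provenance else 3
        return 4 if deck.known_provenance else 5
    if deck.processing == "qc":
        return 6 if deck.has_maxmin else 7
    if deck.processing == "homogenized":
        return 8 if deck.has_maxmin else 9
    raise ValueError(f"unknown processing level: {deck.processing!r}")


# Placeholder decks for illustration.
deck_a = SourceDeck("deck-a", False, True, "raw", True)
deck_b = SourceDeck("deck-b", False, False, "raw", False)
deck_c = SourceDeck("deck-c", False, True, "homogenized", True)

print([priority_class(d) for d in (deck_a, deck_b, deck_c)])
```

Reordering the classes, as suggested in the comment above, would only require changing the numbers returned by this function.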

  • Within classes 2-9 the following set of criteria would be used to differentiate between the sources in the priority order with which they should be merged:
    • I. Whether the monthly data was calculated from dailies held in the databank (give this priority as it means an investigator can dig back to data within a given month)
    • II. Whether the data arises from World Weather Records / national holdings
    • III. Average length of station record in the data deck
    • IV. Oldest station record start date / average station record start date with priority given to those with earlier start dates
    • V. Number of stations in the data deck

Current list of Sources (20120531)

  • 01 ghcnd-raw
  • 02 mexico
  • 03 vietnam
  • 04 usforts
  • 05 channel-islands
  • 06 ecuador
  • 07 pitcairnisland
  • 08 beyrout
  • 09 brazil
  • 10 argentina
  • 11 greenland
  • 12 wwr
  • 13 colonialera
  • 14 east-africa
  • 15 uganda
  • 16 climat-uk
  • 17 antarctica-southpole
  • 18 ispd-swiss
  • 19 ispd-ipy
  • 20 ispd-sydney
  • 21 antarctica-scar-reader
  • 22 mcdw
  • 23 spain
  • 24 russia
  • 25 uruguay-inia
  • 26 swiss-digihom
  • 27 ispd-tunisia-morocco
  • 28 ecaknmi
  • 29 sacaknmi
  • 30 japan
  • 31 ukmet-hist
  • 32 knmi
  • 33 russsource
  • 34 ghcnsource
  • 35 wmssc
  • 36 ghcnmv2
  • 37 central-asia
  • 38 canada
  • 39 australia
  • 40 arctic
  • 41 histalp
  • 42 crutem4
  • 43 gsod