Missing data in spatiotemporal datasets: the UK rainfall chemistry network

Cape, J.N.; Smith, R.I.; Leaver, D. 2015 Missing data in spatiotemporal datasets: the UK rainfall chemistry network. Geoscience Data Journal, 2 (1). 25-30.

N509800JA.pdf - Published Version
Available under License Creative Commons Attribution 4.0.


Rainfall chemistry networks inevitably report some missing data, caused by contamination or loss of samples. However, there are no universally accepted rules for identifying and reporting such data, particularly data from samples contaminated in the field. This leads to uncertainty when third parties use the data, and possibly to incorrect inferences from the reported values. This paper describes how data from the UK rainfall chemistry network have been analysed for contamination, and how missing values can be estimated from cross-correlations in time and space, using data from 20 sites over 26 years. The final flagged dataset is available through the CEH Environmental Information Data Centre (EIDC). Erroneous data values are identified through consideration of ion balance (internal consistency) and of evidence, based on the reported chemical analysis, of contamination by birds or windblown dust. With the erroneous data excluded and no replacement of missing data, overall data capture was 86%, but data capture was much lower at some sites in some years, falling below 30% in some cases. Replacing missing values with estimated data increased overall data capture to 96%, with only one site having data capture below 70% in an individual year, and all sites achieving 88% or more over the full period. The implications of using the reported ‘official’ annual data, rather than the dataset with missing values replaced by estimates, are illustrated by the temporal trend in nitrate at one site: the trend in the ‘official’ reported annual data is twice that in the ‘estimated’ data, part of a consistent pattern across all sites. Use of the uncorrected ‘raw’ sample data leads to large errors.
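The ion-balance check and the data-capture percentages mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the procedure used in the paper. The ion list, the relative-imbalance formula, the 0.2 tolerance, the function names and the count-based definition of data capture are all assumptions made for illustration, and concentrations are assumed to be in microequivalents per litre.

```python
from typing import Mapping, Optional, Sequence

# Assumed set of major ions; concentrations in microequivalents per litre.
CATIONS = ("H", "NH4", "Na", "K", "Ca", "Mg")
ANIONS = ("NO3", "SO4", "Cl")


def ion_imbalance(sample: Mapping[str, float]) -> float:
    """Relative imbalance: (cations - anions) / (cations + anions)."""
    cations = sum(sample[ion] for ion in CATIONS)
    anions = sum(sample[ion] for ion in ANIONS)
    total = cations + anions
    if total == 0:
        raise ValueError("sample has no measured ions")
    return (cations - anions) / total


def is_suspect(sample: Mapping[str, float], tolerance: float = 0.2) -> bool:
    """Flag a sample whose imbalance exceeds the (hypothetical) tolerance."""
    return abs(ion_imbalance(sample)) > tolerance


def data_capture(
    samples: Sequence[Optional[Mapping[str, float]]],
    tolerance: float = 0.2,
) -> float:
    """Percentage of expected samples that are present and not flagged.

    A missing sample is represented by None. This is a simple count-based
    definition assumed for illustration.
    """
    if not samples:
        raise ValueError("no expected samples")
    valid = sum(
        1 for s in samples if s is not None and not is_suspect(s, tolerance)
    )
    return 100.0 * valid / len(samples)
```

Under these assumptions, a collection period with four expected samples, of which one is missing and one fails the ion-balance check, would give a data capture of 50%. A real network would also need the separate tests for contamination by birds or windblown dust that the abstract mentions, which this sketch does not attempt.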

Item Type: Publication - Article
Digital Object Identifier (DOI):
UKCEH and CEH Sections/Science Areas: UKCEH Fellows
ISSN: 2049-6060
Additional Information. Not used in RCUK Gateway to Research.: Open Access paper - full text available via Official URL link
Additional Keywords: missing values, temporal trends, contaminated data
NORA Subject Terms: Atmospheric Sciences; Data and Information
Date made live: 01 Apr 2015 10:52 +0 (UTC)
