Is more data always better? A simulation study of benefits and limitations of integrated distribution models

Simmonds, Emily G.; Jarvis, Susan G.; Henrys, Peter A.; Isaac, Nick J.B.; O'Hara, Robert B. 2020 Is more data always better? A simulation study of benefits and limitations of integrated distribution models. Ecography, 43 (10). 1413-1422.

Available under License Creative Commons Attribution.


Species distribution models are popular and widely applied ecological tools. Recent increases in data availability have created both opportunities and challenges for species distribution modelling. Each data source has different qualities, determined by how it was collected. Although several data sources can inform on a single species, ecologists have often analysed just one of them, which loses the information in the discarded sources. Integrated distribution models (IDMs) were developed to enable inclusion of multiple datasets in a single model, whilst accounting for different data collection protocols. This is advantageous because it allows efficient use of all available data, can improve estimation and can account for biases in data collection. What is not yet known is when integrating different data sources does not bring advantages. Here, for the first time, we explore the potential limits of IDMs using a simulation study integrating a spatially biased, opportunistic, presence‐only dataset with a structured, presence–absence dataset. We explore four scenarios based on real ecological problems: small sample sizes, low levels of detection probability, correlations between covariates and a lack of knowledge of the drivers of bias in data collection. For each scenario we ask: do we see improvements in parameter estimation or the accuracy of spatial pattern prediction in the IDM versus modelling either data source alone? We found that integration alone was unable to correct for spatial bias in presence‐only data. Including a covariate to explain bias or adding a flexible spatial term improved IDM performance beyond single dataset models, with the models including a flexible spatial term producing the most accurate and robust estimates. Increasing the sample size of presence–absence data and having no correlated covariates also improved estimation. These results demonstrate under which conditions integrated models provide benefits over modelling single data sources.
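
To make the two data types in the abstract concrete, the following is a minimal sketch of how a spatially biased presence-only dataset and a structured presence-absence dataset might be simulated from one underlying distribution. It is not the authors' simulation code. The grid layout, the function name `simulate_datasets`, every parameter value, and the logistic form of the reporting bias are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_datasets(n_side=50, n_pa_sites=100, detect_prob=0.5,
                      beta0=-2.0, beta1=1.0, bias_strength=2.0, seed=0):
    """Simulate one species on a grid and draw two kinds of data from it.

    All names and default values are hypothetical, for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_cells = n_side * n_side

    # One environmental covariate and one bias covariate per grid cell.
    env = rng.normal(size=n_cells)
    bias_cov = rng.normal(size=n_cells)

    # Log-linear intensity: expected number of individuals per cell.
    intensity = np.exp(beta0 + beta1 * env)
    abundance = rng.poisson(intensity)

    # Presence-only data: each individual is reported with a probability
    # that depends on the bias covariate (assumed logistic form), so
    # records are spatially biased.
    report_prob = 1.0 / (1.0 + np.exp(-(-2.0 + bias_strength * bias_cov)))
    po_counts = rng.binomial(abundance, report_prob)
    po_cells = np.repeat(np.arange(n_cells), po_counts)

    # Presence-absence data: cells are surveyed at random, and the species
    # is recorded if at least one individual in the cell is detected.
    pa_cells = rng.choice(n_cells, size=n_pa_sites, replace=False)
    p_detect_site = 1.0 - (1.0 - detect_prob) ** abundance[pa_cells]
    pa_obs = rng.binomial(1, p_detect_site)

    return {
        "env": env,
        "bias_cov": bias_cov,
        "abundance": abundance,
        "po_counts": po_counts,
        "po_cells": po_cells,
        "pa_cells": pa_cells,
        "pa_obs": pa_obs,
    }
```

In this sketch, lowering `n_pa_sites` or `detect_prob` mimics the small sample size and low detection scenarios, while withholding `bias_cov` from a fitted model mimics not knowing the drivers of bias in data collection.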

Item Type: Publication - Article
Digital Object Identifier (DOI):
UKCEH and CEH Sections/Science Areas: Biodiversity (Science Area 2017-); Soils and Land Use (Science Area 2017-)
ISSN: 0906-7590
Additional Information. Not used in RCUK Gateway to Research.: Open Access paper - full text available via Official URL link.
Additional Keywords: citizen science, data integration, integrated distribution models, simulations, species distribution models
NORA Subject Terms: Ecology and Environment
Date made live: 21 Jul 2020 15:50 +0 (UTC)
