Handling missing values in trait data
Johnson, Thomas F.; Isaac, Nick J.B. ORCID: https://orcid.org/0000-0002-4869-8052; Paviolo, Agustin; González‐Suárez, Manuela. 2021 Handling missing values in trait data. Global Ecology and Biogeography, 30 (1). 51-62. https://doi.org/10.1111/geb.13185
Full text: N529220JA.pdf - Published Version. Available under License Creative Commons Attribution 4.0.
Abstract/Summary
Aim: Trait data are widely used in ecological and evolutionary phylogenetic comparative studies, but often values are not available for all species of interest. Traditionally, researchers have excluded species without data from analyses, but estimation of missing values using imputation has been proposed as a better approach. However, imputation methods have largely been designed for randomly missing data, whereas trait data are often not missing at random (e.g., more data for bigger species). Here, we evaluate the performance of approaches for handling missing values when considering biased datasets.

Location: Any. Time period: Any. Major taxa studied: Any.

Methods: We simulated continuous traits and separate response variables to test the performance of nine imputation methods and complete‐case analysis (excluding missing values from the dataset) under biased missing data scenarios. We characterized performance by estimating the error in imputed trait values (deviation from the true value) and inferred trait–response relationships (deviation from the true relationship between a trait and response).

Results: Generally, Rphylopars imputation produced the most accurate estimate of missing values and best preserved the response–trait slope. However, estimates of missing data were still inaccurate, even with only 5% of values missing. Under severe biases, errors were high with every approach. Imputation was not always the best option, with complete‐case analysis frequently outperforming Mice imputation and, to a lesser degree, BHPMF imputation. Mice, a popular approach, performed poorly when the response variable was excluded from the imputation model.

Main conclusions: Imputation can handle missing data effectively in some conditions but is not always the best solution. None of the methods we tested could deal effectively with severe biases, which can be common in trait datasets. We recommend rigorous data checking for biases before and after imputation and propose variables that can assist researchers working with incomplete datasets to detect data biases and minimize errors.
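
The evaluation design described in the abstract (simulate a trait and a response, remove trait values with a size bias, then measure error in the filled-in values and in the trait–response slope) can be illustrated with a minimal sketch. This is not the authors' code. The function names, sample size, true slope, missing fraction and bias strength are all assumptions chosen for illustration, and mean imputation is used only as a simple stand-in, not as one of the nine methods evaluated in the paper.

```python
import math
import random


def simulate(n=2000, slope=0.5, seed=1):
    """Simulate a continuous trait and a response that depends on it linearly."""
    rng = random.Random(seed)
    trait = [rng.gauss(0.0, 1.0) for _ in range(n)]
    response = [slope * x + rng.gauss(0.0, 1.0) for x in trait]
    return trait, response


def biased_mask(trait, frac_missing=0.3, bias=2.0, seed=2):
    """Flag values as missing (True), with smaller trait values more likely lost.

    The probability is 2 * frac_missing * sigmoid(-bias * x), so roughly
    frac_missing of the values are removed when the trait is centred on zero.
    """
    rng = random.Random(seed)
    return [rng.random() < 2.0 * frac_missing / (1.0 + math.exp(bias * x))
            for x in trait]


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def evaluate(trait, response, missing):
    """Compare complete-case analysis with mean imputation (illustrative stand-in)."""
    obs_x = [x for x, m in zip(trait, missing) if not m]
    obs_y = [y for y, m in zip(response, missing) if not m]
    obs_mean = sum(obs_x) / len(obs_x)
    n_missing = sum(missing)

    # Complete-case analysis: drop every record with a missing trait value.
    complete_case_slope = ols_slope(obs_x, obs_y)

    # Mean imputation: fill each gap with the mean of the observed values.
    imputed = [obs_mean if m else x for x, m in zip(trait, missing)]
    imputed_slope = ols_slope(imputed, response)

    # Error in imputed values: deviation from the (known) simulated truth.
    rmse = math.sqrt(
        sum((obs_mean - x) ** 2 for x, m in zip(trait, missing) if m) / n_missing
    )
    return {
        "n_missing": n_missing,
        "observed_mean": obs_mean,
        "true_mean": sum(trait) / len(trait),
        "imputation_rmse": rmse,
        "complete_case_slope": complete_case_slope,
        "imputed_slope": imputed_slope,
        "full_data_slope": ols_slope(trait, response),
    }


if __name__ == "__main__":
    trait, response = simulate()
    missing = biased_mask(trait)
    for key, value in evaluate(trait, response, missing).items():
        print(f"{key}: {value}")
```

Comparing `observed_mean` with `true_mean` is one simple check for the kind of bias the abstract warns about, and the gap between each fitted slope and `full_data_slope` corresponds to the second performance measure (deviation from the true trait–response relationship).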
Item Type: | Publication - Article |
---|---|
Digital Object Identifier (DOI): | https://doi.org/10.1111/geb.13185 |
UKCEH and CEH Sections/Science Areas: | Biodiversity (Science Area 2017-) |
ISSN: | 1466-822X |
Additional Information. Not used in RCUK Gateway to Research.: | Open Access paper - full text available via Official URL link. |
Additional Keywords: | BHPMF, functional trait, imputation, life-history trait, MAR, MCAR, missing data, MNAR, multiple imputation chained equations, Rphylopars |
NORA Subject Terms: | Ecology and Environment |
Date made live: | 18 Dec 2020 11:56 +0 (UTC) |
URI: | https://nora.nerc.ac.uk/id/eprint/529220 |