Predictive geochemical mapping using machine learning in western Kenya

Humphrey, Olivier S.; Cave, Mark; Hamilton, Elliott M.; Osano, Odipo; Menya, Diana; Watts, Michael J. 2023. Predictive geochemical mapping using machine learning in western Kenya. Geoderma Regional, 35, e00731.

Text (Open Access Paper)
1-s2.0-S235200942300127X-main.pdf - Published Version
Available under License Creative Commons Attribution 4.0.

Digital soil mapping techniques represent a cost-effective method for obtaining detailed information regarding the spatial distribution of chemical elements in soils. Machine learning (ML) algorithms using random forest (RF) models have been developed for classification, pattern recognition and regression tasks; they are capable of modelling non-linear relationships using a range of datasets, identifying hierarchical relationships, and determining the importance of predictor variables. In this study, we describe a framework for spatial prediction based on RF modelling where inverse distance weighted (IDW) predictors are used in conjunction with ancillary environmental covariates. The model was applied to predict the total concentration (mg kg⁻¹) and assess the prediction uncertainty of 56 elements, soil pH and organic matter content using 466 soil samples in western Kenya; the results for iodine (I), selenium (Se), zinc (Zn) and soil pH are highlighted in this work. These elements were selected due to their contrasting biogeochemical cycles and widespread dietary deficiencies in sub-Saharan Africa, whilst soil pH is an important parameter controlling soil chemical reactions. Algorithm performance was evaluated by determining the relative importance of each predictor variable and the model's response using partial dependence profiles. The accuracy and precision of each RF model were assessed by evaluating out-of-bag predicted values. The models' R² values range from 0.31 to 0.64, whilst concordance correlation coefficient (CCC) values range from 0.51 to 0.77. The IDW predictor variables had the greatest impact on assessing the distribution of soil properties in the study area; however, the inclusion of ancillary environmental data improved model performance for all soil properties. The results presented in this paper highlight the benefits of ML algorithms, which can incorporate multiple layers of data for spatial prediction, uncertainty assessment and attributing variable importance.
Additional research is now required to ensure health practitioners and the agri-community utilise the geochemical maps presented here for assessing the relationship between environmental geochemistry, endemic diseases and preventable micronutrient deficiency.
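
As an illustration of two ideas named in the abstract, the following minimal Python sketch computes an inverse distance weighted (IDW) predictor for each sample location and scores a set of predictions with R² and the concordance correlation coefficient (CCC). It is not the authors' implementation: the function names, the distance power of 2, and the choice to leave each sample out of its own IDW estimate are assumptions made here for illustration, and the random forest step itself is omitted.

```python
import math


def idw_predict(x, y, coords, values, power=2.0, exclude=None):
    """IDW estimate at (x, y) from known points.

    `power` and `exclude` are illustrative assumptions, not taken from the paper.
    """
    num = 0.0
    den = 0.0
    for i, ((xi, yi), vi) in enumerate(zip(coords, values)):
        if i == exclude:
            continue  # skip the held-out sample
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return vi  # query coincides with a sample location
        w = 1.0 / d ** power
        num += w * vi
        den += w
    if den == 0.0:
        raise ValueError("no neighbouring samples available")
    return num / den


def leave_one_out_idw(coords, values, power=2.0):
    """IDW predictor per sample, computed without that sample's own value."""
    return [
        idw_predict(x, y, coords, values, power, exclude=i)
        for i, (x, y) in enumerate(coords)
    ]


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def ccc(obs, pred):
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(obs)
    mo = sum(obs) / n
    mp = sum(pred) / n
    var_o = sum((o - mo) ** 2 for o in obs) / n
    var_p = sum((p - mp) ** 2 for p in pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    return 2.0 * cov / (var_o + var_p + (mo - mp) ** 2)
```

In a workflow of the kind the abstract describes, the values returned by `leave_one_out_idw` would be supplied to a random forest as one predictor column alongside the environmental covariates, and `r_squared` and `ccc` would be applied to the observed values and the forest's out-of-bag predictions. Unlike R², CCC also penalises systematic offsets between predictions and observations.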

Item Type: Publication - Article
Digital Object Identifier (DOI):
ISSN: 23520094
Additional Keywords: IGRD
Date made live: 04 Dec 2023 14:33 +0 (UTC)
