A systematic review of machine learning models for groundwater level prediction

Jesse, Gilbert; Boateng, Cyril D.; Aryee, Jeffrey N.A.; Osei, Marian A.; Wemegah, David D.; Gidigasu, Solomon S.R.; Britwum, Akyana; Afful, Samuel K.; Touré, Haoulata; Mensah, Vera; Owusu-Afriyie, Prinsca

Jesse, Gilbert; Boateng, Cyril D. ORCID: https://orcid.org/0000-0002-1721-4158; Aryee, Jeffrey N.A.; Osei, Marian A. ORCID: https://orcid.org/0000-0003-3481-7222; Wemegah, David D.; Gidigasu, Solomon S.R.; Britwum, Akyana; Afful, Samuel K.; Touré, Haoulata; Mensah, Vera; Owusu-Afriyie, Prinsca. 2025. A systematic review of machine learning models for groundwater level prediction. Applied Computing and Geosciences, 28, 100303. 29 pp. doi: 10.1016/j.acags.2025.100303


Abstract

This study presents a comprehensive synthesis of machine learning (ML) techniques applied to groundwater level (GWL) prediction, focusing on model architectures, feature selection methods, hyperparameter tuning, optimization algorithms, and clustering techniques. A total of 223 peer-reviewed articles were systematically reviewed, with the PRISMA framework guiding study identification, inclusion, and exclusion. Widely used models include artificial neural networks (ANN), support vector machines (SVM), long short-term memory networks (LSTM), and random forests (RF). More recent studies increasingly employ hybrid approaches that integrate wavelet transforms, signal decomposition, and optimization techniques such as particle swarm optimization (PSO), genetic algorithms (GA), and ant colony optimization (ACO). Transformer-based models have also begun to emerge as promising tools in this domain. A central focus of this review is feature selection, which remains one of the least developed areas of GWL modeling. Most studies rely on simple filter methods such as autocorrelation and mutual information. While SHapley Additive exPlanations (SHAP) has gained some traction, more advanced techniques such as recursive feature elimination (RFE), forward feature selection (FFS), factor analysis (FA), and self-organizing maps (SOM) are rarely used. Notably, none of the reviewed studies systematically compared multiple feature selection strategies, which limits insight into how that choice affects model performance. Scientometric analysis shows that Iran, China, India, and the United States contribute the most impactful research. Despite strong predictive performance, hyperparameter tuning is still done mostly by trial and error. The review emphasizes the need for more systematic, interpretable, and generalizable ML approaches to support robust GWL forecasting.
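To make the "simple filter method" idea concrete, here is a minimal sketch of autocorrelation-based input selection, assuming a univariate, evenly sampled GWL series. It scores candidate lags by sample autocorrelation, keeps those whose magnitude exceeds the approximate 95% white-noise bound 1.96/√n, and builds a lagged input matrix for a downstream model. The function names (`acf`, `select_lags`, `make_lagged_features`), the threshold rule, and the synthetic series are illustrative assumptions, not the procedure of any particular reviewed study.

```python
import math
from typing import List, Sequence, Tuple


def acf(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation r_k for lags k = 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant series has undefined autocorrelation")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def select_lags(series: Sequence[float], max_lag: int) -> List[int]:
    """Filter step (assumed rule): keep lags with |r_k| > 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag), start=1) if abs(r) > bound]


def make_lagged_features(
    series: Sequence[float], lags: Sequence[int]
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y): predict series[t] from series[t - k] for each selected lag k."""
    if not lags:
        raise ValueError("no lags selected")
    start = max(lags)
    X = [[series[t - k] for k in lags] for t in range(start, len(series))]
    y = [series[t] for t in range(start, len(series))]
    return X, y


if __name__ == "__main__":
    # Hypothetical monthly series with a 12-month cycle (illustrative only, not data from the review).
    gwl = [10 + 2 * math.sin(2 * math.pi * t / 12) for t in range(120)]
    lags = select_lags(gwl, max_lag=6)
    X, y = make_lagged_features(gwl, lags)
    print(lags, len(X))
```

The same filter structure accepts other relevance scores: swapping autocorrelation for mutual information would also capture non-linear dependence between a candidate input and GWL, at the cost of needing a density or nearest-neighbour estimator. Wrapper methods such as RFE or FFS differ in that they re-fit the model for each candidate subset rather than scoring inputs independently.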
