JAIT 2026 Vol.17(1): 122-132
doi: 10.12720/jait.17.1.122-132

Mapping Continental Water Bodies in the Peruvian Andes Using Machine Learning and Sentinel-2 Imagery

José Sulla-Torres 1,*, Luis Barrios-Lipa 1, Bryan Toribio-Obando 1, Enrique Zúñiga-Portilla 2, Manuel Zúñiga-Carnero 1, Karina Rosas-Paredes 1, and Gwendolyn Peyre 2
1. Professional School of Systems Engineering, Catholic University of Santa Maria, Peru
2. Department of Civil and Environmental Engineering, University of the Andes, Colombia
Email: jsullato@ucsm.edu.pe (J.S.T.); luis.barriosl@ucsm.edu.pe (L.B.L.); bryan.toribio@ucsm.edu.pe (B.T.O.); e.zuniga@uniandes.edu.co (E.Z.P.); mzunigac@ucsm.edu.pe (M.Z.C.); kparedes@ucsm.edu.pe (K.R.P.); gf.peyre@uniandes.edu.co (G.P.)
*Corresponding author

Manuscript received July 3, 2025; revised September 14, 2025; accepted October 27, 2025; published January 15, 2026.

Abstract—Identifying and characterizing inland water bodies is essential for water resource management and ecological monitoring, especially in dry and drought-prone areas. In this study, we applied land cover classification to remote sensing imagery of the southern Andean region of Arequipa, Peru. We used machine learning algorithms to identify inland water bodies within a heterogeneous landscape, which facilitates their monitoring, conservation, and sustainable management. The study followed the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology, which comprises six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Sentinel-2 multispectral satellite imagery was processed. Three supervised machine learning algorithms were trained and tested to classify landscape features into distinct land cover categories and map inland water bodies: Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN). Model performance was evaluated using cross-validation and randomly sampled validation points. Among the tested models, KNN performed best at detecting lakes and lagoons, with an F1-Score of 0.741, an Overall Accuracy (OA) of 0.409, and a Kappa coefficient (κ) of 0.248, outperforming RF and SVM for this specific task. The results demonstrate the feasibility of using machine learning techniques to classify and map inland water bodies, supporting informed decision-making for the conservation and sustainable management of aquatic ecosystems in the Arequipa region.
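The abstract reports three metrics that measure different things: Overall Accuracy and Cohen's kappa summarize agreement across all land cover classes, while an F1-Score can be computed for a single class such as water. The minimal pure-Python sketch below derives all three from a confusion matrix. It is illustrative only. It assumes the reported F1-Score refers to the water class, which is one plausible reading of the abstract. The function names, the four class labels, and the ten labelled points are hypothetical toy data, not the study's validation set or code.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows = reference (true) class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm

def overall_accuracy(cm):
    """Fraction of validation points on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def cohen_kappa(cm):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e); undefined if p_e == 1."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (p_o - p_e) / (1 - p_e)

def f1_for_class(cm, k):
    """Harmonic mean of precision and recall for class index k."""
    n = len(cm)
    tp = cm[k][k]
    predicted = sum(cm[r][k] for r in range(n))  # column sum
    actual = sum(cm[k])                          # row sum
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy validation points (not from the study).
labels = ["water", "vegetation", "bare_soil", "urban"]
y_true = ["water", "water", "water", "water", "vegetation", "vegetation",
          "bare_soil", "bare_soil", "urban", "urban"]
y_pred = ["water", "water", "water", "bare_soil", "bare_soil", "vegetation",
          "vegetation", "urban", "bare_soil", "water"]

cm = confusion_matrix(y_true, y_pred, labels)
print(f"OA={overall_accuracy(cm):.3f} kappa={cohen_kappa(cm):.3f} "
      f"F1(water)={f1_for_class(cm, 0):.3f}")
# prints: OA=0.400 kappa=0.167 F1(water)=0.750
```

In this toy case, water is classified well (F1 = 0.75) while the other classes are frequently confused with each other. That pushes Overall Accuracy and kappa down. A similar pattern would let a water-focused F1-Score sit well above the overall metrics, as the abstract's figures do.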
 
Keywords—machine learning, water bodies mapping, remote sensing, Sentinel-2, geospatial analysis, continental water bodies, K-nearest neighbor, random forest

Cite: José Sulla-Torres, Luis Barrios-Lipa, Bryan Toribio-Obando, Enrique Zúñiga-Portilla, Manuel Zúñiga-Carnero, Karina Rosas-Paredes, and Gwendolyn Peyre, "Mapping Continental Water Bodies in the Peruvian Andes Using Machine Learning and Sentinel-2 Imagery," Journal of Advances in Information Technology, Vol. 17, No. 1, pp. 122-132, 2026. doi: 10.12720/jait.17.1.122-132

Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
