Water-quality data imputation with a high percentage of missing values : A machine learning approach

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12008/28284

Title:	Water-quality data imputation with a high percentage of missing values : A machine learning approach
Authors:	Rodríguez Núñez, Rafael; Pastorini, Marcos; Etcheverry, Lorena; Chreties, Christian; Fossati, Mónica; Castro, Alberto; Gorgoglione, Angela
Type:	Article
Keywords:	Data scarcity, Water quality, Missing data, Univariate imputation, Multivariate imputation, Machine learning, Hydroinformatics
Publication date:	2021
Abstract:	The monitoring of surface-water quality followed by water-quality modeling and analysis are essential for generating effective strategies in surface-water-resource management. However, worldwide, particularly in developing countries, water-quality studies are limited due to the lack of a complete and reliable dataset of surface-water-quality variables. In this context, several statistical and machine-learning models were assessed for imputing water-quality data at six monitoring stations located in the Santa Lucía Chico river (Uruguay), a mixed lotic and lentic river system. The challenge of this study lies in the high percentage of missing data (between 50% and 70%) and the high temporal and spatial variability that characterizes the water-quality variables. The competing algorithms implement univariate and multivariate imputation methods (inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR) and K-nearest neighbors Regressor (KNNR)). According to the results, more than 76% of the imputation outcomes are considered "satisfactory" (NSE > 0.45). The imputation performance shows better results at the monitoring stations located inside the reservoir than those positioned along the mainstream. IDW was the model with the best imputation results, followed by RFR, HR and SVR. The approach proposed in this study is expected to aid water-resource researchers and managers in augmenting water-quality datasets and overcoming the missing data issue to increase the number of future studies related to water quality.
Description:	Publication produced from a project funded by ANII
Publisher:	MDPI
In:	Sustainability, vol. 13, no. 11, pp. 1-17, Jun. 2021
Citation:	Rodríguez Núñez, R., Pastorini, M., Etcheverry, L. et al. "Water-quality data imputation with a high percentage of missing values : A machine learning approach". Sustainability [online]. 2021, vol. 13, no. 11, pp. 1-17. DOI: 10.3390/su13116318
ISSN:	2071-1050
Geographic coverage:	Santa Lucía Chico river, Department of Florida, Uruguay.
License:	Creative Commons Attribution License (CC BY 4.0)
Appears in collections:	Publicaciones académicas y científicas - Instituto de Ingeniería Eléctrica
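
Illustrative note: the abstract names inverse distance weighting (IDW) as the best-performing imputation method and uses the Nash-Sutcliffe efficiency (NSE) with a 0.45 threshold to label results "satisfactory". The sketch below shows both ideas in Python with NumPy. It is not the authors' implementation. This record does not say how distances were defined in the paper, so the code assumes a hypothetical one-dimensional station coordinate (for example, distance along the river), distinct station positions, and a power of 2. The function names `nse` and `idw_impute` are made up for this example.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean.

    Pairs where either value is NaN are ignored.
    """
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    valid = ~np.isnan(obs) & ~np.isnan(sim)
    obs, sim = obs[valid], sim[valid]
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def idw_impute(values, positions, power=2.0):
    """Fill NaN entries with an inverse-distance-weighted mean of the known entries.

    values    : one variable measured at several stations on one date (NaN = missing)
    positions : assumed 1-D station coordinates, all distinct (hypothetical setup)
    power     : distance exponent; 2.0 is a common default, not a value from the paper
    """
    values = np.asarray(values, dtype=float)
    positions = np.asarray(positions, dtype=float)
    filled = values.copy()
    known = ~np.isnan(values)
    if not known.any():
        return filled  # nothing to interpolate from
    for i in np.flatnonzero(~known):
        dist = np.abs(positions[known] - positions[i])
        weights = 1.0 / dist ** power
        filled[i] = np.sum(weights * values[known]) / np.sum(weights)
    return filled


# Made-up readings at three stations; the middle one is missing.
readings = [1.0, np.nan, 3.0]
station_km = [0.0, 1.0, 3.0]
imputed = idw_impute(readings, station_km)

# An imputation would count as "satisfactory" under the abstract's criterion
# when nse(held_out_observations, imputed_values) > 0.45.
```

In an evaluation of this kind, known values would be hidden on purpose, imputed, and then compared with the held-out observations using `nse`.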

Files in this item:

File	Description	Size	Format
RPECFCG21.pdf	Published version	3.88 MB	Adobe PDF

This item is subject to a Creative Commons license.