JAIT 2022 Vol.13(5): 470-476
doi: 10.12720/jait.13.5.470-476

ExtraImpute: A Novel Machine Learning Method for Missing Data Imputation

Mustafa Alabadla, Fatimah Sidi, Iskandar Ishak, Hamidah Ibrahim, Lilly Suriani Affendey, and Hazlina Hamdan
Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Selangor D.E., Malaysia

Abstract—Missing values are common in healthcare datasets. Their presence usually leads to undesirable results when data are analyzed with machine learning methods. Recently, researchers have proposed several imputation approaches to deal with missing values in real-world datasets. Moreover, data imputation helps us build high-performance machine learning models that discover patterns in healthcare data and provide insights for higher-quality decision-making. In this paper, we propose a new imputation approach, named ExtraImpute, that uses Extremely Randomized Trees (Extra Trees), an ensemble machine learning method, to handle missing numerical values in the healthcare context. The proposed method can impute both continuous and discrete data features. It imputes each missing value in a feature by predicting it from the other observed values in the dataset. To evaluate the efficiency of our algorithm, several experiments were conducted on five benchmark healthcare datasets, and the results were compared with those of other commonly used imputation methods, viz. missForest, KNNImpute, Multivariate Imputation by Chained Equations (MICE), and SoftImpute. The results were validated using Root Mean Square Error (RMSE) and Coefficient of Determination (R²) scores. From these results, it was observed that our proposed algorithm outperforms the existing imputation techniques.
Index Terms—imputation, missing values, extra trees, healthcare
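
As a rough illustration of the idea summarized in the abstract (predicting each missing value from the other observed values with Extra Trees, then scoring the result with RMSE and R²), the following minimal sketch may be helpful. It is not the authors' ExtraImpute implementation: it assumes scikit-learn's IterativeImputer with an ExtraTreesRegressor as a stand-in, and the synthetic data, the 10% missing rate, and all parameter values are hypothetical choices made only for this example.

```python
import numpy as np
# This experimental import must come before importing IterativeImputer.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Synthetic, correlated numeric features standing in for a healthcare table
# (assumption: the paper's benchmark datasets are not reproduced here).
n = 200
x0 = rng.normal(size=n)
x1 = 2.0 * x0 + rng.normal(scale=0.1, size=n)
x2 = x0 - x1 + rng.normal(scale=0.1, size=n)
X_true = np.column_stack([x0, x1, x2])

# Hide about 10% of the entries completely at random.
mask = rng.random(X_true.shape) < 0.10
X_missing = X_true.copy()
X_missing[mask] = np.nan

# Each feature with missing entries is regressed on the other features
# using an Extra Trees ensemble; observed entries are left unchanged.
# A ConvergenceWarning may be emitted with so few iterations.
imputer = IterativeImputer(
    estimator=ExtraTreesRegressor(n_estimators=50, random_state=0),
    max_iter=5,
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)

# Score only the entries that were hidden, against their known true values.
rmse = float(np.sqrt(mean_squared_error(X_true[mask], X_imputed[mask])))
r2 = float(r2_score(X_true[mask], X_imputed[mask]))
print(f"RMSE={rmse:.3f}  R2={r2:.3f}")
```

Scoring only the masked entries mirrors the usual evaluation setup for imputation methods, where values are removed artificially so that the imputed values can be compared with the known originals.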

Cite: Mustafa Alabadla, Fatimah Sidi, Iskandar Ishak, Hamidah Ibrahim, Lilly Suriani Affendey, and Hazlina Hamdan, "ExtraImpute: A Novel Machine Learning Method for Missing Data Imputation," Journal of Advances in Information Technology, Vol. 13, No. 5, pp. 470-476, October 2022.

Copyright © 2022 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.