Water potability classification based on hybrid stacked model and feature selection.

Journal: Environmental Science And Pollution Research International

Published: November 25, 2024

Abstract

Ensuring clean water requires accurate water quality classification. This paper uses a water potability (WP) dataset containing pH, hardness, solids, chloramines, sulfate, conductivity, and other measurements for 3276 water bodies. The public Kaggle dataset was prepared by imputing missing values with the median, normalizing features for scaling, and correcting class imbalance with SMOTE. Feature selection (FS) with binary particle swarm optimization (BPSO) and the binary whale optimization algorithm (BWAO) was used to identify the most important features for classification. BPSO selected a subset of seven essential features with the lowest average error, 0.3745. Random forest (RF), gradient boosting (GB), support vector machine (SVM), extra trees (ET), decision tree (DT), and XGBoost classifiers were evaluated for WP prediction. The ET classifier ranked first among them, with 70.63% accuracy and a 71.17% F1-score. To improve predictive performance, RF, ET, and XGBoost base learners were stacked with a logistic regression meta-learner. The stacking model achieved 69.53% accuracy, a 70.23% F1-score, and a 77.62% AUC. We found that stacking combines high-performing models into a strong and balanced classification framework. This paper shows that ensemble learning can improve WP classification and that stacking may be a feasible approach to measuring and managing water quality.
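The preprocessing steps named in the abstract (median imputation, normalization, and SMOTE-based class balancing) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which normalization was used, so min-max scaling is assumed here; `smote_like` is a simplified, hypothetical stand-in for SMOTE that keeps only its core interpolation idea; and the toy rows and labels are made up for the example.

```python
import random
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with that column's median."""
    cols = list(zip(*rows))
    meds = [median(v for v in col if v is not None) for col in cols]
    return [[meds[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def min_max_scale(rows):
    """Rescale each column to [0, 1] (assumed normalization scheme)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j, v in enumerate(row)] for row in rows]


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority neighbours to create each synthetic sample."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> other
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic


# Made-up toy rows: (pH, hardness, sulfate); None marks a missing value.
X = [
    [7.0, 204.9, None],
    [3.7, 129.4, 356.9],
    [8.1, None, 310.1],
    [None, 214.4, 326.7],
    [9.1, 181.1, 333.8],
    [5.6, 188.3, None],
    [10.2, 248.1, 393.7],
    [7.5, 195.0, 340.0],
]
y = [0, 0, 1, 0, 1, 0, 1, 0]  # 1 is the minority class in this toy set

X_imputed = impute_median(X)
X_scaled = min_max_scale(X_imputed)

minority = [row for row, label in zip(X_scaled, y) if label == 1]
n_new = y.count(0) - y.count(1)  # samples needed to balance the classes
synthetic = smote_like(minority, n_new)

X_balanced = X_scaled + synthetic
y_balanced = y + [1] * n_new
```

In practice a library implementation such as the `SMOTE` class in imbalanced-learn would replace `smote_like`, and resampling should be applied only to the training split so that synthetic samples do not leak into evaluation.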

Authors

Ahmed Elshewey, Rasha Youssef, Hazem El Bakry, Ahmed Osman
