Analysis of Diabetes Classification Performance Improvement Using Ensemble Bagging and K-Fold
Abstract
Diabetes mellitus represents a long-term metabolic disorder whose global incidence continues to rise, making precise early identification essential to minimize severe complications. Machine learning techniques have been extensively utilized for diabetes classification; however, single-model approaches often suffer from performance constraints, such as susceptibility to overfitting and high variability in prediction outcomes. To address these challenges, this research introduces a bagging-based ensemble learning strategy integrated with K-Fold Cross-Validation to enhance both predictive accuracy and model robustness. The study employs the Pima Indians Diabetes Dataset, which contains 768 patient records described by eight clinical features and one outcome variable. Eight classification methods—Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Decision Tree, Random Forest, Naïve Bayes, Gradient Boosting, and XGBoost—were assessed individually and within the proposed ensemble framework. Model effectiveness was measured using accuracy, precision, recall, and F1-score derived from the confusion matrix. The findings indicate that the ensemble bagging approach generally strengthens model stability and yields improvements in accuracy and precision across most algorithms. Notably, K-Nearest Neighbors and XGBoost demonstrated the most stable gains following ensemble integration. Nevertheless, gains in precision were frequently accompanied by a reduction in recall, reflecting a precision–recall trade-off in identifying positive cases. In summary, the integration of bagging and K-Fold Cross-Validation provides a more resilient and dependable classification model, offering strong potential for supporting clinical decision-making in early diabetes detection.





