HMEM: A Hybrid Meta-Ensemble Model for Early Prediction of Student Dropout
Abstract
Student dropout is a significant concern in higher education, posing challenges for institutional performance and long-term learner success. While various machine learning models have
been applied to address this issue, many approaches face limitations in handling feature heterogeneity, class imbalance, and model fusion. In this study, we present the Hybrid Meta-Ensemble Model
(HMEM), a modular predictive pipeline that combines three gradient boosting learners (CatBoost,
LightGBM, XGBoost), probabilistic output enrichment using statistical descriptors (mean, standard
deviation, entropy), SMOTE-based meta-level resampling, and a final classification layer using TabTransformer.
We evaluate HMEM on the UCI Student Performance dataset under two scenarios—with and
without SMOTE—to examine the impact of meta-level balancing. Experimental results show that the
full HMEM pipeline with SMOTE achieves consistent improvements across key metrics: Accuracy
(0.9384), Precision (0.9412), Recall (0.9412), F1-score (0.9412), and AUC (0.9539). Compared to
both the base learners and the meta-ensemble without SMOTE, the proposed approach demonstrates
moderate but systematic gains, particularly in detecting minority-class instances. Ablation studies further indicate that probabilistic enrichment, feature partitioning, and SMOTE each contribute
meaningfully to performance, and additional baselines (Logistic Regression and simple stacking)
confirm that HMEM offers a more favourable balance between discrimination ability and minority-class coverage. Visual analysis of ROC curves and enriched feature distributions corroborates these
findings.
Keywords: Student Dropout Prediction, Ensemble Learning, TabTransformer, SMOTE,
Educational Data Mining.
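The pipeline described above can be sketched as follows. This is an illustrative outline, not the authors' implementation: scikit-learn's GradientBoostingClassifier stands in for CatBoost/LightGBM/XGBoost, plain minority oversampling stands in for SMOTE, and LogisticRegression stands in for the TabTransformer head, so that the example runs with only NumPy and scikit-learn. The distinctive step, enriching base-learner probabilities with mean, standard deviation, and entropy descriptors before the meta-level rebalancing, is implemented directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data as a stand-in for the UCI Student Performance dataset.
X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Three gradient-boosting base learners (stand-ins for CatBoost, LightGBM, XGBoost).
bases = [
    GradientBoostingClassifier(n_estimators=n, random_state=0).fit(X_tr, y_tr)
    for n in (50, 100, 150)
]

def enrich(X):
    """Stack base-learner probabilities and append mean/std/entropy descriptors."""
    P = np.column_stack([m.predict_proba(X)[:, 1] for m in bases])
    eps = 1e-12  # avoid log(0)
    ent = -(P * np.log(P + eps) + (1 - P) * np.log(1 - P + eps)).sum(axis=1)
    return np.column_stack([P, P.mean(axis=1), P.std(axis=1), ent])

M_tr, M_te = enrich(X_tr), enrich(X_te)

# Meta-level rebalancing (simple random oversampling as a stand-in for SMOTE).
idx_min = np.where(y_tr == 1)[0]
n_extra = len(y_tr) - 2 * len(idx_min)
rep = np.random.default_rng(0).choice(idx_min, size=n_extra)
M_bal = np.vstack([M_tr, M_tr[rep]])
y_bal = np.concatenate([y_tr, y_tr[rep]])

# Final meta-classifier (LogisticRegression stands in for TabTransformer).
meta = LogisticRegression(max_iter=1000).fit(M_bal, y_bal)
auc = roc_auc_score(y_te, meta.predict_proba(M_te)[:, 1])
print(f"meta-level AUC: {auc:.3f}")
```

The enriched meta-feature matrix here has six columns per sample (three base probabilities plus the three statistical descriptors); in the full HMEM design the rebalanced meta-features would instead feed a TabTransformer classification layer.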