HMEM: A Hybrid Meta-Ensemble Model for Early Prediction of Student Dropout
Abstract
Student dropout is a significant concern in higher education, posing challenges for institutional performance and long-term learner success. While various machine learning models have
been applied to address this issue, many approaches face limitations in handling feature heterogeneity, class imbalance, and model fusion. In this study, we present the Hybrid Meta-Ensemble Model
(HMEM), a modular predictive pipeline that combines three gradient boosting learners (CatBoost,
LightGBM, XGBoost), probabilistic output enrichment using statistical descriptors (mean, standard
deviation, entropy), SMOTE-based meta-level resampling, and a final classification layer using TabTransformer.
We evaluate HMEM on the UCI Student Performance dataset under two scenarios—with and
without SMOTE—to examine the impact of meta-level balancing. Experimental results show that the
full HMEM pipeline with SMOTE achieves consistent improvements across key metrics: Accuracy
(0.9384), Precision (0.9412), Recall (0.9412), F1-score (0.9412), and AUC (0.9539). Compared to
both the base learners and the meta-ensemble without SMOTE, the proposed approach demonstrates
moderate but systematic gains, particularly in detecting minority-class instances. Ablation studies further indicate that probabilistic enrichment, feature partitioning, and SMOTE each contribute
meaningfully to performance, and additional baselines (Logistic Regression and simple stacking)
confirm that HMEM offers a more favourable balance between discrimination ability and minority-class coverage. Visual analysis of ROC curves and enriched feature distributions corroborates these
findings.
Keywords: Student Dropout Prediction, Ensemble Learning, TabTransformer, SMOTE,
Educational Data Mining.
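The pipeline described above can be sketched as follows. This is an illustrative outline, not the authors' implementation: scikit-learn's GradientBoostingClassifier stands in for CatBoost/LightGBM/XGBoost, plain minority oversampling stands in for SMOTE, and LogisticRegression stands in for the TabTransformer head, so that the example runs with only NumPy and scikit-learn. The distinctive step, enriching base-learner probabilities with mean, standard deviation, and entropy descriptors before the meta-level rebalancing, is implemented directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data as a stand-in for the UCI Student Performance dataset.
X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Three gradient-boosting base learners (stand-ins for CatBoost, LightGBM, XGBoost).
bases = [
    GradientBoostingClassifier(n_estimators=n, random_state=0).fit(X_tr, y_tr)
    for n in (50, 100, 150)
]

def enrich(X):
    """Stack base-learner probabilities and append mean/std/entropy descriptors."""
    P = np.column_stack([m.predict_proba(X)[:, 1] for m in bases])
    eps = 1e-12  # avoid log(0)
    ent = -(P * np.log(P + eps) + (1 - P) * np.log(1 - P + eps)).sum(axis=1)
    return np.column_stack([P, P.mean(axis=1), P.std(axis=1), ent])

M_tr, M_te = enrich(X_tr), enrich(X_te)

# Meta-level rebalancing (simple random oversampling as a stand-in for SMOTE).
idx_min = np.where(y_tr == 1)[0]
n_extra = len(y_tr) - 2 * len(idx_min)
rep = np.random.default_rng(0).choice(idx_min, size=n_extra)
M_bal = np.vstack([M_tr, M_tr[rep]])
y_bal = np.concatenate([y_tr, y_tr[rep]])

# Final meta-classifier (LogisticRegression stands in for TabTransformer).
meta = LogisticRegression(max_iter=1000).fit(M_bal, y_bal)
auc = roc_auc_score(y_te, meta.predict_proba(M_te)[:, 1])
print(f"meta-level AUC: {auc:.3f}")
```

The enriched meta-feature matrix here has six columns per sample (three base probabilities plus the three statistical descriptors); in the full HMEM design the rebalanced meta-features would instead feed a TabTransformer classification layer.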