Contrastive 3D Multimodal Feature Fusion for Abnormal Behavior Recognition in Low-Light Conditions

Hoai-Duy Nguyen; Van-Dong Huynh; Cuong D.H. Tran; Tien-Dung Cao

doi:10.25073/2588-1086/vnucsce.7053


Published Jun 25, 2026

How to Cite

NGUYEN, Hoai-Duy et al. Contrastive 3D Multimodal Feature Fusion for Abnormal Behavior Recognition in Low-Light Conditions. VNU Journal of Science: Computer Science and Communication Engineering, [S.l.], June 2026. ISSN 2588-1086. Available at: <//jcsce.vnu.edu.vn/index.php/jcsce/article/view/7053>. Date accessed: 27 July 2026. doi: https://doi.org/10.25073/2588-1086/vnucsce.7053.

Issue: Article in Press

Section: Original Articles

Abstract

Surveillance in low-light conditions is hampered by poor visibility and by the limited computational resources available for deployment. This paper presents a 3D multimodal feature fusion framework with contrastive learning for abnormal human behavior recognition. Our approach extracts spatiotemporal features from visible and thermal videos and uses a weighted assembly strategy to fuse the most informative regions. Contrastive learning is used to pre-train the backbone model and improve recognition performance. Experiments on the Thermal-LLAB dataset show that the backbone AMFCFB model achieves a recognition accuracy of 96.03% and a detection AUC of 92.60%. Contrastive pre-training yields a 5.3-percentage-point improvement in detection accuracy over training from scratch. These results confirm that combining thermal and visible modalities under contrastive pre-training provides a practical framework for 24/7 surveillance in challenging lighting conditions. Additionally, we release Thermal-LLAB (Low-Light Anomalous Behavior Dataset), a new collection of synchronized visible and thermal videos capturing abnormal behaviors in low-light indoor and outdoor environments, to support future research.
Keywords: Abnormal Behavior Detection, Abnormal Behavior Recognition, Contrastive Learning, Multimodal Feature Fusion, Thermal and Visible Images, Low-light Surveillance Video.
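The abstract does not specify the contrastive objective or the exact weighting rule used in AMFCFB. As a rough, hypothetical illustration only, the sketch below assumes a generic symmetric cross-modal InfoNCE loss, in which the visible and thermal embeddings of the same clip form the positive pair and all other clips in the batch act as negatives, plus a simple convex-combination fusion of the two modalities. The function names, the temperature `tau`, and the scalar weight `w_visible` are assumptions for illustration, not the paper's actual method or parameters.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a feature vector to unit length (guarding against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_fusion(visible: Vector, thermal: Vector, w_visible: float) -> Vector:
    """Hypothetical modality-level fusion (an assumption, not the paper's
    region-level 'weighted assembly'): a convex combination of the two
    feature vectors, where w_visible in [0, 1] is the visible share."""
    if not 0.0 <= w_visible <= 1.0:
        raise ValueError("w_visible must lie in [0, 1]")
    return [w_visible * v + (1.0 - w_visible) * t for v, t in zip(visible, thermal)]


def cross_modal_info_nce(visible_batch: List[Vector], thermal_batch: List[Vector],
                         tau: float = 0.1) -> float:
    """Generic symmetric InfoNCE (assumed objective): clip i's visible and
    thermal embeddings are the positive pair; other clips are negatives."""
    vis = [l2_normalize(v) for v in visible_batch]
    thr = [l2_normalize(t) for t in thermal_batch]
    n = len(vis)
    # Cosine-similarity logits, sharpened by the temperature tau.
    logits = [[dot(vis[i], thr[j]) / tau for j in range(n)] for i in range(n)]

    def cross_entropy(row: Vector, target: int) -> float:
        m = max(row)  # subtract the max before exponentiating, for stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        return log_sum_exp - row[target]

    # Visible-to-thermal direction uses rows; thermal-to-visible uses columns.
    loss_v2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    loss_t2v = sum(cross_entropy([logits[j][i] for j in range(n)], i)
                   for i in range(n)) / n
    return 0.5 * (loss_v2t + loss_t2v)


if __name__ == "__main__":
    vis = [[1.0, 0.0], [0.0, 1.0]]
    aligned = cross_modal_info_nce(vis, [[1.0, 0.0], [0.0, 1.0]])
    swapped = cross_modal_info_nce(vis, [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)  # True: matched pairs give a lower loss
    print(weighted_fusion([1.0, 0.0], [0.0, 1.0], 0.25))  # [0.25, 0.75]
```

The symmetric form averages both matching directions so that neither modality's encoder is favored during pre-training; a real implementation would operate on batched tensors from the 3D backbone rather than Python lists.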
