Contrastive 3D Multimodal Feature Fusion for Abnormal Behavior Recognition in Low-Light Conditions
Abstract
Surveillance in low-light conditions is hampered by poor visibility and by the limited computational resources available for deployment. This paper presents a 3D multimodal feature fusion framework
with contrastive learning for abnormal human behavior recognition. Our approach extracts spatiotemporal features from visible and thermal videos, using a weighted assembly strategy to fuse the
most informative regions. Contrastive learning pre-trains the backbone model to enhance recognition
performance. Experiments on the Thermal-LLAB dataset demonstrate that the backbone AMFCFB
model achieves a recognition accuracy of 96.03% and a detection AUC of 92.60%. Contrastive
learning pre-training yields a 5.3 percentage point improvement in detection accuracy over training
from scratch. These results confirm that combining thermal and visible modalities under contrastive
pre-training yields a practical framework for 24/7 surveillance in challenging lighting conditions.
Additionally, we release Thermal-LLAB (Low-Light Anomalous Behavior Dataset), a new collection of synchronized visible and thermal videos capturing abnormal behaviors in low-light indoor
and outdoor environments, to support future research.
Keywords: Abnormal Behavior Detection, Abnormal Behavior Recognition, Contrastive Learning,
Multimodal Feature Fusion, Thermal and Visible Images, Low-light Surveillance Video.
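
To illustrate the contrastive pre-training idea mentioned in the abstract, the following is a minimal sketch of an InfoNCE-style loss over paired visible and thermal embeddings. The abstract does not state which contrastive objective or pairing scheme the authors use, so the loss form, the visible/thermal pairing, the function names `l2_normalize` and `info_nce`, and the temperature value are all assumptions made for illustration, not the paper's method.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce(visible, thermal, temperature=0.1):
    """Mean InfoNCE loss; visible[i] and thermal[i] are treated as the positive pair.

    Hypothetical setup: each row is a clip-level embedding from one modality,
    and every other thermal row in the batch serves as a negative.
    """
    vis = [l2_normalize(v) for v in visible]
    thr = [l2_normalize(t) for t in thermal]
    total = 0.0
    for i, v in enumerate(vis):
        # Cosine similarity of this visible clip to every thermal clip.
        logits = [sum(a * b for a, b in zip(v, t)) / temperature for t in thr]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Negative log-probability assigned to the matching thermal clip.
        total += log_denominator - logits[i]
    return total / len(vis)


# Toy embeddings (made up): aligned pairs should score a lower loss than swapped pairs.
visible = [[1.0, 0.0], [0.0, 1.0]]
aligned_thermal = [[0.9, 0.1], [0.1, 0.9]]
swapped_thermal = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = info_nce(visible, aligned_thermal)
loss_swapped = info_nce(visible, swapped_thermal)
print(loss_aligned < loss_swapped)
```

In this sketch the loss falls as matching visible and thermal clips move closer together in embedding space relative to non-matching clips, which is the general mechanism by which contrastive pre-training can give a backbone a better starting point than random initialization.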