Pham Duy Trung, Pham Ngoc Minh, Bui Thu Lam

Abstract

Deployment of deep learning–based misinformation detection systems frequently suffers
from severe resource contention when heterogeneous computational workloads share the same hardware. CPU-intensive preprocessing operations, such as video decoding and face extraction, interfere
with GPU-intensive neural network inference, leading to GPU underutilization and Head-of-Line
blocking in monolithic architectures. This paper introduces DeepSentry, a microservice-based detection platform designed to mitigate this bottleneck through asymmetric architectural decoupling.
In this context, asymmetry is defined as a non-uniform architectural decomposition in which CPU-bound preprocessing services and GPU-bound inference services are intentionally isolated and scaled
independently, rather than replicated symmetrically as identical service instances. By separating
CPU-bound ingestion services from GPU-bound inference engines via message queuing and an inference server, the proposed platform enables independent scaling and asynchronous execution across
heterogeneous resources. A proof-of-concept implementation, integrating ResNet34 for image deepfake detection (with heatmap generation) and a BiLSTM for fake news classification, demonstrates that
asymmetric microservice architectures significantly improve hardware utilization through dynamic
batching and asynchronous orchestration, effectively resolving the resource contention inherent in
monolithic deployments.
Keywords: Resource contention, GPU starvation, Microservices, NVIDIA Triton, Deepfake
detection, Fake news detection.
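
The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the DeepSentry implementation: it assumes in-process Python queues in place of a real message broker and a plain function in place of an inference server, and every name in it (preprocess, infer_batch, collect_batch, run_pipeline, max_batch, max_wait_s) is hypothetical. CPU workers push preprocessed items onto a queue, and a single inference worker drains that queue in batches, waiting briefly so that a batch can fill before it is submitted.

```python
import queue
import threading
import time

STOP = object()  # sentinel that tells a CPU worker to exit


def preprocess(item):
    # Stand-in for CPU-bound work such as video decoding or face extraction.
    return item * 2


def infer_batch(batch):
    # Stand-in for one GPU-bound forward pass over a whole batch.
    return [x + 1 for x in batch]


def collect_batch(q, max_batch, max_wait_s):
    """Block for the first item, then gather more until the batch is
    full or the wait budget expires (a simple form of dynamic batching)."""
    batch = [q.get()]
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


def run_pipeline(items, num_cpu_workers=2, max_batch=4, max_wait_s=0.05):
    work_q = queue.Queue()   # raw requests awaiting preprocessing
    ready_q = queue.Queue()  # preprocessed inputs awaiting inference
    results, batch_sizes = [], []

    def cpu_worker():
        while True:
            item = work_q.get()
            if item is STOP:
                return
            ready_q.put(preprocess(item))

    def gpu_worker():
        done = 0
        while done < len(items):
            batch = collect_batch(ready_q, max_batch, max_wait_s)
            results.extend(infer_batch(batch))
            batch_sizes.append(len(batch))
            done += len(batch)

    threads = [threading.Thread(target=cpu_worker) for _ in range(num_cpu_workers)]
    threads.append(threading.Thread(target=gpu_worker))
    for t in threads:
        t.start()
    for item in items:
        work_q.put(item)
    for _ in range(num_cpu_workers):
        work_q.put(STOP)
    for t in threads:
        t.join()
    return results, batch_sizes


if __name__ == "__main__":
    outputs, sizes = run_pipeline(list(range(10)))
    print(sorted(outputs), sizes)
```

Because the two stages communicate only through ready_q, the number of CPU workers can be changed without touching the inference worker, which is the independent scaling the abstract refers to. Result order and the exact batch sizes depend on thread timing, so neither should be relied upon.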