Phuong Tuan Dat, Hoang Long Vu, Nguyen Thi Thu Trang

Abstract

The Vietnamese Spoofing-Aware Speaker Verification (VSASV) Challenge series represents the first systematic effort to advance spoof-resistant speaker verification for Vietnamese, a low-resource, highly tonal language characterized by rich phonetic variability. Unlike prior challenges focused on English, VSASV directly addresses the scarcity of publicly available Vietnamese
spoofing corpora, a limitation that historically hindered the development of robust automatic speaker
verification (ASV) and spoofing countermeasure (CM) systems. Across its 2023 and 2025 editions,
VSASV introduced progressively more challenging benchmarks, including multi-corpus bonafide
speech, replay attacks, neural voice conversion, modern text-to-speech (TTS) synthesis, and adversarial perturbations. The 2025 edition further incorporates a speaker-similarity-based partitioning strategy and
severe train–test mismatches to emulate realistic attack scenarios. Results from more than 40 participating systems highlight the feasibility of building reliable spoofing-aware ASV pipelines under
low-resource conditions, particularly when combining ASV and CM subsystems or leveraging multilingual self-supervised learning (SSL) models. The findings underscore the importance of linguistic properties, especially tonal dynamics, in shaping spoofing vulnerabilities and model generalization.
This work provides a comprehensive overview of the VSASV challenge series, synthesizing insights
that inform future research on deepfake detection, spoof-robust speech authentication, and inclusive
biometric technologies for underrepresented languages.

Keywords: Deepfake Detection, Speaker Verification, Low-resource Languages, Vietnamese Speech Datasets