SV - VLSP 2021: Combine Attentive Statistical Pooling-based Xvector and Pretrained ECAPA-TDNN for Vietnamese Text-Independent Speaker Verification
Main Article Content
Abstract
Recently, Xvectors and ECAPA-TDNN have been considered state-of-the-art models in designing speaker verification systems. This paper proposes a novel approach that combines Attentive statistic pooling-based Xvector and pre-trained ECAPA-TDNN for Vietnamese speaker verification. Experiments are conducted on various recent Vietnamese speech datasets. The results portrayed that our proposed combination outperformed all constitutive models with 4% to 37% relative EER improvement and ranked second place in Task 2 of the 2021 VLSP Speaker Verification competition.