SV - VLSP 2021: Combine Attentive Statistical Pooling-based Xvector and Pretrained ECAPA-TDNN for Vietnamese Text-Independent Speaker Verification

Ta Bao Thang; Huynh Thi Thanh Binh

doi:10.25073/2588-1086/vnucsce.320



Published Jun 30, 2022


How to Cite

THANG, Ta Bao; BINH, Huynh Thi Thanh. SV - VLSP 2021: Combine Attentive Statistical Pooling-based Xvector and Pretrained ECAPA-TDNN for Vietnamese Text-Independent Speaker Verification. VNU Journal of Science: Computer Science and Communication Engineering, [S.l.], v. 38, n. 1, june 2022. ISSN 2588-1086. Available at: <//jcsce.vnu.edu.vn/index.php/jcsce/article/view/320>. Date accessed: 26 july 2026. doi: https://doi.org/10.25073/2588-1086/vnucsce.320.


Issue

Vol 38 No 1: Special Issue: The 8th International Workshop on Vietnamese Language and Speech Processing (VLSP 2021)

Section

Special Issue on Vietnamese Language and Speech Processing (VLSP2021)

Abstract

X-vectors and ECAPA-TDNN are currently regarded as state-of-the-art models for speaker verification systems. This paper proposes a novel approach that combines an attentive statistical pooling-based X-vector model with a pre-trained ECAPA-TDNN for Vietnamese speaker verification. Experiments are conducted on several recent Vietnamese speech datasets. The results show that the proposed combination outperforms each of its constituent models, with a relative equal error rate (EER) improvement of 4% to 37%, and that it ranked second in Task 2 of the VLSP 2021 Speaker Verification competition.
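The headline metric in the abstract is a relative EER improvement. As a rough illustration of how that quantity is commonly computed, the sketch below estimates the EER from trial scores and then compares two EER values. The function names and the toy scores are assumptions made for illustration only; they are not taken from the paper, and the authors' evaluation code may differ.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by using every observed score as a threshold.

    target_scores: similarity scores of same-speaker trials.
    nontarget_scores: similarity scores of different-speaker trials.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Fraction by which new_eer is lower than baseline_eer."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy scores (hypothetical, not results from the paper).
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
toy_gain = relative_improvement(0.05, 0.04)
```

With this definition, a system whose EER drops from 5% to 4% has a relative improvement of about 20%, even though the absolute difference is only one percentage point.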
