Dang Dinh Son, Le Dang Linh, Dang Xuan Vuong, Duong Quang Tien, Ta Bao Thang

Abstract

Recent years have seen strong growth in Automatic Speech Recognition (ASR) research, driven by its wide range of applications. However, little of this effort has targeted the Vietnamese language. This paper introduces an end-to-end approach to Vietnamese ASR based on Conformer and pseudo-labeling. In addition, our approach uses Gradient Mask and Stochastic Weight Averaging to improve training. Experimental results show that our method achieved the best performance (8.28% Syllable Error Rate), outperforming all other competitors in Task 1 of the 2021 VLSP Competition on Vietnamese Automatic Speech Recognition.
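
For readers unfamiliar with the metric, the following is a minimal sketch of how a Syllable Error Rate is commonly computed. It assumes that syllables are the whitespace-separated tokens of a Vietnamese transcript and that the rate is the Levenshtein distance between reference and hypothesis divided by the reference length. The function name `syllable_error_rate` is hypothetical, and the scoring script actually used in the competition may normalize text differently.

```python
def syllable_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance over whitespace-separated syllables, divided by reference length.

    Assumption: each whitespace-separated token is one syllable.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one syllable")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j syllables of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference syllables
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (syllable missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra syllable in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under these assumptions, scoring the hypothesis "tôi học" against the reference "tôi đi học" counts one deleted syllable out of three reference syllables. Corpus-level figures are usually obtained by summing errors and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.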