vnNLI - VLSP2021: An Empirical Study on Vietnamese-English Natural Language Inference Based on Pretrained Language Models with Data Augmentation

Ngo Dinh Luan; Ngo Le Hieu Kien; Dang Van Thin; Duong Ngoc Hao; Nguyen Luu Thuy Ngan

doi:10.25073/2588-1086/vnucsce.330

Published Dec 16, 2022

How to Cite

LUAN, Ngo Dinh et al. vnNLI - VLSP2021: An Empirical Study on Vietnamese-English Natural Language Inference Based on Pretrained Language Models with Data Augmentation. VNU Journal of Science: Computer Science and Communication Engineering, [S.l.], v. 38, n. 2, dec. 2022. ISSN 2588-1086. Available at: <//jcsce.vnu.edu.vn/index.php/jcsce/article/view/330>. Date accessed: 14 oct. 2025. doi: https://doi.org/10.25073/2588-1086/vnucsce.330.

Issue

Vol 38 No 2: Special Issue: The 8th International Workshop on Vietnamese Language and Speech Processing (VLSP 2021)

Section

Special Issue on Vietnamese Language and Speech Processing (VLSP2021)

Abstract

In this paper, we present an empirical study of data augmentation techniques with various pre-trained language models on the bilingual dataset released for the VLSP 2021 shared task on Vietnamese and English-Vietnamese Textual Entailment. We use a machine translation tool to generate a new training set from the original training data, then compare the effectiveness of monolingual and multilingual models on the new dataset. Our experimental results show that fine-tuning the pre-trained multilingual language model XLM-R on the augmented training set gives the best performance. Our system ranked third in the VLSP 2021 shared task, with an F1-score of about 0.88.
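The translation-based augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which translation tool was used or which sentences were translated, so the `NLIExample` record, the `augment_by_translation` helper, and the `translate` callable are all hypothetical names, and the sketch assumes that both sentences of a pair are translated and that the label carries over unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class NLIExample:
    """One premise/hypothesis pair with its inference label (hypothetical schema)."""
    premise: str
    hypothesis: str
    label: str


def augment_by_translation(
    examples: Iterable[NLIExample],
    translate: Callable[[str], str],
) -> List[NLIExample]:
    """Return the original examples followed by translated copies.

    `translate` stands in for any machine translation tool (an assumption;
    the tool used in the paper is not named in the abstract). Labels are
    copied unchanged, on the assumption that translation preserves the
    inference relation between the two sentences.
    """
    originals = list(examples)
    augmented = list(originals)
    seen = set(originals)  # frozen dataclasses are hashable
    for ex in originals:
        copy = NLIExample(
            premise=translate(ex.premise),
            hypothesis=translate(ex.hypothesis),
            label=ex.label,
        )
        # Skip copies that add nothing new (e.g. text left untranslated).
        if copy not in seen:
            seen.add(copy)
            augmented.append(copy)
    return augmented
```

In such a setup, the augmented list would then serve as the training set for fine-tuning a multilingual model such as XLM-R; the fine-tuning step itself is outside the scope of this sketch.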
