vnNLI - VLSP 2021: Vietnamese and English-Vietnamese Textual Entailment Based on Pre-trained Multilingual Language Models

Hoang Xuan Vu; Nguyen Van Tai; Phan Thi Kim Khoa; Dang Van Thin; Duong Ngoc Hao; Nguyen Luu Thuy Ngan

doi:10.25073/2588-1086/vnucsce.329

Published Dec 16, 2022

How to Cite

VU, Hoang Xuan et al. vnNLI - VLSP 2021: Vietnamese and English-Vietnamese Textual Entailment Based on Pre-trained Multilingual Language Models. VNU Journal of Science: Computer Science and Communication Engineering, [S.l.], v. 38, n. 2, dec. 2022. ISSN 2588-1086. Available at: <//jcsce.vnu.edu.vn/index.php/jcsce/article/view/329>. Date accessed: 25 july 2026. doi: https://doi.org/10.25073/2588-1086/vnucsce.329.

Issue

Vol 38 No 2: Special Issue: The 8th International Workshop on Vietnamese Language and Speech Processing (VLSP 2021)

Section

Special Issue on Vietnamese Language and Speech Processing (VLSP2021)

Abstract

Natural Language Inference (NLI) is a high-level semantic task in Natural Language Processing (NLP), and it poses further challenges in the cross-lingual setting. In recent years, pre-trained multilingual language models (e.g., mBERT, XLM-R, InfoXLM) have contributed greatly to addressing these challenges. Motivated by these achievements, this paper describes our approach, based on fine-tuning pre-trained multilingual language models (XLM-R, InfoXLM), to the shared task "Vietnamese and English-Vietnamese Textual Entailment" at the 8th International Workshop on Vietnamese Language and Speech Processing (VLSP 2021, https://vlsp.org.vn/vlsp2021). We also investigate several techniques to improve performance: cross-validation, pseudo-labeling (PL), learning rate adjustment, and POS tagging. Experimental results show that our approach based on the InfoXLM model achieved competitive results, ranking 2nd in the VLSP 2021 task evaluation with an F1-score of 0.89 on the private test set.
