ViMRC VLSP 2021: XLM-R versus PhoBERT on Vietnamese Machine Reading Comprehension
The development of Industry 4.0 is creating new challenges for Artificial Intelligence (AI) in general and Natural Language Processing (NLP) in particular. Machine Reading Comprehension (MRC) is an NLP task with real-world applications that requires machines to determine the correct answers to questions based on a given document. MRC systems must not only answer questions when possible but also abstain from answering when the document supports no answer. In this paper, we describe our system for the VLSP 2021 shared task: Vietnamese Machine Reading Comprehension with UIT-ViQuAD 2.0. We propose a model for this task, called MRC4MRC, which combines two MRC components. Based on the XLM-RoBERTa pre-trained language model, MRC4MRC achieves 79.13% F1-score (F1) and 69.72% Exact Match (EM) on the public test set. Our experiments also show that the XLM-R language model outperforms the powerful PhoBERT language model on UIT-ViQuAD 2.0.
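To make the abstain-from-answering behavior concrete, the sketch below illustrates the common SQuAD 2.0-style answerability decision for extractive MRC: the model's best non-null answer span is compared against a null (no-answer) score, and the system abstains when the null score wins by more than a tunable threshold. This is an illustrative example of the general technique, not the authors' MRC4MRC implementation; the function name, toy logits, and threshold are assumptions for demonstration.

```python
def best_answer(start_logits, end_logits, max_answer_len=15, null_threshold=0.0):
    """Pick an answer span or abstain, SQuAD 2.0-style (illustrative sketch,
    not the authors' MRC4MRC system)."""
    # Null score: start and end both at position 0 (the [CLS] token by convention).
    null_score = start_logits[0] + end_logits[0]

    # Best non-null span with start <= end and bounded length.
    best_score, best_span = float("-inf"), None
    for i in range(1, len(start_logits)):
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (i, j)

    # Abstain when the null score beats the best span by more than the threshold.
    if null_score - best_score > null_threshold:
        return None  # question judged unanswerable
    return best_span

# Toy logits: the span (2, 3) clearly outscores the null option.
print(best_answer([0.1, 0.2, 2.0, 0.3], [0.1, 0.0, 0.5, 1.8]))  # -> (2, 3)
```

Tuning `null_threshold` on development data trades precision on answerable questions against recall on unanswerable ones, which is exactly the balance UIT-ViQuAD 2.0 evaluates with F1 and EM.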