Dang Van Nhan, Nguyen Le Minh

Abstract

The amount of available information is now huge, and the task is to find correct answers to questions within it. Not every question has an answer; in that case the best response is "don't know", which the model expresses by predicting the empty string. A high-accuracy question answering model would make people's lives easier. For English, the SQuAD dataset is available for training machine reading comprehension models. Based on SQuAD 2.0, the organizing committee developed the Vietnamese Question Answering Dataset UIT-ViQuAD 2.0 [1], a reading comprehension dataset consisting of questions posed by crowd-workers on a set of Vietnamese Wikipedia articles. UIT-ViQuAD 2.0 evolved from version 1.0, with the difference that version 2.0 contains both answerable and unanswerable questions. The answer to each question is either a span of text from the corresponding reading passage or nothing at all, because the question may be unanswerable. The challenge of this task [2] is to distinguish answerable from unanswerable questions. Our system employs simple yet highly effective methods. It uses a pre-trained language model (PLM), XLM-RoBERTa (XLM-R [3]), and filters the results from multiple output files to produce the final result. We created about 5-7 output files and, for each question, selected the answer that was repeated most often across them as the final prediction. After filtering, our system's F1 score increased from 75.172% to 76.386%, and it achieved 65.329% on the EM measure on the Private Test set,…
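The filtering step described in the abstract, choosing the answer repeated most often across several output files, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each output file is a SQuAD-style JSON object mapping a question id to an answer string, with the empty string marking an unanswerable question. The function names `majority_vote` and `load_predictions`, the sample data, and the tie-breaking rule (the answer from the earliest file wins) are all hypothetical choices made for illustration.

```python
import json
from collections import Counter


def majority_vote(predictions):
    """Pick, for each question id, the answer that occurs most often.

    predictions: list of dicts mapping question id -> answer string,
    where '' stands for "unanswerable" (an assumed file layout).
    Ties go to the answer seen first, i.e. from the earliest run.
    """
    if not predictions:
        return {}
    final = {}
    for qid in predictions[0]:
        # Count how many runs produced each distinct answer for this question.
        votes = Counter(p[qid] for p in predictions if qid in p)
        final[qid] = votes.most_common(1)[0][0]
    return final


def load_predictions(paths):
    """Read one prediction dict per JSON file (hypothetical helper)."""
    runs = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            runs.append(json.load(f))
    return runs


# Three illustrative runs; real use would call load_predictions([...]).
runs = [
    {"q1": "Hà Nội", "q2": ""},
    {"q1": "Hà Nội", "q2": "1945"},
    {"q1": "Hanoi", "q2": ""},
]
final = majority_vote(runs)
print(final)
```

Because the empty string is counted like any other answer, a question is marked unanswerable whenever most runs agree that it has no answer.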