Phan Xuan Phuc, Nguyen Ky Tung, Nguyen Hai Duong, Duong Minh Tri

Abstract

Machine reading comprehension (MRC) is a challenging Natural Language Processing (NLP) research field
with wide real-world applications. The great progress of this field in recent years is mainly due to the emergence of
a few large-scale datasets for machine reading comprehension tasks and of deep learning. For the Vietnamese
language, several datasets have been released, such as UIT-ViQuAD [1] and UIT-ViNewsQA [2] and, most recently, UIT-ViQuAD 2.0 [3], the
dataset of the competitive VLSP 2021-MRC Shared Task. MRC systems must not only answer questions when
necessary but also tactfully abstain from answering when no answer is available according to the given passage.
In this paper, we propose two types of joint models for answerability prediction and pure-MRC prediction, with or
without a dependency mechanism that learns the correlation between the start position and the end position in pure-MRC
output prediction. In addition, we use ensemble models and a verification strategy that votes for the best answer among the
top K answers of different models. Our approach is evaluated on UIT-ViQuAD 2.0 [3], the benchmark dataset of the VLSP 2021-MRC Shared
Task challenge, and the results show that it is significantly better than the baseline.
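
The voting step over the top K answers of several models can be illustrated with a minimal sketch. The abstract does not specify the voting rule, so the score-summing scheme below is an assumption, and the function name `vote_best_answer`, the parameter `k`, and the example scores are all hypothetical. An empty string stands for "no answer", so abstaining competes with the extracted spans.

```python
from collections import defaultdict


def vote_best_answer(model_top_k, k=3):
    """Pick one answer from the top-k candidates of several models.

    model_top_k: one list per model, each holding (answer_text, score)
    pairs sorted best first. An empty answer_text means "unanswerable".
    Assumed rule (not from the paper): sum the scores each distinct
    answer receives across models and return the highest total.
    """
    totals = defaultdict(float)
    for candidates in model_top_k:
        for text, score in candidates[:k]:
            totals[text.strip()] += score
    if not totals:
        return ""  # no candidates at all: abstain
    # Iterate in sorted order so that ties are broken deterministically.
    return max(sorted(totals), key=lambda t: totals[t])


# Hypothetical predictions from three models for one question.
predictions = [
    [("Hà Nội", 0.6), ("", 0.3)],
    [("Hà Nội", 0.5), ("Hà Nội, Việt Nam", 0.4)],
    [("", 0.7), ("Hà Nội", 0.2)],
]
best = vote_best_answer(predictions)
```

Here "Hà Nội" collects the largest summed score across the three models, so it is selected over abstaining. Other rules, such as majority voting on each model's top answer alone, would fit the same interface.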