VLSP 2025 Challenge: Numerical Reasoning Question Answering
Abstract
The VLSP 2025 Shared Task on Numerical Reasoning Question Answering (NumQA)
is the first initiative to address numerical reasoning in Vietnamese financial texts. To support this
effort, we constructed ViNumQA, a large-scale benchmark dataset comprising over 4,000 manually validated question-program-answer triples. The dataset integrates two complementary sources:
a human-verified Vietnamese translation of FinQA and newly constructed QA pairs derived from
domestic corporate financial reports. Each instance requires systems to generate a transparent mathematical reasoning program and produce a final numerical answer, enabling explicit evaluation of
both reasoning correctness and result accuracy. We established robust baselines using the LLaMA
model family and compared them against state-of-the-art proprietary LLMs (GPT-4o, GPT-5 mini).
The results demonstrate that supervised fine-tuning is essential for adherence to reasoning schemas,
as few-shot prompting strategies suffered from high invalid generation rates. The shared task included two subtasks: (1) a constrained track focusing on efficient, reproducible modeling without
external APIs, and (2) an unconstrained track allowing LLM-assisted training. The best-performing
constrained model achieved the highest scores in both Program and Execution Accuracy. Meanwhile, an
inference-only agent attained a highly competitive Execution Accuracy without any fine-tuning. By
releasing ViNumQA and evaluating multiple methods, this work provides a key resource for Vietnamese financial NLP and reveals the balance between interpretability and accuracy in numerical
reasoning systems.
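To make the question-program-answer setup concrete, the following is a minimal, hypothetical sketch of how Execution Accuracy can be scored by executing a generated reasoning program. The operation names (`add`, `subtract`, `multiply`, `divide`) and the `#i` back-reference syntax follow the original FinQA convention; the numeric values and the `execute` helper are invented for illustration and are not taken from ViNumQA.

```python
# Hypothetical executor for FinQA-style reasoning programs.
# A program is a comma-separated list of steps, e.g.
#   "subtract(5829, 5735), divide(#0, 5735)"
# where "#i" refers to the result of step i.

OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def execute(program: str) -> float:
    """Run each step in order, resolving '#i' back-references."""
    results = []
    for step in program.split("), "):
        op, args = step.rstrip(")").split("(")
        vals = [
            results[int(a[1:])] if a.startswith("#") else float(a)
            for a in (x.strip() for x in args.split(","))
        ]
        results.append(OPS[op](*vals))
    return results[-1]  # the final step's result is the answer

# Execution Accuracy compares this executed value against the gold answer,
# while Program Accuracy compares the program string/structure itself.
answer = execute("subtract(5829, 5735), divide(#0, 5735)")
```

Under this framing, a system can produce the correct final number via a different (even spurious) program, which is why the task evaluates Program Accuracy and Execution Accuracy separately.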
Keywords: Numerical Reasoning, Question Answering, ViNumQA, VLSP 2025, Vietnamese