Le Thao Minh, Dang Long Hoang, Nguyen Thanh Son, Nguyen Thi Minh Huyen, Vu Xuan Son

Main Article Content

Abstract

This paper presents VieCap4H, a grand data challenge on automatic image caption generation for the healthcare domain in Vietnamese. VieCap4H is held as part of the eighth annual workshop on Vietnamese
Language and Speech Processing (VLSP 2021). The task is considered as an image captioning task. Given a static image, mostly about healthcare-related scenarios, participants are asked to design machine learning methods to generate natural language captions in Vietnamese to describe the visual content of the image. We introduce VieCap4H, a novel human-annotated image captioning dataset in Vietnamese that contains over 10,000 image-caption pairs collected from real-world scenarios in the healthcare domain. All the models proposed by the challenge participants are evaluated using BLEU scores against groundtruths. The challenge was run on AIHUB.VN platform. Within less than two months, the challenge has attracted over 90 individual participants and recorded more than 900 valid submissions.