Abbreviation Detection in Vietnamese Clinical Texts

Chau Vo; Tru Cao; Bao Ho

doi:10.25073/2588-1086/vnucsce.211

Published Dec 13, 2018

How to Cite

VO, Chau; CAO, Tru; HO, Bao. Abbreviation Detection in Vietnamese Clinical Texts. VNU Journal of Science: Computer Science and Communication Engineering, [S.l.], v. 34, n. 2, dec. 2018. ISSN 2588-1086. Available at: <//jcsce.vnu.edu.vn/index.php/jcsce/article/view/211>. Date accessed: 25 july 2026. doi: https://doi.org/10.25073/2588-1086/vnucsce.211.

Issue

Vol 34 No 2 (2018)

Section

Articles

Abstract

Abbreviations are widely used in clinical notes because these notes are often written under high pressure, with little time for writing and a need to keep medical records simple. These abbreviations reduce the clarity and understandability of the records and strongly affect computer-based data processing tasks on them. In this paper, we propose a solution to the abbreviation identification task on clinical notes in a practical context where only a few clinical notes have been labeled while many more still need to be labeled. Our solution follows a semi-supervised learning approach that uses level-wise feature engineering to construct an abbreviation identifier from a small set of labeled clinical texts while exploiting a larger set of unlabeled clinical texts. A semi-supervised learning algorithm, Semi-RF, and its advanced adaptive version, Weighted Semi-RF, are proposed within the self-training framework using random forest models and Tri-training. Weighted Semi-RF differs from Semi-RF in that it is equipped with a new weighting scheme that adapts to the current labeled data set. The proposed semi-supervised learning algorithms are practical because they use parameter-free settings to build an effective abbreviation identifier that identifies abbreviations automatically in clinical texts. Their effectiveness is confirmed by better Precision and F-measure values in various experiments on real Vietnamese clinical notes. Compared to existing solutions, our solution is novel for automatic abbreviation identification in clinical notes. Its results can lay the basis for determining the full form of each correctly identified abbreviation and thereby enhancing the readability of the records.
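The abstract does not spell out Semi-RF itself, so the following is only a minimal Python sketch of the general self-training idea it builds on, under stated assumptions: a small random forest of decision stumps (each stump sees a bootstrap sample and one randomly chosen feature) is trained on the labeled tokens, predicts labels for unlabeled tokens, adds those whose vote share reaches a confidence threshold, and is retrained. The token features (length, uppercase ratio), the 0.9 threshold, the round limit, all function names and the example tokens are hypothetical. The sketch omits Tri-training, the Weighted Semi-RF weighting scheme and the level-wise feature engineering, and, unlike the parameter-free algorithms described above, it relies on a hand-set threshold.

```python
import random
import unicodedata
from collections import Counter


def features(token):
    """Hypothetical token features: [length, ratio of uppercase letters]."""
    t = unicodedata.normalize("NFC", token)  # count each accented Vietnamese letter as one char
    letters = [c for c in t if c.isalpha()]
    upper = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return [len(t), upper]


def fit_stump(X, y, feature):
    """One-feature decision stump: predict `left` if x <= threshold else `right`."""
    majority = Counter(y).most_common(1)[0][0]
    best = (sum(label != majority for label in y), float("inf"), majority, majority)
    values = sorted({x[feature] for x in X})
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2  # candidate threshold between adjacent values
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x[feature] <= t else right) != label
                         for x, label in zip(X, y))
            if errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x[feature] <= t else right


def fit_forest(X, y, n_trees=15, seed=0):
    """Toy random forest: each stump sees a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    trees = []
    while len(trees) < n_trees:
        idx = [rng.randrange(len(X)) for _ in X]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # redraw one-class bootstrap samples
        feature = rng.randrange(len(X[0]))
        trees.append(fit_stump([X[i] for i in idx], ys, feature))
    return trees


def predict(trees, x):
    """Majority vote; the winning vote share serves as the confidence."""
    label, count = Counter(tree(x) for tree in trees).most_common(1)[0]
    return label, count / len(trees)


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Self-training: pseudo-label confident tokens, add them, retrain."""
    X = [features(t) for t, _ in labeled]
    y = [label for _, label in labeled]
    if len(set(y)) < 2:
        raise ValueError("the labeled set must contain both classes")
    pool = list(unlabeled)
    for round_no in range(max_rounds):
        trees = fit_forest(X, y, seed=round_no)
        confident, rest = [], []
        for token in pool:
            label, conf = predict(trees, features(token))
            (confident if conf >= threshold else rest).append((token, label))
        if not confident:
            break  # nothing confident enough: stop early
        X += [features(t) for t, _ in confident]
        y += [label for _, label in confident]
        pool = [t for t, _ in rest]
    return fit_forest(X, y, seed=max_rounds)


# 1 = abbreviation, 0 = ordinary word (toy data)
labeled = [("BN", 1), ("THA", 1), ("huyết", 0), ("nghiệm", 0)]
unlabeled = ["HA", "ĐTĐ", "XN", "người", "thuốc", "đường"]
model = self_train(labeled, unlabeled)
print([(t, predict(model, features(t))[0]) for t in ["CLS", "những"]])
# expected: [('CLS', 1), ('những', 0)]
```

With these toy tokens both features separate the two classes cleanly, so every stump agrees and all unlabeled tokens are pseudo-labeled in the first round; on real clinical notes the threshold decides how quickly possibly noisy pseudo-labels enter the training set.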

Keywords: Electronic medical record, Clinical note, Abbreviation identification, Semi-supervised learning, Self-training, Random forest.
