Tuan Phong Nguyen, Quoc Tuan Truong, Xuan Nam Nguyen, Anh Cuong Le


Abstract

Part-of-speech (POS) tagging plays an important role in Natural Language Processing (NLP). It is used in many other NLP tasks, such as named entity recognition, syntactic parsing and text chunking. Recent studies on common languages such as English and French report very high accuracy for this core NLP task. However, current results for less common languages such as Vietnamese are not as good. In our investigation, we used the techniques of two widely used toolkits, ClearNLP and Stanford POS Tagger, to build two new taggers and compared them with three well-known Vietnamese taggers, namely JVnTagger, vnTagger and RDRPOSTagger. We created a unique evaluation scheme to compare the taggers systematically and to determine which tagger has the best accuracy and speed. The comparison revealed that our two new taggers, built from ClearNLP and Stanford POS Tagger, achieve overall accuracies of 94.50% and 94.39%, respectively, and outperform all other toolkits. Moreover, in our speed tests, RDRPOSTagger achieves an impressive tagging speed and is significantly faster than any of the stochastic taggers.
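The abstract compares taggers on two measures, overall accuracy and tagging speed. The sketch below shows one common way to compute them: token-level accuracy (correctly tagged tokens divided by all tokens) and throughput in tokens per second. It is an illustrative assumption, not the authors' evaluation scheme. The function names `token_accuracy` and `evaluate`, the toy lookup tagger and the tag labels are all hypothetical.

```python
import time
from typing import Callable, List, Sequence, Tuple

Sentence = List[str]  # a word-segmented sentence
Tags = List[str]      # one POS tag per token


def token_accuracy(gold: Sequence[Tags], predicted: Sequence[Tags]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag (hypothetical helper)."""
    if len(gold) != len(predicted):
        raise ValueError("number of sentences differs")
    correct = total = 0
    for g_sent, p_sent in zip(gold, predicted):
        if len(g_sent) != len(p_sent):
            raise ValueError("sentence length mismatch")
        correct += sum(g == p for g, p in zip(g_sent, p_sent))
        total += len(g_sent)
    return correct / total if total else 0.0


def evaluate(tagger: Callable[[Sentence], Tags],
             sentences: Sequence[Sentence],
             gold: Sequence[Tags]) -> Tuple[float, float]:
    """Return (accuracy, tokens per second) for a tagger on a test set."""
    start = time.perf_counter()
    predicted = [tagger(s) for s in sentences]
    elapsed = time.perf_counter() - start
    n_tokens = sum(len(s) for s in sentences)
    speed = n_tokens / elapsed if elapsed > 0 else float("inf")
    return token_accuracy(gold, predicted), speed


# Hypothetical toy tagger: dictionary lookup with "N" as the default tag.
LEXICON = {"Tôi": "P", "đi": "V"}


def lookup_tagger(sentence: Sentence) -> Tags:
    return [LEXICON.get(word, "N") for word in sentence]


if __name__ == "__main__":
    sentences = [["Tôi", "đi", "học"], ["Sinh_viên", "học"]]
    gold = [["P", "V", "V"], ["N", "V"]]
    acc, speed = evaluate(lookup_tagger, sentences, gold)
    print(f"accuracy={acc:.2%}  speed={speed:.0f} tokens/s")
```

On this toy data the lookup tagger gets 3 of 5 tokens right, so the accuracy is 60%. Speed figures depend on hardware and should only be compared across taggers run on the same machine and test set.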