Duc Tran Duong, Son Bao Pham, Hanh Tan

Abstract

In this paper, we investigate the author profiling task for Vietnamese forum posts: predicting demographic attributes of the author, such as gender, age, occupation, and location. Although we conducted experiments on different types of features, including style-based and content-based features, we focus on analyzing the effects of content-based features. We used machine learning approaches to perform classification tasks on datasets we collected from popular Vietnamese forums. The results show that these kinds of features work well on short, free-style messages such as forum posts, and that content-based features achieved much better results than style-based features.
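
To make the distinction between the two feature families concrete, the following is a minimal sketch, not the authors' actual feature set. The function names `style_features` and `content_features`, the specific style measures, and the sample post are all assumptions chosen for illustration. Style-based features describe how a post is written, while content-based features describe which words it contains.

```python
import re
import unicodedata
from collections import Counter


def style_features(post: str) -> dict:
    """Hypothetical style-based features: how the post is written."""
    words = post.split()
    n_chars = max(len(post), 1)  # avoid division by zero on empty posts
    return {
        "n_words": len(words),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "punct_ratio": sum(ch in ".,!?;:" for ch in post) / n_chars,
        "upper_ratio": sum(ch.isupper() for ch in post) / n_chars,
    }


def content_features(post: str) -> Counter:
    """Hypothetical content-based features: lowercase word counts."""
    # Normalize to NFC so accented Vietnamese letters are single code
    # points and are matched by \w (an assumption about the input text).
    text = unicodedata.normalize("NFC", post).lower()
    return Counter(re.findall(r"\w+", text))


# Illustrative sample post, not taken from the paper's datasets.
post = "Hello world! Hello forum."
style = style_features(post)
content = content_features(post)
```

In a full pipeline, either feature set would be turned into a numeric vector and passed to a classifier trained separately for each attribute (gender, age, occupation, location); the choice of classifier is left open here because the abstract does not name one.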