Author Profiling of Vietnamese Forum Posts - An Investigation on Content-based Features
Abstract
In this paper, we investigate the author profiling task for Vietnamese forum posts: predicting demographic attributes of the author, such as gender, age, occupation, and location. Although we conducted experiments with different types of features, including style-based and content-based features, we focus on analyzing the effects of content-based features. We used machine learning approaches to perform the classification tasks on datasets that we collected from popular Vietnamese-language forums. The results show that these features work well on short, free-style messages such as forum posts, and that content-based features achieved much better results than style-based features.
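To make the distinction between the two feature families concrete, the following is a minimal sketch rather than the feature set used in the paper, which the abstract does not specify. The function names, the particular style statistics, and the sample post (written without diacritics) are all illustrative assumptions.

```python
from collections import Counter
import string


def content_features(post: str) -> Counter:
    """Content-based view (assumed form): lowercase bag-of-words counts."""
    tokens = [t.strip(string.punctuation) for t in post.lower().split()]
    return Counter(t for t in tokens if t)


def style_features(post: str) -> dict:
    """Style-based view (assumed form): surface statistics that ignore word identity."""
    words = post.split()
    return {
        "n_words": len(words),
        "avg_word_len": sum(len(w) for w in words) / len(words) if words else 0.0,
        "n_punct": sum(ch in string.punctuation for ch in post),
        "n_upper": sum(ch.isupper() for ch in post),
    }


# Hypothetical forum post, invented for illustration only.
post = "Hom nay di hoc, mai di lam!"
content = content_features(post)
style = style_features(post)
```

In a sketch like this, either representation could be passed to a standard classifier. The content view records which words the author uses, while the style view records only how the post is written.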