Huynh Tra My Nguyen, Viet Cuong TA


Abstract

Due to the increasing amount of unlabeled data, more flexible approaches are required to label data efficiently. The aim of active learning is to identify the data samples that are most valuable for learning, thereby achieving better performance with far fewer labeled samples. Recent works show that, although simple, data augmentation strategies have the potential to improve active learning by expanding the exploration of the input space and assisting in the discovery of more informative samples. By effectively controlling a set of augmentation operators at each active learning cycle, one can choose promising candidates from the pool of unlabeled data at each iteration step. However, the scoring model is rebuilt from a hard reset at every data acquisition cycle, which is time-consuming and discards important information from previous cycles. To address these issues, we propose an incremental training procedure for active learning that avoids retraining the scoring model at each update cycle. Relying on an augmentation strategy, the model derives a new score that combines the lowest confidence score with its variance over previous cycles; the resulting scores give a better approximation of the uncertainty of the samples. We evaluate our proposed algorithms on two popular benchmarks, FASHION-MNIST and CIFAR-10, and the results highlight that our method improves accuracy by 2% to 4% in comparison with the other baselines.
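The abstract describes the acquisition score as a combination of the current lowest confidence score with its variance over previous cycles, but does not specify the exact combination. The sketch below is a minimal illustration under the assumption of a simple additive combination; the function names (`least_confidence`, `acquisition_scores`) and the unweighted sum are hypothetical, not the authors' implementation.

```python
import numpy as np

def least_confidence(probs):
    """Least-confidence uncertainty: 1 minus the top predicted probability."""
    return 1.0 - probs.max(axis=1)

def acquisition_scores(prob_history):
    """Combine the current least-confidence score with its variance over
    previous cycles as a proxy for sample uncertainty.

    prob_history: list of arrays, one per past cycle, each of shape
                  (n_unlabeled, n_classes) holding softmax outputs.
    """
    # Least-confidence score per cycle for every unlabeled sample.
    lc = np.stack([least_confidence(p) for p in prob_history])  # (cycles, n)
    current = lc[-1]            # score from the most recent cycle
    variance = lc.var(axis=0)   # stability of the score across cycles
    return current + variance   # higher value = more informative sample

# Usage: select the k highest-scoring unlabeled samples for the next cycle.
# history = [model_probs_cycle0, model_probs_cycle1, ...]
# k_best = np.argsort(acquisition_scores(history))[-k:]
```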