Adaptive Weighting by Sinkhorn Distance for Sharing Experiences between Multi-Task Reinforcement Learning in Sparse-reward Environments
Abstract
In multi-task reinforcement learning, an agent using off-policy learning can leverage samples from other tasks to improve its learning process. When the reward signal from the environment is
sparse, the agent in each task spends most of its training time exploring the environment. Therefore,
the shared experiences between tasks can generally be considered as samples derived from an exploration policy. However, when the exploitation phase begins, the shared experience framework must
account for the divergence of policies across different learning tasks. However, when the exploitation phase starts, the sharing experience framework has to take into account the policies’ divergence
issue of different learning tasks. Our work addresses this issue by employing an adaptive weight for
shared experiences. First, a central buffer collects and shares the experiences from each individual
task. To mitigate the effects of policy divergence among multiple tasks, we propose an algorithm that
measures policy distances using the Sinkhorn distance. The computed distances are used to assign
a specific weight to each shared sample, controlling the amount of knowledge shared as the policies
begin to diverge during the exploitation phase. We conduct experiments in two goal-based multi-task learning environments to evaluate the effectiveness of our approach. The results show that our
proposed method improves average rewards by 8%-10% compared with other baselines.
Keywords: multi-task reinforcement learning, off-policy, experience sharing.
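
The core idea described above is mapping a distance between two tasks' policies to a per-sample weight for shared experience. The following is a minimal sketch of that idea, not the paper's implementation: it computes an entropy-regularized Sinkhorn distance between two discrete action distributions using plain NumPy and converts it to a weight via an exponential decay. The 0/1 ground cost, the regularization strength `eps`, and the temperature `beta` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def sinkhorn_distance(p, q, cost, eps=0.1, n_iters=200):
    """Entropy-regularized optimal-transport (Sinkhorn) distance between
    discrete distributions p and q under a given ground-cost matrix."""
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(p)
    v = np.ones_like(q)
    for _ in range(n_iters):           # Sinkhorn fixed-point iterations
        u = p / (K @ v)
        v = q / (K.T @ u)
    plan = np.outer(u, v) * K          # regularized transport plan
    return float(np.sum(plan * cost))  # transport cost of the plan

def sample_weight(dist, beta=1.0):
    """Assumed mapping: the further the sample's behavior policy is from the
    learner's policy, the smaller the weight, bounded in (0, 1]."""
    return float(np.exp(-beta * dist))

# Toy usage: action probabilities of two tasks' policies at the same state.
pi_i = np.array([0.7, 0.2, 0.1])       # learner's policy
pi_j = np.array([0.1, 0.3, 0.6])       # policy of the task that produced the sample
cost = 1.0 - np.eye(3)                 # 0/1 ground cost between discrete actions
d = sinkhorn_distance(pi_i, pi_j, cost)
w = sample_weight(d)                   # weight applied to the shared transition
```

In this sketch, the weight would scale the contribution of a shared transition in the off-policy update, so samples from nearly identical policies count fully while samples from diverged policies are down-weighted.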