Learning Reward Structure with Subtasks in Reinforcement Learning

Shuai Han*, Mehdi Dastani*, Shihan Wang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Improving the sample efficiency of Reinforcement Learning (RL) in sparse-reward environments poses a significant challenge. In scenarios where the reward structure is complex, accurate action evaluation often relies heavily on precise information about previously achieved subtasks and their order. Previous approaches have often failed or proved inefficient in constructing and leveraging such intricate reward structures. In this work, we propose an RL algorithm that can automatically structure the reward function for sample efficiency, given a set of labels that signify subtasks. Given such minimal knowledge about the task, we train a high-level policy that selects optimal subtasks in each state together with a low-level policy that efficiently learns to complete each subtask. We evaluate our algorithm in a variety of sparse-reward environments. The experimental results show that our method significantly outperforms the state-of-the-art baselines as the difficulty of the task increases.
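
The two-level arrangement described in the abstract can be illustrated with a small tabular sketch. This is not the authors' algorithm: it is a generic hierarchical Q-learning toy written for illustration only, and the corridor environment, the labels `key` and `door`, the hyperparameters, and every function name below are assumptions. A high-level table chooses the next subtask given the ordered history of completed subtasks, while a low-level table learns to reach the chosen subtask's label using an intrinsic reward.

```python
import random
from collections import defaultdict

# Hypothetical toy task (not from the paper): a 7-cell corridor with the label
# "key" on cell 0 and "door" on cell 6. The only environment reward is 1.0 for
# completing "door" after "key" has been completed earlier in the episode.
LABELS = {0: "key", 6: "door"}
SUBTASKS = ["key", "door"]
ACTIONS = [-1, 1]
START, SIZE, ALPHA, GAMMA = 3, 7, 0.5, 0.9

def choose(q, state, options, eps, rng):
    """Epsilon-greedy selection over `options` for Q-values keyed by (state, option)."""
    if rng.random() < eps:
        return rng.choice(options)
    return max(options, key=lambda o: q[(state, o)])

def update(q, state, option, reward, next_state, next_options, done):
    """One-step Q-learning update; bootstraps from the best next option unless done."""
    best_next = 0.0 if done else max(q[(next_state, o)] for o in next_options)
    q[(state, option)] += ALPHA * (reward + GAMMA * best_next - q[(state, option)])

def run_subtask(q_low, pos, subtask, eps, rng, max_steps=20):
    """Low-level policy: act until the selected subtask's label is observed."""
    for _ in range(max_steps):
        a = choose(q_low, (pos, subtask), ACTIONS, eps, rng)
        nxt = min(max(pos + a, 0), SIZE - 1)
        done = LABELS.get(nxt) == subtask
        # Intrinsic reward: 1.0 only when the selected subtask is completed.
        update(q_low, (pos, subtask), a, float(done), (nxt, subtask), ACTIONS, done)
        pos = nxt
        if done:
            return pos, True
    return pos, False

def run_episode(q_high, q_low, eps, rng, max_choices=4):
    """High-level policy: pick subtasks given the ordered history of completed ones."""
    pos, history, chosen = START, (), []
    for _ in range(max_choices):
        subtask = choose(q_high, history, SUBTASKS, eps, rng)
        chosen.append(subtask)
        pos, ok = run_subtask(q_low, pos, subtask, eps, rng)
        # Sparse environment reward: "door" only pays off if "key" came first.
        reward = 1.0 if ok and subtask == "door" and "key" in history else 0.0
        new_history = history + (subtask,) if ok else history
        update(q_high, history, subtask, reward, new_history, SUBTASKS, reward > 0)
        history = new_history
        if reward > 0:
            return reward, chosen
    return 0.0, chosen

def train(episodes=500, seed=0):
    """Train both tables with a linearly decaying exploration rate."""
    rng = random.Random(seed)
    q_high, q_low = defaultdict(float), defaultdict(float)
    for ep in range(episodes):
        eps = max(0.05, 1.0 - 2.0 * ep / episodes)
        run_episode(q_high, q_low, eps, rng)
    return q_high, q_low

if __name__ == "__main__":
    q_high, q_low = train()
    # Greedy rollout: returns the episode reward and the subtasks that were selected.
    print(run_episode(q_high, q_low, eps=0.0, rng=random.Random(1)))
```

Keying the high-level table on the ordered tuple of completed subtasks is one simple way to make action values depend on which subtasks were achieved and in what order; the paper's own construction of the reward structure may differ substantially.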

Original language: English
Title of host publication: ECAI 2024 - 27th European Conference on Artificial Intelligence, Including 13th Conference on Prestigious Applications of Intelligent Systems, PAIS 2024, Proceedings
Editors: Ulle Endriss, Francisco S. Melo, Kerstin Bach, Alberto Bugarin-Diz, Jose M. Alonso-Moral, Senen Barro, Fredrik Heintz
Publisher: IOS Press
Pages: 2282-2289
Number of pages: 8
ISBN (Electronic): 9781643685489
Publication status: Published - 16 Oct 2024
Event: 27th European Conference on Artificial Intelligence, ECAI 2024 - Santiago de Compostela, Spain
Duration: 19 Oct 2024 to 24 Oct 2024

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 392
ISSN (Print): 0922-6389
ISSN (Electronic): 1879-8314

Conference

Conference: 27th European Conference on Artificial Intelligence, ECAI 2024
Country/Territory: Spain
City: Santiago de Compostela
Period: 19/10/24 to 24/10/24

Bibliographical note

Publisher Copyright:
© 2024 The Authors.
