Abstract
Reinforcement Learning (RL) shows promise for optimizing task-oriented dialogue policies, but reward sparsity remains a persistent challenge. Curriculum learning offers an effective remedy by training dialogue policies from simple to complex goals, facilitating a smooth transfer of knowledge across goals of varying complexity. However, these methods typically assume that goal difficulty increases gradually, so that the policy can adapt to difficult goals over time; in complex environments that lack such intermediate goals, achieving a smooth knowledge transition is difficult. This paper proposes a novel Bootstrapped Policy Learning (BPL) framework that adaptively tailors a curriculum of progressively challenging subgoals for each complex goal through goal shaping. Goal shaping comprises goal decomposition and goal evolution: it breaks a complex goal into solvable subgoals and progressively raises subgoal difficulty as the policy improves. By combining these two mechanisms, BPL enables a smooth knowledge transition from simple to complex goals, improving the efficiency of task-oriented dialogue policy learning. Experiments in two complex dialogue environments demonstrate the effectiveness of BPL.
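The abstract describes goal shaping only at a high level. As a rough illustration of the idea, not the paper's actual algorithm, one might represent a dialogue goal as a set of slot-value constraints, decompose it into a smaller subgoal, and grow that subgoal as the policy's success rate rises. All function names, the threshold, and the mock evaluation below are hypothetical.

```python
import random

def decompose(goal: dict, n_constraints: int) -> dict:
    """Goal decomposition (hypothetical): keep only a subset of the goal's
    slot-value constraints, yielding an easier subgoal."""
    slots = random.sample(sorted(goal), k=n_constraints)
    return {s: goal[s] for s in slots}

def evolve(n_constraints: int, success_rate: float, goal_size: int,
           threshold: float = 0.8) -> int:
    """Goal evolution (hypothetical): raise subgoal difficulty once the
    policy solves the current difficulty level reliably."""
    if success_rate >= threshold and n_constraints < goal_size:
        return n_constraints + 1
    return n_constraints

# Usage: a complex restaurant-booking goal and a mock success-rate signal.
goal = {"cuisine": "thai", "area": "north", "price": "cheap", "day": "friday"}
level = 1
for epoch in range(10):
    subgoal = decompose(goal, level)          # train the policy on this subgoal
    mock_success = min(1.0, 0.3 * epoch / max(level, 1))  # stand-in for evaluation
    level = evolve(level, mock_success, len(goal))
    print(f"epoch {epoch}: training on subgoal {subgoal}")
```

The sketch captures only the curriculum structure: easier subgoals first, with difficulty increasing adaptively per goal as performance improves.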
| Original language | English |
| --- | --- |
| Pages (from-to) | 2615-2617 |
| Number of pages | 3 |
| Journal | Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS |
| Volume | 2024 |
| Issue number | May |
| Publication status | Published - 6 May 2024 |
| Event | 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024, Auckland, New Zealand, 6 May 2024 – 10 May 2024 |
Keywords
- Curriculum Learning
- Dialogue Policy
- Goal Shaping
- Reinforcement Learning