Abstract
In deep reinforcement learning, experience replay is widely used to improve data efficiency and alleviate experience forgetting. However, online reinforcement learning is often affected by the order in which experiences are stored, which typically leads to unbalanced sampling. In addition, most experience replay methods ignore the differences among experiences and cannot make full use of them. In particular, many "near"-policy experiences that are closely related to the current policy are wasted, even though they are beneficial for improving sample efficiency. This paper theoretically analyzes how various factors influence experience sampling, and then proposes a sampling method for experience replay based on frequency and similarity (FSER) to alleviate unbalanced sampling and increase the value of the sampled experiences. FSER prefers experiences that are rarely sampled or highly relevant to the current policy, thereby balancing the problems of experience forgetting and experience wasting. Finally, FSER is combined with TD3 and achieves state-of-the-art results on multiple tasks.
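The abstract only outlines the idea of scoring experiences by how rarely they have been sampled and how similar they are to the current policy; the exact FSER priority is defined in the paper itself. As an illustration only, the sketch below shows one plausible way such a buffer could work: every name here (`FSERBuffer`, the trade-off weight `lam`, the inverse-count frequency score, and the inverse-distance similarity score) is an assumption for this sketch, not the authors' formulation.

```python
import numpy as np

class FSERBuffer:
    """Sketch of a replay buffer whose sampling priority favors
    rarely sampled transitions and transitions whose stored action
    is close to what the current policy would do. Illustrative only;
    the scoring functions are assumptions, not the paper's formulas."""

    def __init__(self, capacity, lam=0.5):
        self.capacity = capacity
        self.lam = lam        # assumed trade-off: frequency vs. similarity
        self.storage = []     # (state, action, reward, next_state, done)
        self.counts = []      # how often each transition has been sampled
        self.pos = 0

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
            self.counts.append(0)
        else:
            self.storage[self.pos] = transition
            self.counts[self.pos] = 0
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, policy):
        # policy: callable mapping a state to an action array
        counts = np.asarray(self.counts, dtype=np.float64)
        # Frequency term: rarely sampled transitions score higher.
        freq_score = 1.0 / (1.0 + counts)
        # Similarity term: transitions whose stored action is close to
        # the current policy's action in the same state score higher.
        dists = np.array([np.linalg.norm(a - policy(s))
                          for (s, a, *_rest) in self.storage])
        sim_score = 1.0 / (1.0 + dists)
        score = self.lam * freq_score + (1.0 - self.lam) * sim_score
        probs = score / score.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=probs)
        for i in idx:
            self.counts[i] += 1
        return [self.storage[i] for i in idx]
```

In this reading, the frequency term counteracts unbalanced sampling (old, rarely drawn transitions are not forgotten), while the similarity term keeps "near"-policy experiences from being wasted; how the two terms are actually combined in FSER is specified in the paper.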
Original language | English |
---|---|
Article number | 124017 |
Number of pages | 10 |
Journal | Expert Systems with Applications |
Volume | 251 |
Early online date | 21 Apr 2024 |
DOIs | |
Publication status | Published - 1 Oct 2024 |
Bibliographical note
Publisher Copyright: © 2024 Elsevier Ltd
Funding
We sincerely thank the anonymous reviewers for their careful work and thoughtful suggestions, which have greatly improved this article. This work was supported by the Natural Science Research Foundation of Jilin Province of China under Grant Nos. 20220101106JC and YDZJ202201ZYTS423, the National Natural Science Foundation of China under Grant No. 61300049, the Fundamental Research Funds for the Central Universities under Grant Nos. 93K172022K10 (Jilin University), 2412022QD040 (Northeast Normal University), and 2412022ZD018, and the National Key R&D Program of China under Grant No. 2017YFB1003103.
Funders | Funder number |
---|---|
National Natural Science Foundation of China | 61300049 |
Fundamental Research Funds for the Central Universities | 2412022ZD018 |
Natural Science Foundation of Jilin Province | YDZJ202201ZYTS423, 20220101106JC |
National Key Research and Development Program of China | 2017YFB1003103 |
Jilin University | 93K172022K10 |
Northeast Normal University | 2412022QD040 |
Keywords
- Experience replay
- Experience sampling
- Exploitation
- Off-policy learning
- Reinforcement learning