Mixed experience sampling for off-policy reinforcement learning

Jiayu Yu, Jingyao Li, Shuai Lü*, Shuai Han

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

In deep reinforcement learning, experience replay is commonly used to improve data efficiency and alleviate experience forgetting. However, online reinforcement learning is often influenced by the index of an experience in the buffer, which tends to produce unbalanced sampling. In addition, most experience replay methods ignore the differences among experiences and cannot make full use of all of them. In particular, many "near"-policy experiences that are closely relevant to the current policy are wasted, despite being beneficial for improving sample efficiency. This paper theoretically analyzes the influence of various factors on experience sampling, and then proposes a sampling method for experience replay based on frequency and similarity (FSER) to alleviate unbalanced sampling and increase the value of the sampled experiences. FSER prefers experiences that are rarely sampled or highly relevant to the current policy, and thereby balances experience forgetting against experience wasting. Finally, FSER is combined with TD3 and achieves state-of-the-art results on multiple tasks.
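
The abstract describes a priority that mixes how rarely an experience has been sampled (frequency) with how close it is to the current policy (similarity). Below is a minimal, hypothetical Python sketch of such a buffer; the priority formula, the `alpha` trade-off, the exponential similarity measure, and the class and method names are illustrative assumptions, not the paper's actual FSER implementation.

```python
# Minimal sketch of frequency-and-similarity-based replay sampling
# (FSER-style). The priority formula below is an assumption for
# illustration only; the paper's definition may differ. `policy` is
# any callable mapping a batch of states to a batch of actions
# (e.g., the current TD3 actor).
import numpy as np

class FSERBuffer:
    def __init__(self, capacity, alpha=0.5):
        self.capacity = capacity
        self.alpha = alpha          # trade-off: rarity vs. policy similarity
        self.storage = []           # transitions (s, a, r, s', done)
        self.counts = np.zeros(capacity, dtype=np.int64)  # times each was sampled
        self.pos = 0

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.counts[self.pos] = 0   # a new experience starts unsampled
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, policy):
        n = len(self.storage)
        states = np.array([t[0] for t in self.storage])
        actions = np.array([t[1] for t in self.storage])
        # Rarity term: prefer experiences that have been sampled few times.
        rarity = 1.0 / (1.0 + self.counts[:n])
        # Similarity term (assumed form): how close the stored action is
        # to what the current policy would take in the same state.
        dist = np.linalg.norm(actions - policy(states), axis=-1)
        similarity = np.exp(-dist)
        prio = self.alpha * rarity + (1.0 - self.alpha) * similarity
        probs = prio / prio.sum()
        idx = np.random.choice(n, size=batch_size, p=probs)
        np.add.at(self.counts, idx, 1)  # sampled experiences become less rare
        return [self.storage[i] for i in idx]
```

With a TD3 agent, `policy` could simply wrap the current actor network; recomputing the similarity term at sample time keeps the priorities aligned with the policy as it changes.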

Original language: English
Article number: 124017
Number of pages: 10
Journal: Expert Systems with Applications
Volume: 251
Early online date: 21 Apr 2024
DOIs
Publication status: Published - 1 Oct 2024

Bibliographical note

Publisher Copyright:
© 2024 Elsevier Ltd

Funding

We sincerely thank the anonymous reviewers for their careful work and thoughtful suggestions, which have greatly improved this article. This work was supported by the Natural Science Research Foundation of Jilin Province of China under Grant Nos. 20220101106JC and YDZJ202201ZYTS423, the National Natural Science Foundation of China under Grant No. 61300049, the Fundamental Research Funds for the Central Universities under Grant Nos. 2412022ZD018, 2412022QD040 and 93K172022K10, and the National Key R&D Program of China under Grant No. 2017YFB1003103.

Funders | Funder number
National Natural Science Foundation of China | 61300049
Fundamental Research Funds for the Central Universities | 2412022ZD018
Natural Science Foundation of Jilin Province | YDZJ202201ZYTS423, 20220101106JC
National Key Research and Development Program of China | 2017YFB1003103
Jilin University | 93K172022K10
Northeast Normal University | 2412022QD040

Keywords

• Experience replay
• Experience sampling
• Exploitation
• Off-policy learning
• Reinforcement learning
