A Versatile Adaptive Curriculum Learning Framework for Task-oriented Dialogue Policy Learning

Yangyang Zhao, Hua Qin, Wang Zhenyu, Changxi Zhu, Shihan Wang

    Research output: Chapter in Book/Report/Conference proceeding | Conference contribution | Academic | peer-review

    Abstract

    Training a deep reinforcement learning-based dialogue policy with brute-force random sampling is costly. A new training paradigm has been proposed that combines reinforcement learning with curriculum learning to improve learning performance and efficiency. However, attempts in the field of dialogue policy remain very limited, owing to the lack of reliable difficulty scores for dialogue tasks and the high sensitivity of training to the order in which dialogue tasks are presented. In this paper, we present a novel versatile adaptive curriculum learning (VACL) framework, a substantial step toward applying automatic curriculum learning to dialogue policy tasks. VACL evaluates the difficulty of dialogue tasks using only the learning experience of the dialogue policy, and it supports skip-level task selection according to the policy's learning needs to maximize learning efficiency. Moreover, while training a good dialogue policy, VACL constructs a generic, elastic global curriculum that can guide the learning of other dialogue policies without extra re-training effort. The superiority and versatility of VACL are validated on three public dialogue datasets.
    Original language: English
    Title of host publication: Findings of the Association for Computational Linguistics: NAACL 2022
    Publisher: Association for Computational Linguistics
    Pages: 711-723
    Number of pages: 13
    ISBN (Electronic): 9781955917766
    DOIs
    Publication status: Published - 1 Jul 2022

    Bibliographical note

    Funding Information:
    We would like to thank the reviewers for their comments and efforts towards improving our paper. We would also like to acknowledge the volunteers of the South China University of Technology who helped us with the human experiments. This work was supported by the Key-Area Research and Development Program of Guangdong Province, China (Grant No. 2019B0101540042) and the Natural Science Foundation of Guangdong Province, China (Grant No. 2019A1515011792).

    Publisher Copyright:
    © Findings of the Association for Computational Linguistics: NAACL 2022 - Findings.
