Semantic-Aware Action Space Compression via LLM-DRL Synergy for Efficient Task-oriented Dialogue Policy Exploration

  • Yangyang Zhao
  • Ben Niu
  • Yuxuan Tan
  • Shihan Wang
  • Libo Qin*

  *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

The flexibility of natural language significantly expands the action space in task-oriented dialogue systems, causing inefficient exploration and slow convergence in deep reinforcement learning (DRL)-based policy optimization. Pretrained large language models (LLMs), with their world knowledge and semantic understanding, offer a promising solution. To this end, we propose LLM-Guided DRL via Semantic-Aware Action Pruning (LLMSAP), a novel framework that synergizes pretrained LLMs with DRL. LLMSAP leverages the world knowledge and contextual understanding of LLMs to guide decision-making via an action feasibility assessment. Because LLMs have limited precision in sequential decision tasks, LLMSAP does not require them to generate optimal actions directly; instead, it employs a lightweight action pruning mechanism. Specifically, LLMs act as action filters that use the multi-turn dialogue context to rapidly eliminate semantically implausible or low-potential actions, allowing the DRL agent to focus exploration on a refined candidate subset. This two-stage framework ("prune-then-optimize") avoids extensive LLM fine-tuning while preserving the decision-making precision of DRL. Experiments on multiple benchmarks verify the effectiveness of LLMSAP.
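
The "prune-then-optimize" idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`toy_filter`, `select_action`), the action names, the Q-values, and the fallback used when every action is pruned are all assumptions made for illustration. In particular, `toy_filter` is a hand-written rule standing in for the LLM feasibility assessment, and epsilon-greedy is used only as one common exploration scheme.

```python
import random
from typing import List, Sequence


def toy_filter(context: str, actions: Sequence[str]) -> List[bool]:
    """Hypothetical stand-in for the LLM action filter.

    Keeps every action except booking actions when the dialogue
    context never mentions booking. A real system would query an LLM here.
    """
    wants_booking = "book" in context.lower()
    return [wants_booking or not a.startswith("book") for a in actions]


def select_action(
    q_values: Sequence[float],
    keep_mask: Sequence[bool],
    epsilon: float,
    rng: random.Random,
) -> int:
    """Epsilon-greedy selection restricted to the actions the filter kept."""
    candidates = [a for a, keep in enumerate(keep_mask) if keep]
    if not candidates:
        # Assumption: if the filter pruned everything, fall back to all actions.
        candidates = list(range(len(q_values)))
    if rng.random() < epsilon:
        # Explore, but only inside the pruned candidate subset.
        return rng.choice(candidates)
    # Exploit: highest estimated value among the kept actions.
    return max(candidates, key=lambda a: q_values[a])


# Illustrative values only; none of these come from the paper.
actions = ["request_area", "inform_price", "book_hotel", "goodbye"]
q_values = [0.1, 0.4, 0.9, 0.2]
mask = toy_filter("user: I am looking for a cheap hotel", actions)
chosen = select_action(q_values, mask, epsilon=0.0, rng=random.Random(0))
print(actions[chosen])
```

In this toy run the unpruned greedy choice would be `book_hotel`, but the filter removes it because the context has not asked for a booking, so the agent selects among the remaining actions. Masking at selection time leaves the value estimates themselves untouched, which is one simple way to keep the DRL agent responsible for the final decision.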

Original language: English
Title of host publication: EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Publisher: Association for Computational Linguistics (ACL)
Pages: 17808-17820
Number of pages: 13
ISBN (Electronic): 9798891763357
DOIs
Publication status: Published - Nov 2025
Event: 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025 - Suzhou, China
Duration: 4 Nov 2025 to 9 Nov 2025

Publication series

Name: EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025

Conference

Conference: 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
Country/Territory: China
City: Suzhou
Period: 4/11/25 to 9/11/25

Bibliographical note

Publisher Copyright:
©2025 Association for Computational Linguistics.
