Improving sample efficiency of reinforcement learning: Exploiting structural knowledge for decision making

  • Shuai Han

Research output: Thesis, Doctoral thesis 1 (Research UU / Graduation UU)

Abstract

Reinforcement learning (RL) has achieved remarkable progress in recent years, yet its application to real-world tasks is hindered by poor sample efficiency, especially in structurally complex environments. This thesis investigates how structural knowledge, including subtask composition, symbolic reasoning, communication structure, and agent influence, can be exploited to improve the efficiency of single-agent and multi-agent RL algorithms. First, we introduce a hierarchical RL framework that automatically structures subtasks. By jointly learning high-level subtask selection and low-level subtask execution, the method achieves superior performance in sparse-reward environments. Second, we propose a neuro-symbolic RL framework that integrates probabilistic symbolic reasoning with policy learning. By introducing a probabilistic inference module that computes action precondition masks, the framework excludes infeasible actions via symbolic knowledge, thereby improving both sample efficiency and policy safety. Third, we present a multi-agent RL framework that exploits communication structure through decentralized scheduling of sparse communication. Agents learn when to share local messages by predicting others’ messages, leading to improved performance with reduced communication overhead. Finally, we design a multi-agent RL framework that automatically identifies the state dimensions controllable by each agent. This structural insight enables focused exploration and precise credit assignment in cooperative multi-agent scenarios with sparse rewards. Together, these contributions advance the sample efficiency of RL by systematically exploiting structural knowledge in decision-making processes. Across diverse domains, the proposed methods outperform state-of-the-art baselines.
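The action-masking idea behind the neuro-symbolic framework can be sketched roughly as follows. This is an illustrative toy, not the thesis's actual implementation: the `precondition_mask` interface, the predicate representation, and the example actions are all assumptions; the general pattern of zeroing out infeasible actions before sampling is standard.

```python
import numpy as np

def precondition_mask(state, preconditions):
    """Boolean mask: True where the symbolic precondition for an
    action holds in `state`. (Hypothetical interface: `preconditions`
    maps each action index to a predicate over the state.)"""
    return np.array([pred(state) for pred in preconditions])

def masked_policy(logits, mask):
    """Exclude infeasible actions by sending their logits to -inf,
    then renormalise the rest with a softmax."""
    masked = np.where(mask, logits, -np.inf)
    exp = np.exp(masked - masked.max())
    return exp / exp.sum()

# Toy example: 3 actions; action 2 violates its precondition.
state = {"holding_key": False}
preconditions = [
    lambda s: True,               # move: always feasible
    lambda s: True,               # wait: always feasible
    lambda s: s["holding_key"],   # open_door: requires the key
]
mask = precondition_mask(state, preconditions)
probs = masked_policy(np.array([1.0, 0.5, 2.0]), mask)
# The infeasible action receives zero probability, so the agent
# never wastes samples on it.
```

Restricting the policy's support this way shrinks the effective action space per state, which is one mechanism by which symbolic knowledge can improve sample efficiency and safety.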
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Utrecht University
Supervisors/Advisors
  • Dastani, Mehdi, Supervisor
  • Wang, Shihan, Co-supervisor
Award date: 24 Mar 2026
Place of Publication: Utrecht
Publication status: Published - 24 Mar 2026

Keywords

  • Reinforcement learning
  • Multi-agent reinforcement learning
  • Sample efficiency
  • Subtask composition
  • Neuro-symbolic learning
  • Action mask
  • Communication
  • Exploration
  • Credit assignment
