Multi-objective reinforcement learning for provably incentivising alignment with value systems

  • Manel Rodriguez-Soto*
  • Roxana Rădulescu
  • Filippo Bistaffa
  • Oriol Ricart
  • Arnau Mayoral-Macau
  • Maite Lopez-Sanchez
  • Juan A. Rodriguez-Aguilar
  • Ann Nowé

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

This paper addresses the problem of ensuring that autonomous learning agents align with multiple moral values. Specifically, we present the theoretical principles and algorithmic tools for designing an environment in which the agent is guaranteed to learn behaviour aligned with multiple moral values while striving to achieve its individual objective. To address this value alignment problem, we adopt the Multi-Objective Reinforcement Learning framework and propose a novel algorithm that combines techniques from Multi-Objective Reinforcement Learning and Linear Programming. In addition, we illustrate our value alignment process with an example involving an autonomous vehicle, demonstrating that the agent learns to behave in alignment with the ethical values of safety, achievement, and comfort, where achievement represents the agent's individual objective. The resulting ethical behaviour differs depending on the ordering imposed over the values. We also use a synthetic multi-objective environment to evaluate the computational cost of guaranteeing ethical learning as the number of values increases.
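As a rough intuition for the ordering-dependence mentioned in the abstract, the following is a minimal sketch of tabular multi-objective Q-learning with a lexicographic ordering over value dimensions (safety > achievement > comfort). The toy environment, all names, and the update rule are hypothetical illustrations, not the paper's algorithm; in particular, the Linear Programming component that underpins the paper's formal guarantees is not shown here.

```python
# Illustrative sketch only: a tabular multi-objective Q-learner that picks
# actions by comparing vector-valued Q-estimates lexicographically under
# the ordering safety > achievement > comfort. Hypothetical stand-in, not
# the paper's provably-aligning algorithm.
import random

import numpy as np

N_STATES, N_ACTIONS, N_OBJECTIVES = 5, 2, 3  # toy sizes, chosen arbitrarily
ORDERING = (0, 1, 2)  # objective indices: 0 = safety, 1 = achievement, 2 = comfort

# Q[s, a] is a vector with one entry per objective.
Q = np.zeros((N_STATES, N_ACTIONS, N_OBJECTIVES))

def lex_best_action(state: int) -> int:
    """Return the action whose Q-vector is lexicographically largest
    under ORDERING (earlier objectives dominate later ones)."""
    keys = [tuple(Q[state, a, o] for o in ORDERING) for a in range(N_ACTIONS)]
    return max(range(N_ACTIONS), key=lambda a: keys[a])

def toy_step(state: int, action: int):
    """Hypothetical dynamics returning (next_state, vector_reward).
    Action 1 scores higher on achievement but worse on safety."""
    next_state = (state + 1) % N_STATES
    reward = np.array([1.0 - action, 0.5 * action, 0.1])  # safety, achievement, comfort
    return next_state, reward

alpha, gamma, epsilon = 0.1, 0.95, 0.1
state = 0
for _ in range(5000):
    action = random.randrange(N_ACTIONS) if random.random() < epsilon else lex_best_action(state)
    next_state, reward = toy_step(state, action)
    # Per-objective TD update toward the lexicographically greedy successor action.
    greedy = lex_best_action(next_state)
    Q[state, action] += alpha * (reward + gamma * Q[next_state, greedy] - Q[state, action])
    state = next_state

print("Greedy action in state 0:", lex_best_action(0))  # safety-first policy picks action 0
```

Reordering ORDERING (e.g., putting achievement before safety) changes which action the greedy policy prefers, mirroring the abstract's observation that the learned ethical behaviour depends on the ordering between values.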

Original language: English
Article number: 104460
Number of pages: 26
Journal: Artificial Intelligence
Volume: 351
DOIs:
Publication status: Published - Feb 2026

Bibliographical note

Publisher Copyright:
© 2025 The Author(s)

Keywords

  • Ethics
  • Multi-objective reinforcement learning
  • Value alignment
