TY - JOUR
T1 - Safe reinforcement learning for multi-energy management systems with known constraint functions
AU - Ceusters, Glenn
AU - Camargo, Luis Ramirez
AU - Franke, Rüdiger
AU - Nowé, Ann
AU - Messagie, Maarten
N1 - Funding Information:
This work has been supported in part by ABB n.v., Belgium, and by the Flemish Agency for Innovation and Entrepreneurship (VLAIO), Belgium, under grant HBC.2019.2613.
Publisher Copyright:
© 2022 The Author(s)
PY - 2023/4
Y1 - 2023/4
N2 - Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori, reducing the upfront and ongoing project-specific engineering effort, and it is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees, which can result in potentially unsafe interactions within its environment. In this paper, we present two novel online model-free safe RL methods, namely SafeFallback and GiveSafe, in which the safety constraint formulation is decoupled from the RL formulation. These methods provide hard-constraint satisfaction guarantees both during training and during deployment of the (near-)optimal policy, without the need to solve a mathematical program, which results in lower computational requirements and more flexible constraint function formulations. In a simulated multi-energy systems case study, we show that both methods start with a significantly higher utility than a vanilla RL benchmark and an OptLayer benchmark (94.6% and 82.8% compared to 35.5% and 77.8%), and that the proposed SafeFallback method can even outperform the vanilla RL benchmark (102.9% versus 100%). We conclude that both methods are viable safety-constraint handling techniques applicable beyond RL, as demonstrated with random policies, while still providing hard-constraint guarantees.
AB - Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori, reducing the upfront and ongoing project-specific engineering effort, and it is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees, which can result in potentially unsafe interactions within its environment. In this paper, we present two novel online model-free safe RL methods, namely SafeFallback and GiveSafe, in which the safety constraint formulation is decoupled from the RL formulation. These methods provide hard-constraint satisfaction guarantees both during training and during deployment of the (near-)optimal policy, without the need to solve a mathematical program, which results in lower computational requirements and more flexible constraint function formulations. In a simulated multi-energy systems case study, we show that both methods start with a significantly higher utility than a vanilla RL benchmark and an OptLayer benchmark (94.6% and 82.8% compared to 35.5% and 77.8%), and that the proposed SafeFallback method can even outperform the vanilla RL benchmark (102.9% versus 100%). We conclude that both methods are viable safety-constraint handling techniques applicable beyond RL, as demonstrated with random policies, while still providing hard-constraint guarantees.
KW - Constraints
KW - Energy management system
KW - Multi-energy systems
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85145781656&partnerID=8YFLogxK
U2 - 10.1016/j.egyai.2022.100227
DO - 10.1016/j.egyai.2022.100227
M3 - Article
AN - SCOPUS:85145781656
SN - 2666-5468
VL - 12
SP - 1
EP - 17
JO - Energy and AI
JF - Energy and AI
M1 - 100227
ER -