TY - UNPB
T1 - Safe reinforcement learning for multi-energy management systems with known constraint functions
AU - Ceusters, Glenn
AU - Ramirez Camargo, Luis
AU - Franke, Rüdiger
AU - Nowé, Ann
AU - Messagie, Maarten
N1 - 25 pages, 12 figures
PY - 2022/7/8
Y1 - 2022/7/8
N2 - Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori, reducing the upfront and ongoing project-specific engineering effort, and it is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees, resulting in various unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, namely SafeFallback and GiveSafe, in which the safety constraint formulation is decoupled from the RL formulation and which provide hard-constraint satisfaction guarantees both during training (exploration) and during exploitation of the (close-to) optimal policy. In a simulated multi-energy systems case study, we show that both methods start with a significantly higher utility (i.e. a useful policy) than a vanilla RL benchmark (94.6% and 82.8% compared to 35.5%) and that the proposed SafeFallback method can even outperform the vanilla RL benchmark (102.9% versus 100%). We conclude that both methods are viable safety-constraint handling techniques applicable beyond RL, as demonstrated with random agents, while still providing hard-constraint guarantees. Finally, we propose fundamental future work to, among other things, improve the constraint functions themselves as more data becomes available.
AB - Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori, reducing the upfront and ongoing project-specific engineering effort, and it is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees, resulting in various unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, namely SafeFallback and GiveSafe, in which the safety constraint formulation is decoupled from the RL formulation and which provide hard-constraint satisfaction guarantees both during training (exploration) and during exploitation of the (close-to) optimal policy. In a simulated multi-energy systems case study, we show that both methods start with a significantly higher utility (i.e. a useful policy) than a vanilla RL benchmark (94.6% and 82.8% compared to 35.5%) and that the proposed SafeFallback method can even outperform the vanilla RL benchmark (102.9% versus 100%). We conclude that both methods are viable safety-constraint handling techniques applicable beyond RL, as demonstrated with random agents, while still providing hard-constraint guarantees. Finally, we propose fundamental future work to, among other things, improve the constraint functions themselves as more data becomes available.
KW - eess.SY
KW - cs.AI
KW - cs.LG
KW - cs.SY
KW - math.OC
U2 - 10.48550/arXiv.2207.03830
DO - 10.48550/arXiv.2207.03830
M3 - Preprint
SP - 1
EP - 25
BT - Safe reinforcement learning for multi-energy management systems with known constraint functions
PB - arXiv
ER -