Proceedings of the
35th European Safety and Reliability Conference (ESREL2025) and
the 33rd Society for Risk Analysis Europe Conference (SRA-E 2025)
15 – 19 June 2025, Stavanger, Norway
Offline Learning of Maintenance Policies Using Reinforcement Learning and Historical Maintenance Data
Computer Science and Digital Society Laboratory (LIST3N), University of Technology of Troyes, France.
ABSTRACT
In condition-based maintenance optimization, the degradation process model is often assumed to be known, so that classical paradigms such as (Markov) renewal theory or dynamic programming can be applied to derive the optimal policy. When degradation modeling becomes challenging, such a policy can instead be learned directly from maintenance data. Considering offline datasets, consisting of pre- and post-maintenance system states, actions taken, and associated costs generated by various non-optimal behavior policies, our goal is to explore reinforcement learning approaches that extract better maintenance policies without any further system condition monitoring information. In the literature, offline reinforcement learning methods have been studied for maintenance optimization with a discounted reward metric and a discrete degradation state space, but they have received less attention for continuous state spaces over an infinite horizon under the average reward metric. In this paper, we adapt a relative Q-learning algorithm with function approximation to the offline setting under the average reward metric and combine it with data augmentation to learn higher-performance policies from several maintenance datasets collected from continuously degrading maintained systems. Numerical results under different data configurations show that a near-optimal policy can be learned with relatively little data.
Keywords: Condition-based maintenance, Continuous state space, Average reward, Markov decision process, Reinforcement learning, Relative Q-learning, Function approximation, Data augmentation, Maintenance data.
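To make the approach described in the abstract concrete, the following is a minimal illustrative sketch (not the authors' implementation) of offline relative Q-learning with a linear function approximator under the average cost criterion, trained from logged maintenance transitions (state, action, cost, next state) and paired with a naive data-augmentation step. The feature map, action set, reference state-action pair, and all hyperparameters are assumptions made only for this example.

```python
# Illustrative sketch: offline relative (average-cost) Q-learning with a
# linear function approximator, learned from logged maintenance data.
# All modeling choices below (features, actions, costs) are assumptions.
import numpy as np

ACTIONS = [0, 1]   # assumed: 0 = do nothing, 1 = preventive replacement
N_FEATURES = 3     # features per (state, action) pair in this toy example


def features(state: float, action: int) -> np.ndarray:
    """Hand-crafted features of a scalar degradation level and an action."""
    return np.array([1.0, state, float(action) * state])


def q_value(theta: np.ndarray, state: float, action: int) -> float:
    return float(features(state, action) @ theta)


def offline_relative_q_learning(dataset, n_epochs=50, alpha=0.01,
                                ref_state=0.0, ref_action=0):
    """Fit Q(s, a) from offline transitions (s, a, c, s').

    The relative update subtracts Q at a fixed reference state-action pair,
    which plays the role of the average-cost estimate in average-reward RL.
    """
    theta = np.zeros(N_FEATURES)
    for _ in range(n_epochs):
        for (s, a, c, s_next) in dataset:
            # Greedy (minimum-cost) continuation value at the next state.
            next_q = min(q_value(theta, s_next, b) for b in ACTIONS)
            # Anchor the value scale at the reference state-action pair.
            ref_q = q_value(theta, ref_state, ref_action)
            td_error = c + next_q - ref_q - q_value(theta, s, a)
            theta += alpha * td_error * features(s, a)
    return theta


def augment(dataset, n_copies=5, noise_std=0.05, rng=None):
    """Naive data augmentation: jitter observed degradation levels."""
    rng = rng or np.random.default_rng(0)
    augmented = list(dataset)
    for (s, a, c, s_next) in dataset:
        for _ in range(n_copies):
            eps = rng.normal(0.0, noise_std)
            augmented.append((s + eps, a, c, s_next + eps))
    return augmented


if __name__ == "__main__":
    # Tiny synthetic log: (degradation level, action, cost, next level).
    log = [(0.2, 0, 1.0, 0.5), (0.5, 0, 1.0, 0.9), (0.9, 1, 10.0, 0.1)]
    theta = offline_relative_q_learning(augment(log))
    policy = {s: min(ACTIONS, key=lambda a: q_value(theta, s, a))
              for s in (0.2, 0.5, 0.9)}
    print(policy)
```

The greedy policy recovered from the fitted Q-function maps each degradation level to the action with the lowest estimated relative cost; in the paper this role is played by the policy extracted from the offline maintenance datasets.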