Proceedings of the 33rd European Safety and Reliability Conference (ESREL 2023)
3 – 8 September 2023, Southampton, UK
A Generic Fully Unsupervised Framework for Machine-Learning-Based Anomaly Detection
Zurich University of Applied Sciences, Technikumstrasse 81, Winterthur, Switzerland.
ABSTRACT
One of the main challenges in applying machine learning algorithms to industrial fault detection is the scarcity of annotated data, especially from faulty or degraded regimes. Common approaches resort to residual-based anomaly detection (AD): machine learning models are trained exclusively on normal, anomaly-free data, and deviations from normal behavior are detected during deployment [1, 2, 3, 4]. However, in real-world industrial and operational systems, the training data is often completely unlabeled and may contain anomalies. Training residual-based AD models on such unlabeled, potentially contaminated data may therefore reduce AD performance.
In this work, we present a novel approach to refining contaminated training data in an entirely unsupervised manner, enabling high-performance AD despite the contamination. The proposed framework is generic and can be applied to any residual-based model, whether reconstruction-based (such as Principal Component Analysis or Autoencoder neural networks) or regression-based (from linear regression to deep neural networks). We demonstrate the application of the framework to two public data sets of time series data: acoustic signals from industrial machines, and aircraft engine data. The two examples differ in their physical systems as well as in their fault dynamics (sudden failures vs. slow degradation). We show that the framework outperforms the naive approach of training blindly on contaminated data. In addition, we compare its performance to the ideal reference case of AD with anomaly-free training data, and show that the proposed framework performs similarly to, and sometimes outperforms, this ideal baseline.
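As a minimal illustration of the residual-based AD setting described above, the following sketch fits a PCA reconstruction model and flags samples whose reconstruction residual exceeds a threshold. It is not the refinement framework proposed in this work; it only contrasts the ideal case (anomaly-free training data) with the naive case (contaminated training data). The synthetic data, the 10% contamination level, the 0.99 quantile threshold, and all function names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical setup: 10 sensor channels driven by 2 latent factors.
basis = rng.normal(size=(2, 10))

def make_normal(n):
    """Synthetic normal samples: low-rank signal plus small sensor noise."""
    return rng.normal(size=(n, 2)) @ basis + 0.05 * rng.normal(size=(n, 10))

def make_anomalous(n):
    """Synthetic anomalies: normal samples plus a large disturbance."""
    return make_normal(n) + rng.normal(size=(n, 10))

def fit_pca(X, n_components):
    """Return the training mean and the leading principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def residual_norm(X, model):
    """Norm of the part of each sample that PCA fails to reconstruct."""
    mean, comps = model
    centered = X - mean
    return np.linalg.norm(centered - centered @ comps.T @ comps, axis=1)

def detection_rate(train, test_anomalies, quantile=0.99):
    """Fit on `train`, set the threshold from training residuals,
    and return the fraction of test anomalies that exceed it."""
    model = fit_pca(train, n_components=2)
    threshold = np.quantile(residual_norm(train, model), quantile)
    return float(np.mean(residual_norm(test_anomalies, model) > threshold))

test_anomalies = make_anomalous(200)
clean_train = make_normal(500)
# Assumed contamination level: 10% of the training samples are anomalous.
contaminated_train = np.vstack([make_normal(450), make_anomalous(50)])

det_clean = detection_rate(clean_train, test_anomalies)
det_contaminated = detection_rate(contaminated_train, test_anomalies)
print(f"detection rate, anomaly-free training data: {det_clean:.2f}")
print(f"detection rate, contaminated training data: {det_contaminated:.2f}")
```

In this toy setting the anomalies in the contaminated training set inflate the residual threshold, so fewer test anomalies are flagged. This is the kind of degradation that refining the training data is meant to counteract.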
Keywords: Fully-unsupervised learning, Anomaly detection, Contaminated data, Prognostics and health management, Machine learning, Deep learning, Acoustic sensor data, Aircraft engines.