TY - JOUR
T1 - Using Ensemble Learning for Anomaly Detection in Cyber–Physical Systems
AU - Jeffrey, Nicholas
AU - Tan, Qing
AU - Villar, José R.
N1 - Publisher Copyright: © 2024 by the authors.
PY - 2024/4
Y1 - 2024/4
N2 - The swift embrace of Industry 4.0 paradigms has led to the growing convergence of Information Technology (IT) networks and Operational Technology (OT) networks. Traditionally isolated on air-gapped and fully trusted networks, OT networks are now becoming more interconnected with IT networks due to the advancement and applications of IoT. This expanded attack surface has led to vulnerabilities in Cyber–Physical Systems (CPSs), resulting in increasingly frequent compromises with substantial economic and life safety repercussions. The existing methods for the anomaly detection of security threats typically use simple threshold-based strategies or apply Machine Learning (ML) algorithms to historical data for the prediction of future anomalies. However, due to the high levels of heterogeneity across different CPS environments, minimizing the opportunities for transfer learning, and the scarcity of real-world data for training, the existing ML-based anomaly detection techniques suffer from a poor predictive performance. This paper introduces a hybrid anomaly detection approach designed to identify threats to CPSs by combining the signature-based anomaly detection typically utilized in IT networks, the threshold-based anomaly detection typically utilized in OT networks, and behavioural-based anomaly detection using Ensemble Learning (EL), which leverages the strengths of multiple ML algorithms against the same dataset to increase the accuracy. Multiple public research datasets were used to validate the proposed approach, with the hybrid methodology employing a divide-and-conquer strategy to offload the detection of certain cyber threats to computationally inexpensive signature-based and threshold-based methods using domain knowledge to minimize the size of the behavioural-based data needed for ML model training, thus achieving a higher accuracy over a reduced timeframe. The experimental results showed accuracy improvements of 4–7% over those of the conventional ML classifiers in performing anomaly detection across multiple datasets, which is particularly important to the operators of CPS environments due to the high financial and life safety costs associated with interruptions to system availability.
KW - anomaly detection
KW - Cyber–Physical Systems
KW - ensemble learning
KW - IIoT
KW - IoT
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85190102387&partnerID=8YFLogxK
U2 - 10.3390/electronics13071391
DO - 10.3390/electronics13071391
M3 - Journal Article
AN - SCOPUS:85190102387
VL - 13
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 7
M1 - 1391
ER -