Sensors, vol. 26, no. 5, 2026 (SCI-Expanded, Scopus)
Industrial robots are core components of modern manufacturing systems and are widely used in critical tasks such as assembly, welding, and material handling. Reliable operation of these systems depends on early and accurate detection of execution failures. This study presents a comprehensive comparison of machine learning and deep learning methods for classifying robot execution failures from force–torque sensor data. Three feature engineering approaches are proposed. The first is a Baseline approach that uses 90 raw time-series features. The second is the Domain-6 approach, which uses 6 basic statistical features per sensor (36 in total). The third is the Domain-12 approach, which uses 12 comprehensive statistical features per sensor (72 in total). The domain features include the mean, standard deviation, minimum, maximum, range, slope, median, skewness, kurtosis, root mean square (RMS), energy, and interquartile range (IQR). Ten classification algorithms are evaluated, comprising eight machine learning methods and two deep learning models: Support Vector Machines (SVM), Random Forest (RF), k-Nearest Neighbors (KNN), Artificial Neural Network (ANN), Naive Bayes (NB), Decision Trees (DT), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM, LGBM), as well as a One-Dimensional Convolutional Neural Network (CNN-1D) and Long Short-Term Memory (LSTM). The traditional machine learning algorithms are evaluated with 5 × 5 nested cross-validation, whereas the deep learning models are evaluated with 5-fold cross-validation and a 20% validation split. To ensure statistical reliability, all experiments are repeated over 30 independent runs. The experimental results demonstrate that feature engineering has a decisive impact on classification performance. Across all feature sets and algorithms, the highest accuracy (93.85% ± 0.90) is achieved by the Naive Bayes classifier using the Baseline features.
The Domain-12 feature set yields consistent and substantial performance gains for many algorithms. Results are reported as accuracy, precision, recall, and F1-score and are supported by confusion matrices. Finally, permutation feature importance analysis indicates that the skewness features of the Fx and Fy sensors are the most critical variables for failure detection. Overall, these findings show that time-domain statistical features offer an effective approach for robot failure classification.
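
The Domain-12 feature construction described above can be illustrated with a minimal sketch. It is not the authors' implementation. The counts in the abstract imply 6 sensors (36 / 6) and 15 samples per sensor (90 / 6). The channel names Fz, Tx, Ty, and Tz, the channel order, and the function names `domain12` and `extract_features` are assumptions made for illustration. The exact statistical definitions are also assumptions, since the abstract does not state them: population standard deviation, moment-based skewness, excess kurtosis, a least-squares slope over the sample index, and quartiles by linear interpolation.

```python
import math
import statistics

# Assumed channel names and order; the abstract only names Fx and Fy.
SENSORS = ("Fx", "Fy", "Fz", "Tx", "Ty", "Tz")
FEATURES = ("mean", "std", "min", "max", "range", "slope",
            "median", "skewness", "kurtosis", "rms", "energy", "iqr")


def domain12(signal):
    """Return the 12 statistics for one sensor's samples, in FEATURES order."""
    n = len(signal)  # needs at least 2 samples
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    # Central moments; m2 is the population variance (assumed ddof = 0).
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    # Least-squares slope of the signal against the sample index 0..n-1.
    t_mean = (n - 1) / 2
    t_var = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * c for t, c in enumerate(centered)) / t_var
    # Moment-based skewness and excess kurtosis; set to 0 for a constant signal.
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurt = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0
    energy = sum(x * x for x in signal)
    # Quartiles with linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(signal, n=4, method="inclusive")
    lo, hi = min(signal), max(signal)
    return [mean, math.sqrt(m2), lo, hi, hi - lo, slope,
            statistics.median(signal), skew, kurt,
            math.sqrt(energy / n), energy, q3 - q1]


def extract_features(window):
    """Map {sensor name: samples} to a flat 72-value vector and its names."""
    values, names = [], []
    for sensor in SENSORS:
        values.extend(domain12(window[sensor]))
        names.extend(f"{sensor}_{feat}" for feat in FEATURES)
    return values, names


# Synthetic window: 6 sensors x 15 samples (made-up values, not real data).
window = {s: [float((i * (k + 1)) % 7) for i in range(15)]
          for k, s in enumerate(SENSORS)}
values, names = extract_features(window)
print(len(values), names[7])  # 72 Fx_skewness
```

Under these assumptions each 90-value raw window is reduced to 72 named features, so an entry such as `Fx_skewness` can be traced directly in a later permutation importance analysis.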