Applied Sciences (Switzerland), vol. 16, no. 3, 2026 (SCI-Expanded, Scopus)
In the fight against global climate change, the transportation sector is critically important because it is one of the major sources of greenhouse gas emissions worldwide. Although urban rail transit systems have a lower carbon footprint than road transportation, accurately forecasting their energy consumption is vital for sustainable urban planning, energy supply management, and the development of carbon balancing strategies. In this study, forecasting models are built with five machine learning (ML) algorithms, and their performance in predicting the energy consumption and carbon footprint of urban rail transit systems is compared. The dataset comprises 10 years of monthly energy consumption data for five distribution-center substations, together with the total carbon footprint of these substations. Support Vector Regression (SVR), Extreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Nonlinear Autoregressive Neural Network (NAR-NN) models are developed to forecast these series. Model hyperparameters are optimized with a 20-iteration Random Search, and the stochastic models are run 10 times with the optimized parameters. The results show that the SVR model consistently achieves the highest forecasting performance across all datasets. For carbon footprint forecasting, SVR yields the best results, with an R² of 0.942 and a MAPE of 3.51%. The ensemble method XGBoost ranks second ((Formula presented.)). Accordingly, while the deterministic traditional ML models perform best, the neural network-based stochastic models (LSTM, ANFIS, and NAR-NN) generalize poorly under limited data conditions.
These findings indicate that, in small- and medium-scale time-series forecasting problems, traditional machine learning methods are more effective than neural network-based methods that require large datasets.
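
The evaluation protocol described above (a 20-iteration random search scored with R² and MAPE) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: the series is synthetic, the chronological split (last 24 months as test, the preceding 24 as validation) is hypothetical, and the forecaster is simple exponential smoothing, used as a lightweight stand-in rather than any of the five models compared in the paper. The helper names `random_search` and `smoothing_forecasts` are likewise invented here.

```python
import math
import random
from statistics import mean


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100.0 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))


def random_search(objective, space, n_iter=20, seed=0):
    """Draw n_iter random configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, math.inf
    for _ in range(n_iter):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def smoothing_forecasts(series, alpha, n_last):
    """One-step-ahead exponential smoothing forecasts for the last n_last points."""
    level = series[0]
    forecasts = []
    for t in range(1, len(series)):
        if t >= len(series) - n_last:
            forecasts.append(level)  # forecast for month t uses data up to t - 1
        level = alpha * series[t] + (1 - alpha) * level
    return forecasts


# Synthetic stand-in for 10 years of monthly consumption (assumption, not real data).
data_rng = random.Random(42)
series = [
    1000 + 2 * t + 100 * math.sin(2 * math.pi * t / 12) + data_rng.gauss(0, 20)
    for t in range(120)
]

n_holdout = 24
tuning_part = series[:-n_holdout]  # the test months are hidden during tuning


def objective(params):
    """Validation MAPE on the last 24 months of the tuning part."""
    forecasts = smoothing_forecasts(tuning_part, params["alpha"], n_holdout)
    return mape(tuning_part[-n_holdout:], forecasts)


best_params, best_loss = random_search(objective, {"alpha": (0.05, 1.0)}, n_iter=20)

# Final evaluation on the untouched last 24 months.
actual = series[-n_holdout:]
best_forecasts = smoothing_forecasts(series, best_params["alpha"], n_holdout)
print(f"alpha={best_params['alpha']:.3f}")
print(f"validation MAPE={best_loss:.2f}%")
print(f"test MAPE={mape(actual, best_forecasts):.2f}%  test R2={r2_score(actual, best_forecasts):.3f}")
```

With a library such as scikit-learn, the same loop would typically be replaced by a randomized hyperparameter search over an SVR or gradient-boosting estimator with a time-ordered split. The pure-Python version is used here only to keep the sketch dependency-free.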