APPLIED SOFT COMPUTING, vol.129, 2022 (SCI-Expanded)
This study investigated how machine learning (ML) methods perform on different problems with different characteristics. Six ML approaches, namely artificial neural networks (ANN), Gaussian process regression (GPR), support vector machine regression (SVMR), long short-term memory (LSTM), multi-gene genetic programming (MGGP), and M5 model tree (M5Tree), were used to analyze three independent civil engineering problems from the construction management, geotechnical engineering, and hydrological engineering sub-disciplines. Mean absolute percentage error (MAPE), root mean square error (RMSE), coefficient of determination (R²), relative root mean square error (RRMSE), Nash-Sutcliffe efficiency (NSE), Kling-Gupta efficiency (KGE), and overall index of model performance (OI) were used as criteria to evaluate the performance of the models. In addition to these criteria, the relative performance of the six ML models was assessed using Taylor diagrams, violin diagrams, and the one-tailed Wilcoxon signed-rank test. For each problem considered in this study, the influence of the input parameters on the output parameter was determined using the Relief method and the correlation coefficient. The results show that the ANN and MGGP models yielded the most successful estimates across the three problems considered. The best prediction for the hydrological engineering problem was achieved by the MGGP model. For the construction management and geotechnical engineering problems, the best results were obtained using the ANN model. All models were reliable for solving the geotechnical engineering and hydrological engineering problems, whereas the LSTM and SVMR models were not reliable for solving the construction management problem. For the managerial data set, the most and least influential input parameters were contract cost (CC) and work definition number (WDN), respectively.
For the experimental data set, the most and least influential input parameters were the width of the pile (B) and the rotation degree (R), respectively; for the natural data set, they were the minimum temperature (Tmin) and the streamflow (Q) data, respectively. The number of data points and the way the data are selected have a significant effect on the homogeneity of the data set and on how well it represents the problem, and the error values obtained in the test stage are affected accordingly. Equations for calculating the output of each problem considered were obtained using the MGGP and M5Tree models. (c) 2022 Elsevier B.V. All rights reserved.
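
As an illustration of the performance criteria named above, the following minimal sketch computes MAPE, RMSE, NSE, and KGE for a pair of observed and predicted series. It is not the authors' implementation: the function names and the sample values are hypothetical, the formulas are the commonly used textbook definitions (KGE in its original correlation, variability-ratio, and bias-ratio form), and RRMSE and OI are omitted because the abstract does not state which variants were used.

```python
import math
from statistics import mean, pstdev


def mape(obs, sim):
    """Mean absolute percentage error in percent (observations must be non-zero)."""
    return 100.0 * mean(abs((o - s) / o) for o, s in zip(obs, sim))


def rmse(obs, sim):
    """Root mean square error, in the units of the observations."""
    return math.sqrt(mean((o - s) ** 2 for o, s in zip(obs, sim)))


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observations and predictions."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    return cov / (so * ss)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mo = mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mo) ** 2 for o in obs)
    return 1.0 - residual / variance


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    r = pearson_r(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)  # variability ratio
    beta = mean(sim) / mean(obs)       # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


if __name__ == "__main__":
    # Hypothetical observed and predicted values, for illustration only.
    observed = [12.0, 15.5, 9.8, 20.1, 17.3]
    predicted = [11.4, 16.0, 10.5, 19.2, 18.0]
    for name, metric in [("MAPE", mape), ("RMSE", rmse), ("NSE", nse), ("KGE", kge)]:
        print(f"{name}: {metric(observed, predicted):.4f}")
```

NSE and KGE both equal 1 for a perfect prediction, while MAPE and RMSE equal 0, so the two groups of criteria are read in opposite directions when ranking models.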