A linear multivariate decision tree with branch-and-bound components


Engür E., SOYLU B.

Neurocomputing, cilt.576, 2024 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 576
  • Basım Tarihi: 2024
  • Doi Numarası: 10.1016/j.neucom.2024.127354
  • Dergi Adı: Neurocomputing
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Biotechnology Research Abstracts, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
  • Anahtar Kelimeler: Branch-and-bound, Classification, Fathoming rule, Machine learning, Node selection, Optimization
  • Erciyes Üniversitesi Adresli: Evet

Özet

This study presents a new linear multivariate decision tree (MDT) algorithm that incorporates linear programming and components of the branch-and-bound methodology, such as bound-based pruning and node selection strategies. MDTs are known to better separate classes than univariate counterparts. However, the methodology used to construct splitting hyperplanes significantly affects the tree's performance. To determine the splitting hyperplane at each node, we efficiently solve a deviation model, a linear program (LP) that minimizes the external deviation of misclassified alternatives from the splitting hyperplane. The proposed MDT, called the MDT-DevM, uses upper and lower bounds for pruning. The deviation at a node provides an upper bound for its child. The lower bound is derived from the leaf nodes. The proposed pruning rule much contributes to the induction of a shallower and more accurate decision tree. Several node selection strategies (depth_first, breadth_first and best_first) are tried in the proposed MDT. We have also presented a variant of MDT, called the MDT-SVM, which utilizes linear SVM to find the splitting hyperplane. We performed comprehensive experiments on artificial and real datasets with different levels of complexity. We compared the proposed methods with benchmark classifiers. The proposed MDTs outperform the linear classifiers. The MDT-DevM shows better performance as the data becomes more dispersed and imbalanced.