A parallel hybrid approach integrating clonal selection with artificial bee colony for logistic regression in spam email detection

DEDETÜRK, BİLGE; AKAY, BAHRİYE

doi:10.1007/s00521-024-10505-7

DEDETÜRK B. K., AKAY B.

Neural Computing and Applications, 2024 (SCI-Expanded)

Publication Type: Article / Full Article
Publication Date: 2024
DOI: 10.1007/s00521-024-10505-7
Journal: Neural Computing and Applications
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Biotechnology Research Abstracts, Compendex, Computer & Applied Sciences, Index Islamicus, INSPEC, zbMATH
Keywords: Artificial bee colony, Clonal selection algorithm, Email spam detection, Logistic regression, Spam filtering
Erciyes University Affiliation: Yes

Abstract

Spam emails are sent to recipients for advertising and phishing purposes. In either case, they disturb recipients and reduce communication quality. Addressing this issue requires classifying emails on servers as either spam or ham (legitimate email). Numerous methods have been proposed for this classification task. Among them, logistic regression (LR) stands out for its simplicity, speed, and ease of implementation. However, LR suffers from low detection rates caused by the gradient descent algorithm used in its training phase. To overcome this limitation, we propose a novel method based on the clonal selection algorithm (CSA), which is known for its success in optimization problems owing to its local and global search capabilities. Despite its effective optimization performance, CSA suffers from poor robustness and long training times. Therefore, CSA is hybridized with the artificial bee colony (ABC) algorithm to improve its robustness, and the hybrid is parallelized to reduce training time significantly. This hybrid method optimizes the weights of LR by minimizing the cost at the output of LR. The empirical results indicate that the proposed method, named CSA–ABC–LR, achieves better classification performance than the state-of-the-art models reported in previous studies, with accuracy rates of 99.13% on the Enron-1 dataset, 99.22% on the CSDMC2010 dataset, and 94.49% on the Spambase dataset.
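The core idea in the abstract is to replace gradient descent with a population-based search over the LR weight vector that minimizes the LR cost. The following is a minimal, serial Python sketch of that idea, built on several assumptions of our own. The cost is the mean binary cross-entropy. The CSA part follows a generic CLONALG-style cycle of rank, clone, hypermutate, and receptor editing. The ABC part is reduced to a single employed-bee neighbor search per antibody, and the two phases simply alternate each generation. The function names (`csa_abc_lr`, `log_loss`, `predict`), the parameter values, and the alternation schedule are all hypothetical and are not taken from the paper. The paper's parallelization is not shown.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(w, X, y):
    # Mean binary cross-entropy of an LR model; w[0] is the bias term.
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
        p = min(max(p, eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)


def predict(w, x):
    # Hard spam (1) / ham (0) decision at probability threshold 0.5.
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if sigmoid(z) >= 0.5 else 0


def csa_abc_lr(X, y, pop_size=20, n_select=5, clone_factor=3,
               n_replace=2, generations=40, bound=5.0, seed=0):
    # Hypothetical hybrid (assumption): alternate one CLONALG-style generation
    # with one ABC employed-bee pass. Assumes n_select + n_replace <= pop_size.
    rng = random.Random(seed)
    dim = len(X[0]) + 1  # bias + one weight per feature

    def cost(w):
        return log_loss(w, X, y)

    def clip(v):
        return min(max(v, -bound), bound)

    def random_antibody():
        return [rng.uniform(-bound, bound) for _ in range(dim)]

    pop = [random_antibody() for _ in range(pop_size)]
    for _ in range(generations):
        # CSA: rank by cost, clone the best antibodies (better rank -> more
        # clones and smaller Gaussian hypermutation), and keep the best clone
        # only if it improves on its parent.
        pop.sort(key=cost)
        for rank in range(n_select):
            parent = pop[rank]
            n_clones = max(1, round(clone_factor * pop_size / (rank + 1)))
            sigma = 0.1 * bound * (rank + 1) / n_select
            best, best_cost = parent, cost(parent)
            for _ in range(n_clones):
                clone = [clip(v + rng.gauss(0.0, sigma)) for v in parent]
                c = cost(clone)
                if c < best_cost:
                    best, best_cost = clone, c
            pop[rank] = best
        # Receptor editing: replace the worst-ranked antibodies with random ones.
        for i in range(pop_size - n_replace, pop_size):
            pop[i] = random_antibody()
        # ABC employed-bee step: v_d = x_d + phi * (x_d - x_kd) with
        # phi ~ U(-1, 1) and a random partner k, followed by greedy selection.
        for i in range(pop_size):
            k = rng.choice([j for j in range(pop_size) if j != i])
            d = rng.randrange(dim)
            cand = pop[i][:]
            cand[d] = clip(cand[d] + rng.uniform(-1.0, 1.0) * (pop[i][d] - pop[k][d]))
            if cost(cand) < cost(pop[i]):
                pop[i] = cand
    return min(pop, key=cost)
```

Each cost evaluation inside a generation is independent of the others, so in a sketch like this the clone and neighbor evaluations are the natural candidates for parallel execution, for example by mapping them over a process pool.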