Diagnosis and classification of cancer using hybrid model based on ReliefF and convolutional neural network


Kilicarslan S., Adem K., ÇELİK M.

MEDICAL HYPOTHESES, vol.137, 2020 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 137
  • Publication Date: 2020
  • DOI: 10.1016/j.mehy.2020.109577
  • Journal Name: MEDICAL HYPOTHESES
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, CAB Abstracts, CINAHL, EMBASE, MEDLINE, Veterinary Science Database
  • Keywords: Microarray, Machine learning, Deep learning, CNN, SAE, ReliefF, SUPPORT VECTOR MACHINE, GENE SELECTION, DEEP
  • Erciyes University Affiliated: Yes

Abstract

Machine learning and deep learning methods aim to discover patterns in datasets such as microarray data and medical data. In recent years, producing microarray data from tissue and cell samples and analyzing these data have become increasingly important. Machine learning and deep learning methods have begun to be used in the diagnosis and classification of cancer from microarray data. However, microarray data are challenging to analyze because the number of samples is small, the number of features is high, and some features may not be relevant to the classification. For this reason, studies in the literature have focused on developing feature selection/dimension reduction techniques and classification algorithms to improve classification accuracy on microarray data. This study proposes hybrid methods that use ReliefF and stacked autoencoder (SAE) approaches for dimension reduction and support vector machines (SVM) and convolutional neural networks (CNN) for classification. Three microarray datasets were used: Ovarian, Leukemia, and Central Nervous System (CNS). The Ovarian dataset contains 253 samples, 15,154 genes, and 2 classes; the Leukemia dataset contains 72 samples, 7,129 genes, and 2 classes; and the CNS dataset contains 60 samples, 7,129 genes, and 2 classes. Among the methods applied to the three datasets without dimension reduction, SVM achieved the best classification accuracy: 96.14% for the Ovarian dataset, 94.83% for the Leukemia dataset, and 65% for the CNS dataset. The proposed hybrid ReliefF + CNN method outperformed the other approaches, achieving classification accuracies of 98.6%, 99.86%, and 83.95% on the Ovarian, Leukemia, and CNS datasets, respectively. The results show that dimension reduction improved the classification accuracy of both SVM and CNN.
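
The feature-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the basic Relief weighting (one nearest hit and one nearest miss per sampled instance) rather than full ReliefF with k neighbours, it runs on small synthetic data instead of the Ovarian, Leukemia, or CNS datasets, and the names `relief_scores` and `top_k_features` are hypothetical. The reduced feature set would then be passed to a classifier such as an SVM or CNN, which is omitted here.

```python
import random


def relief_scores(X, y, n_iter=None, seed=0):
    """Basic Relief feature weights for a two-class problem.

    Simplified relative of ReliefF: a single nearest hit and nearest miss
    per sampled instance, Manhattan distance on range-scaled features.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])

    # Range of each feature, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def dist(a, b):
        return sum(abs(a[j] - b[j]) / spans[j] for j in range(d))

    m = n if n_iter is None else n_iter
    weights = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        # Reward features that differ on the nearest miss and
        # penalise features that differ on the nearest hit.
        for j in range(d):
            near_miss = abs(X[i][j] - X[miss][j]) / spans[j]
            near_hit = abs(X[i][j] - X[hit][j]) / spans[j]
            weights[j] += (near_miss - near_hit) / m
    return weights


def top_k_features(weights, k):
    """Indices of the k highest-weighted features, best first."""
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return order[:k]


# Synthetic stand-in for a microarray matrix (assumption: 40 samples,
# 20 "genes", of which only column 3 carries class information).
data_rng = random.Random(42)
labels = [i % 2 for i in range(40)]
data = [[data_rng.random() for _ in range(20)] for _ in labels]
for row, label in zip(data, labels):
    row[3] = 2.0 * label + data_rng.random()

weights = relief_scores(data, labels)
selected = top_k_features(weights, k=5)
print(selected)
```

On this synthetic data the informative column (index 3) should rank first, since its values never overlap between the two classes. On real microarray data with thousands of genes, a vectorised implementation would be needed for reasonable run time.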