DeepGum: Deep feature transfer for gut microbiome analysis using bottleneck models


Çiftcioğlu U. G. E., Nalbanoglu Ö. U.

BIOMEDICAL SIGNAL PROCESSING AND CONTROL, vol. 91, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 91
  • Publication Date: 2024
  • DOI: 10.1016/j.bspc.2024.105984
  • Journal Name: BIOMEDICAL SIGNAL PROCESSING AND CONTROL
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, EMBASE, INSPEC
  • Keywords: Autoencoders, Deep learning, Microbiome data, Taxonomic classification, Transfer learning, U-nets
  • Erciyes University Affiliation: Yes

Abstract

The intricate relationship between gut microbiome composition and gastrointestinal diseases has been a focal point of numerous scientific investigations. Gut metagenome datasets typically contain on the order of tens, hundreds, or at most thousands of samples per cohort. Because practical and technical limitations constrain sample collection, the resulting small-sample-size, high-dimensionality problem makes hypothesis testing difficult. To overcome these issues, we propose deep feature transfer for gut microbiome analysis (DeepGum), which uses autoencoder and U-net models for nonlinear dimensionality reduction. In this study, we investigated whether modeling the taxonomic landscape of the gut microbiome improves disease classification for certain gastrointestinal diseases. We show that with DeepGum, reconstructive unsupervised learning using bottleneck models, trained on a comprehensive combined dataset built from 9 different datasets, yields better classification performance than existing methods. The proposed model obtained area under the receiver operating characteristic curve (AUC) values between 0.74 and 0.955 on publicly available microbiome datasets. Our downsampling experiments further showed that, via transfer learning, disease cases can be discriminated from healthy controls even in extremely small subcohorts (e.g., 5 to 10 samples), with weak (around 0.6 AUC) but statistically significant classification performance.
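The abstract describes pretraining a bottleneck model on a large pooled dataset without labels, then reusing its learned representation to classify samples in small cohorts. The following is a minimal sketch of that general pattern in plain NumPy; it is not the DeepGum implementation. Assumptions (none of these come from the paper): a single tanh bottleneck layer stands in for the paper's autoencoder and U-net architectures, inputs are CLR-transformed and z-scored relative abundances, the downstream classifier is a logistic-regression head trained on frozen codes, and all data are synthetic, produced by the hypothetical `simulate` helper.

```python
import numpy as np

rng = np.random.default_rng(0)
N_TAXA, N_LATENT, N_CODE = 40, 3, 8
MIX = rng.normal(size=(N_LATENT, N_TAXA))  # hypothetical latent-to-taxa loadings


def simulate(labels, shift=1.5):
    """Synthetic relative abundances; cases (label 1) are shifted on latent axis 0."""
    z = rng.normal(size=(len(labels), N_LATENT))
    z[:, 0] += shift * labels
    logits = z @ MIX + 0.3 * rng.normal(size=(len(labels), N_TAXA))
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)  # each row sums to 1


def clr(p, eps=1e-6):
    """Centered log-ratio transform, a common choice for compositional data."""
    logp = np.log(p + eps)
    return logp - logp.mean(axis=1, keepdims=True)


class BottleneckAE:
    """One-hidden-layer autoencoder: tanh encoder, linear decoder, MSE loss."""

    def __init__(self, n_in, n_code, seed=1):
        r = np.random.default_rng(seed)
        self.W1 = r.normal(0, n_in ** -0.5, (n_in, n_code))
        self.b1 = np.zeros(n_code)
        self.W2 = r.normal(0, n_code ** -0.5, (n_code, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, X):
        return np.tanh(X @ self.W1 + self.b1)  # bottleneck codes

    def loss(self, X):
        R = self.encode(X) @ self.W2 + self.b2
        return float(np.mean((R - X) ** 2))

    def fit(self, X, epochs=800, lr=0.05):
        """Full-batch gradient descent on the reconstruction MSE."""
        for _ in range(epochs):
            H = self.encode(X)
            R = H @ self.W2 + self.b2              # reconstruction
            dR = 2.0 * (R - X) / R.size            # gradient of the mean squared error
            dH = (dR @ self.W2.T) * (1.0 - H ** 2)  # backprop through tanh
            self.W2 -= lr * (H.T @ dR)
            self.b2 -= lr * dR.sum(axis=0)
            self.W1 -= lr * (X.T @ dH)
            self.b1 -= lr * dH.sum(axis=0)
        return self


def fit_logreg(H, y, epochs=1000, lr=0.5, l2=1e-2):
    """L2-regularised logistic regression fitted by full-batch gradient descent."""
    w, b = np.zeros(H.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
        g = p - y
        w -= lr * (H.T @ g / len(y) + l2 * w)
        b -= lr * g.mean()
    return w, b


def auc(scores, labels):
    """Mann-Whitney estimate of ROC AUC (tied pairs count as 1/2)."""
    diff = scores[labels == 1][:, None] - scores[labels == 0][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# 1) Pretrain on a large pooled cohort; its labels are never used (unsupervised).
pooled = clr(simulate(rng.integers(0, 2, 600)))
mu, sd = pooled.mean(axis=0), pooled.std(axis=0) + 1e-8
Xs = (pooled - mu) / sd
ae = BottleneckAE(N_TAXA, N_CODE)
loss_before = ae.loss(Xs)
ae.fit(Xs)
loss_after = ae.loss(Xs)

# 2) Transfer: encode a tiny labelled target cohort with the frozen encoder.
y_tr = np.repeat([0, 1], 5)   # 5 controls + 5 cases
y_te = np.repeat([0, 1], 50)  # held-out evaluation samples
H_tr = ae.encode((clr(simulate(y_tr)) - mu) / sd)
H_te = ae.encode((clr(simulate(y_te)) - mu) / sd)

# 3) Fit a small classifier head on the codes and score held-out AUC.
w, b = fit_logreg(H_tr, y_tr)
test_auc = auc(H_te @ w + b, y_te)
print(f"reconstruction MSE {loss_before:.3f} -> {loss_after:.3f}, test AUC {test_auc:.3f}")
```

Freezing the encoder means only nine parameters (eight weights and a bias) are fitted on the ten labelled samples, which is the motivation for transferring features to tiny subcohorts. With so few samples the held-out AUC varies considerably across random seeds, so judging significance would likely require repeated resampling or a permutation test.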