ML-GAP: machine learning-enhanced genomic analysis pipeline using autoencoders and data augmentation

Agraz, Melih; GÖKSÜLÜK, DİNÇER; Zhang, Peng; Choi, Bum-Rak; Clements, Richard; Choudhary, Gaurav; Karniadakis, George

doi:10.3389/fgene.2024.1442759

Agraz M., GÖKSÜLÜK D., Zhang P., Choi B., Clements R. T., Choudhary G., Karniadakis G.

Frontiers in Genetics, vol. 15, 2024 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 15
Publication Date: 2024
DOI: 10.3389/fgene.2024.1442759
Journal Name: Frontiers in Genetics
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, CAB Abstracts, EMBASE, Veterinary Science Database, Directory of Open Access Journals
Keywords: RNA-seq, differential expression, mixup, machine learning, feature selection
Erciyes University Affiliated: Yes

Abstract

Introduction: The advent of RNA sequencing (RNA-Seq) has significantly advanced our understanding of the transcriptomic landscape, revealing intricate gene expression patterns across biological states and conditions. However, the complexity and volume of RNA-Seq data make it difficult to identify differentially expressed genes (DEGs), which are critical for understanding the molecular basis of diseases such as cancer. Methods: We introduce a novel Machine Learning-Enhanced Genomic Data Analysis Pipeline (ML-GAP) that incorporates autoencoders and data augmentation strategies, notably the MixUp method, to overcome these challenges. By creating synthetic training examples as linear combinations of pairs of inputs and their labels, MixUp significantly enhances the model’s ability to generalize from the training data to unseen examples. Results: Our results demonstrate ML-GAP’s superiority in accuracy, efficiency, and insight, with the MixUp method contributing substantially to the pipeline’s effectiveness, greatly advancing genomic data analysis and setting a new standard in the field. Discussion: These findings suggest that ML-GAP not only has the potential to detect DEGs more accurately but also opens new avenues for therapeutic intervention and research. By integrating explainable artificial intelligence (XAI) techniques, ML-GAP provides a transparent and interpretable analysis, highlighting the significance of the identified genetic markers.
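
The MixUp step described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mixup`, the value `alpha=0.2`, and the toy count matrix are assumptions chosen for illustration. The mixing weight is drawn from a Beta(alpha, alpha) distribution, as in the standard MixUp formulation, and each sample is combined with a randomly chosen partner.

```python
import numpy as np

def mixup(x, y, alpha=0.2, rng=None):
    """Blend each row of x (and its one-hot label in y) with a random partner row.

    Hypothetical helper for illustration; alpha=0.2 is an assumed value.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    # One mixing weight per sample, shaped (n, 1) so it broadcasts over columns.
    lam = rng.beta(alpha, alpha, size=(n, 1))
    # Random pairing: sample i is mixed with sample perm[i].
    perm = rng.permutation(n)
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam, perm

# Toy data standing in for an expression matrix (6 samples x 4 genes).
rng = np.random.default_rng(0)
x = rng.poisson(5.0, size=(6, 4)).astype(float)
labels = np.array([0, 1, 0, 1, 1, 0])
y = np.eye(2)[labels]  # one-hot labels for two conditions

x_mix, y_mix, lam, perm = mixup(x, y, alpha=0.2, rng=rng)
```

Because each synthetic sample is a convex combination of two real ones, its features lie between the two originals and its soft label still sums to one. A small `alpha` keeps most synthetic samples close to one of the two originals.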