Novel Statistical Approaches for Survival Analysis of RNA-Sequencing Data


CEPHE A., Koçhan N., Zararsız G. E., Sezgin A., KARABULUT E., Zararsız G.

Current Bioinformatics, vol.21, no.3, pp.218-234, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 21 Issue: 3
  • Publication Date: 2026
  • DOI Number: 10.2174/0115748936360086250122225346
  • Journal Name: Current Bioinformatics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, Chemical Abstracts Core, Compendex, EMBASE
  • Page Numbers: pp.218-234
  • Keywords: Cancer, IPF-Lasso, priority-Lasso, RNA-seq, stacking, survival, voom
  • Affiliated with Erciyes University: Yes

Abstract

Introduction/Objective: Accurate patient survival predictions are vital for effective cancer treatment. Precision medicine uses gene expression data to improve prognosis by accounting for genetic variability. Predicting survival in cancer patients from high-dimensional gene expression data, such as RNA-sequencing (RNA-seq) data, has attracted much attention in recent years. However, the literature contains few survival modeling algorithms that account for the high dimensionality, heterogeneity, and correlated genes of RNA-seq data. This study aims to develop novel approaches for predicting survival and identifying biomarkers using RNA-seq data. Methods: RNA-seq survival data are first transformed into binary classification data using a stacking algorithm. Then, the block-based priority-Lasso and IPF-Lasso algorithms are applied to the resulting dataset, which includes two distinct types of variables. Additionally, sample weights obtained from the voom transformation are incorporated. Our approaches, named voomStackLasso, are tested on 12 real datasets from the TCGA database. We used Harrell's concordance index and the integrated Brier score to evaluate model performance, and the number of selected features to assess model sparsity. Results: The voomStackLasso algorithms performed comparably to or better than existing survival algorithms. Furthermore, we introduce an R package called MLSeqSurv, which makes both established survival algorithms from the literature and the voomStackLasso algorithms available for RNA-seq data. Conclusion: This study introduces two new algorithms for the survival analysis of RNA-seq data. It also opens new research directions for applying both existing and newly developed classification algorithms to the survival analysis of RNA-seq data.
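
The abstract does not spell out how the stacking step works, so the following is only a minimal sketch of one common formulation of survival stacking, not the authors' implementation and not the MLSeqSurv API. It assumes that, for each distinct event time, every subject still at risk contributes one row made of its covariates plus a one-hot risk-set indicator, with label 1 if the subject had its event at that time and 0 otherwise. The function name `stack_survival` and the toy data are hypothetical, and the voom weighting and Lasso steps are not shown.

```python
import numpy as np


def stack_survival(X, time, event):
    """Expand survival data into a binary classification dataset.

    Assumed formulation (not taken from the source): for each distinct
    event time t, every subject with time >= t contributes one row holding
    its covariates, a one-hot indicator of the risk set, and a label that
    is 1 only if the subject had its event exactly at t.
    Requires at least one observed event.
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    event_times = np.unique(time[event])  # sorted distinct event times
    rows, labels = [], []
    for k, t in enumerate(event_times):
        at_risk = np.flatnonzero(time >= t)  # subjects still under observation
        indicator = np.zeros((at_risk.size, event_times.size))
        indicator[:, k] = 1.0  # marks which risk set the row belongs to
        rows.append(np.hstack([X[at_risk], indicator]))
        labels.append((event[at_risk] & (time[at_risk] == t)).astype(int))
    return np.vstack(rows), np.concatenate(labels), event_times


# Hypothetical toy data: one covariate, four subjects, one censored (event = 0).
X = np.array([[0.5], [1.2], [-0.3], [0.8]])
time = np.array([2.0, 5.0, 3.0, 5.0])
event = np.array([1, 0, 1, 1])

X_stacked, y_stacked, event_times = stack_survival(X, time, event)
print(X_stacked.shape, y_stacked.tolist(), event_times.tolist())
```

Under this formulation, any binary classifier fitted to `X_stacked` and `y_stacked` estimates a discrete-time hazard, which is what allows classification algorithms such as penalized logistic regression to be reused for survival prediction.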