Machine Learning Models With Hyperparameter Optimization for Voice Pathology Classification on Saarbrücken Voice Database


Gulsen P., Gulsen A., ALÇI M.

Journal of Voice, 2025 (SCI-Expanded)

  • Publication Type: Article
  • Publication Date: 2025
  • DOI Number: 10.1016/j.jvoice.2024.12.009
  • Journal Name: Journal of Voice
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Periodicals Index Online, CINAHL, Communication Abstracts, Linguistics & Language Behavior Abstracts, MEDLINE, Music Index, Music Periodicals Database, RILM Abstracts of Music Literature
  • Keywords: Pathological voice classification, Machine learning, Mel frequency cepstral coefficients, Saarbrücken voice database
  • Erciyes University Affiliated: Yes

Abstract

Early diagnosis and referral are crucial in the treatment of voice disorders. Contemporary investigations have indicated the efficacy of voice pathology detection systems in significantly contributing to the evaluation of voice disorders, facilitating early diagnosis of such pathologies. These systems leverage machine learning methodologies, widely applied across diverse domains, and exhibit particular potential in the realm of voice pathology classification. However, machine learning models and performance metrics employed in these studies vary significantly, making it challenging to determine the optimal model for voice pathology classification. In this study, healthy and pathological voices were classified with state-of-the-art machine learning models, and the performance results of the models were compared. The voice samples employed in our research were sourced from the Saarbrücken Voice Database, a reputable German database. Feature extraction from voice signals was conducted using the Mel Frequency Cepstral Coefficients method. To assess and enhance the models’ performance adequately, we employed hyperparameter optimization and implemented a 10-fold cross-validation approach. The outcomes revealed that the support vector machine model exhibited the highest accuracy, achieving 99.19% and 99.50% accuracies in the classification of male and female voice pathologies, respectively.
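
The evaluation protocol described in the abstract (hyperparameter optimization combined with 10-fold cross-validation of a support vector machine) can be illustrated with a minimal sketch. This is not the authors' code. It assumes scikit-learn, a grid search as the optimization strategy, and an RBF-kernel SVM, none of which the abstract specifies. The feature matrix is synthetic and only stands in for per-recording MFCC vectors (13 coefficients is an assumed size), so the accuracy it prints says nothing about the Saarbrücken Voice Database.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic data standing in for MFCC feature vectors.
# Each row plays the role of one recording, each column one coefficient;
# the labels play the role of healthy (0) vs. pathological (1).
X, y = make_classification(
    n_samples=200,
    n_features=13,
    n_informative=8,
    n_redundant=2,
    random_state=0,
)

# Scaling lives inside the pipeline so it is re-fitted on the training
# part of every fold and never sees the held-out part.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC()),
])

# ASSUMPTION: an illustrative search grid, not the one used in the paper.
param_grid = {
    "svm__kernel": ["rbf"],
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 0.01, 0.1],
}

# Stratified 10-fold cross-validation keeps the class ratio in each fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_
best_accuracy = search.best_score_  # mean accuracy over the 10 folds

print("Best hyperparameters:", best_params)
print(f"Mean 10-fold accuracy: {best_accuracy:.4f}")
```

Because the same folds select the hyperparameters and report the score, the figure printed here tends to be optimistic. A nested cross-validation loop or a separate held-out test set would give a less biased estimate.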