Evaluating the Impact of Distance Metrics on K-means Clustering Performance with Geospatial Data


Ay M. M., Özbakır L.

Proceedings of International Conference on Data, Electronics and Computing ICDEC, Nibaran Das, Esra Kahya Ozyirmidokuz, Swagata Mandal, Lale Özbakır, Debotosh Bhattacharjee (Editors), Springer Nature, Singapore, pp. 1-493, 2025

  • Publication Type: Book Chapter / Research Book
  • Publication Date: 2025
  • Publisher: Springer Nature
  • City of Publication: Singapore
  • Pages: pp. 1-493
  • Editors: Nibaran Das, Esra Kahya Ozyirmidokuz, Swagata Mandal, Lale Özbakır, Debotosh Bhattacharjee
  • Affiliated with Erciyes University: Yes

Abstract

Clustering algorithms are fundamental in data mining, machine learning, and statistical analysis, with K-means being one of the most popular due to its simplicity and efficiency. This study examines how the choice of distance metric (Euclidean or Vincenty) affects the performance of the K-means clustering algorithm when applied to geographical data. While Euclidean distance is widely used for its straightforward computation, it may not be ideal for clustering geographical locations because it ignores the Earth’s curvature. Vincenty’s formula, which accounts for the Earth’s ellipsoidal shape, offers a potentially more accurate alternative. We conducted experiments using a dataset of 69,902 geographical points, evaluating clustering performance based on the Sum of Squared Errors (SSE), the Davies-Bouldin Index (DBI), and the Silhouette Index. Our results indicate that although Vincenty’s formula theoretically provides more precise distance measurements because it models the Earth’s shape, the practical differences in clustering performance between Euclidean and Vincenty distances are minimal. Statistical analysis using the Mann–Whitney U test showed that the observed differences were not statistically significant.
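
To make the two distance metrics concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the WGS-84 ellipsoid (the abstract does not name one), treats points as (latitude, longitude) pairs in degrees, and uses the hypothetical function names `euclidean_deg` and `vincenty_m`. The Euclidean version works directly on raw coordinates, which is one common way to apply it to geographical data; the Vincenty version is the standard iterative inverse formula and returns metres.

```python
import math

# Assumed WGS-84 ellipsoid parameters (not stated in the abstract).
A_AXIS = 6378137.0             # semi-major axis in metres
FLAT = 1 / 298.257223563       # flattening
B_AXIS = (1 - FLAT) * A_AXIS   # semi-minor axis in metres


def euclidean_deg(p, q):
    """Straight-line distance between two (lat, lon) pairs, in degrees."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def vincenty_m(p, q, tol=1e-12, max_iter=200):
    """Ellipsoidal distance in metres via Vincenty's inverse formula."""
    (lat1, lon1), (lat2, lon2) = p, q
    if lat1 == lat2 and lon1 == lon2:
        return 0.0
    # Reduced latitudes and longitude difference.
    u1 = math.atan((1 - FLAT) * math.tan(math.radians(lat1)))
    u2 = math.atan((1 - FLAT) * math.tan(math.radians(lat2)))
    big_l = math.radians(lon2 - lon1)
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)

    lam = big_l
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(
            cos_u2 * sin_lam,
            cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam,
        )
        if sin_sigma == 0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # cos(2 * sigma_m); defined as 0 for equatorial lines.
        cos_2sm = (
            cos_sigma - 2 * sin_u1 * sin_u2 / cos2_alpha
            if cos2_alpha != 0 else 0.0
        )
        c = FLAT / 16 * cos2_alpha * (4 + FLAT * (4 - 3 * cos2_alpha))
        lam_new = big_l + (1 - c) * FLAT * sin_alpha * (
            sigma + c * sin_sigma * (
                cos_2sm + c * cos_sigma * (-1 + 2 * cos_2sm ** 2)
            )
        )
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    else:
        # Vincenty's iteration can fail for nearly antipodal points.
        raise ValueError("Vincenty iteration did not converge")

    usq = cos2_alpha * (A_AXIS ** 2 - B_AXIS ** 2) / B_AXIS ** 2
    big_a = 1 + usq / 16384 * (4096 + usq * (-768 + usq * (320 - 175 * usq)))
    big_b = usq / 1024 * (256 + usq * (-128 + usq * (74 - 47 * usq)))
    delta_sigma = big_b * sin_sigma * (
        cos_2sm + big_b / 4 * (
            cos_sigma * (-1 + 2 * cos_2sm ** 2)
            - big_b / 6 * cos_2sm
            * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)
        )
    )
    return B_AXIS * big_a * (sigma - delta_sigma)
```

In a K-means assignment step, either function could serve as the point-to-centroid distance. Note that the two return different units (degrees versus metres), so SSE values computed with them are not directly comparable without conversion.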