Fine-tuning Large Language Models for Turkish Flutter Code Generation

Uluırmak, BUĞRA; KURBAN, RİFAT

doi:10.35377/saucis...1722643

Sakarya University Journal of Computer and Information Sciences, vol. 8, no. 4, pp. 637-650, 2025 (Scopus, TRDizin)

Publication Type: Article / Full Article
Volume: 8, Issue: 4
Publication Date: 2025
DOI: 10.35377/saucis...1722643
Journal Name: Sakarya University Journal of Computer and Information Sciences
Indexed In: Scopus, Central & Eastern European Academic Source (CEEAS), Directory of Open Access Journals, TR DİZİN (ULAKBİM)
Pages: 637-650
Keywords: Code generation, Fine-tuning, Flutter, Large language models, Low-resource languages
Open Archive Collection: AVESİS Open Access Collection
Erciyes University Affiliation: No

Abstract

The rapid advancement of large language models (LLMs) for code generation has largely centered on English-language programming queries. This paper addresses a low-resource language scenario, Turkish, in the context of Flutter mobile app development. In this study, two representative LLMs (a 4B-parameter multilingual model and a 3B-parameter code-specialized model) are fine-tuned on a new Turkish question-and-answer dataset for Flutter/Dart. Fine-tuning with parameter-efficient techniques yields substantial improvements in code generation quality, with significant increases in Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Bidirectional Encoder Representations from Transformers Score (BERTScore), and CodeBLEU scores. The rate of correct solutions increased from roughly 30–70% for the base models to 80–90% after fine-tuning. An analysis of the performance trade-offs between the models shows that, after fine-tuning, the multilingual model slightly outperforms the code-focused model in accuracy, whereas the code-focused model offers faster inference. These results demonstrate that even with very limited non-English training data, customizing LLMs can bridge the gap in code generation, enabling high-quality assistance for Turkish developers comparable to that available in English. The dataset has been released on GitHub to facilitate further research in multilingual code generation.
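To make one of the reported metrics concrete, the following is a minimal sketch of ROUGE-L, the longest-common-subsequence (LCS) F-measure, applied to a generated code snippet and a reference answer. It is not the authors' evaluation pipeline: the regex tokenizer (identifiers/words versus single punctuation characters), the function names `lcs_length` and `rouge_l`, and the Dart snippets in the example are all illustrative assumptions.

```python
import re


def tokenize(code: str) -> list[str]:
    # Assumed tokenizer: word/identifier runs (Unicode-aware, so Turkish
    # letters stay inside words) or single non-space punctuation characters.
    return re.findall(r"\w+|[^\w\s]", code)


def lcs_length(a: list[str], b: list[str]) -> int:
    # Classic dynamic-programming LCS, keeping only one previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    # ROUGE-L F-measure: F = (1 + beta^2) * P * R / (R + beta^2 * P),
    # where P = LCS / |candidate| and R = LCS / |reference|.
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    # Hypothetical reference answer and model output (Dart/Flutter snippets).
    reference = "ElevatedButton(onPressed: () {}, child: Text('Kaydet'))"
    candidate = "ElevatedButton(onPressed: () {}, child: Text('Gönder'))"
    print(f"ROUGE-L F1: {rouge_l(candidate, reference):.4f}")
```

With this tokenizer both snippets have 18 tokens and share an LCS of 17 (only the button label differs), so the score is 17/18, about 0.944. Published ROUGE implementations differ in tokenization and preprocessing, so scores from this sketch should not be compared directly with the figures reported in the paper.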