Validity and fairness of the PISA 2018 Global Competence assessment: an argument-based evaluation via explanatory item response models

Yavuz, EMİNE

doi:10.1057/s41599-026-06979-6

Humanities and Social Sciences Communications, vol. 13, no. 1, 2026 (AHCI, SSCI, Scopus)

Publication Type: Article / Full Article
Volume: 13, Issue: 1
Publication Date: 2026
DOI: 10.1057/s41599-026-06979-6
Journal Name: Humanities and Social Sciences Communications
Journal Indexed In: Arts and Humanities Citation Index (AHCI), Social Sciences Citation Index (SSCI), Scopus, Index Islamicus, Directory of Open Access Journals
Erciyes University Affiliated: Yes

Abstract

This study examines persistent issues concerning the validity and fairness of the PISA 2018 Global Competence assessment within Kane’s argument-based framework of validity. Focusing on scoring, generalization, extrapolation, and decision inferences, we evaluate whether item responses can be meaningfully interpreted as indicators of global competence. Using explanatory item response models applied to the Canadian sample and a two-stage multiple-imputation and meta-analytic framework, this study investigates (a) the presence, magnitude, and cross-booklet stability of booklet and testlet effects at the item level as evidence for the scoring and generalization inference; (b) the effects of theoretically relevant student-level covariates on item difficulty parameters and their stability across booklets as evidence for the extrapolation inference; and (c) the implications for the interpretation and use of scores derived from responses to global competence items by examining gender-based differential item functioning as evidence for the decision (fairness) inference. Results indicate that booklet fixed effects were small but occasionally significant, whereas testlet effects were negligible across booklet groups. Student-level predictors such as self-efficacy, respect for cultural diversity, and communicative awareness showed consistent positive effects on item difficulty across booklets, while several cognitive variables exhibited weak or unstable effects. Meta-analytic DIF analyses revealed that a limited number of items displayed moderate or high gender-related differential functioning; however, overall item functioning remained stable across booklet groups. Overall, the findings provide support for the interpretive use of global competence scores while identifying specific design- and subgroup-related considerations relevant to fairness and large-scale assessment practice.
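
The two-stage logic named in the abstract (multiple imputation followed by meta-analytic pooling across booklets) can be illustrated with a minimal sketch. The abstract does not state the pooling formulas, so the sketch assumes Rubin's rules within each booklet and a fixed-effect, inverse-variance average across booklets; the study itself may use a different estimator (for example, a random-effects model). The function names and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Stage 1 (assumed): pool one parameter across m imputed datasets
    with Rubin's rules. Returns (pooled estimate, total variance)."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)          # average sampling variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


def fixed_effect_pool(estimates, variances):
    """Stage 2 (assumed): inverse-variance weighted average across booklets.
    Returns (pooled estimate, pooled variance)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, 1 / sum(weights)


# Hypothetical gender-DIF estimates (in logits) for a single item:
# two booklets, each with two imputed datasets.
booklet_a = rubin_pool([0.2, 0.4], [0.01, 0.01])
booklet_b = rubin_pool([0.3, 0.5], [0.01, 0.01])

overall, overall_var = fixed_effect_pool(
    [booklet_a[0], booklet_b[0]],
    [booklet_a[1], booklet_b[1]],
)
print(f"pooled DIF = {overall:.3f}, SE = {overall_var ** 0.5:.3f}")
```

Under these assumptions, an item would be flagged by comparing the pooled estimate and its standard error against a chosen DIF threshold; the thresholds used in the study are not given in the abstract.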