Document Type: Research Article
Department of English Language, College of Languages, University of Human Development, Sulaimani, Kurdistan, Iraq
The validity of large-scale assessments may be compromised, owing in part to inappropriate content or construct underrepresentation. Few validity studies have examined such assessments within an argument-based framework. This study analyzed the domain description and evaluation inferences of the Ph.D. Entrance Exam of ELT (PEEE) sat by Ph.D. examinees (n = 999) in 2014 in Iran. To gather evidence for the domain description inference, the test content was scrutinized by applied linguistics experts (n = 12). For the evaluation inference, the reliability and differential item functioning (DIF) of the test were examined. Results indicated that the test is biased because (1) the test tasks are not fully represented in the Ph.D. course objectives, (2) the test is most reliable only for high-ability test-takers (IRT analysis), and (3) four items were flagged for nonnegligible DIF (logistic regression [LR] analysis). Implications for language testing and assessment are discussed, and some possible suggestions are offered.
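The LR approach to DIF flagging mentioned above can be illustrated with a minimal sketch. This is not the study's actual analysis or data: the simulated responses, the group coding, and the use of a latent-ability proxy as the matching criterion are all assumptions made for the example. The core idea, however, follows the standard LR-DIF procedure: fit a logistic model predicting item correctness from the matching score alone, then from the matching score plus group membership, and flag the item if the likelihood-ratio chi-square for the group term is significant.

```python
import numpy as np

def fit_logistic(X, y, iters=50):
    """Fit a logistic regression by Newton-Raphson; return (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)                   # score vector
        H = (X * W[:, None]).T @ X             # observed information matrix
        beta += np.linalg.solve(H, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-10, 1 - 1e-10)
    ll = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return beta, ll

def lr_dif_chi2(item, matching, group):
    """Likelihood-ratio chi-square (1 df) for uniform DIF on one item:
    compares the matching-only model with the matching + group model."""
    n = len(item)
    X0 = np.column_stack([np.ones(n), matching])
    X1 = np.column_stack([np.ones(n), matching, group])
    _, ll0 = fit_logistic(X0, item)
    _, ll1 = fit_logistic(X1, item)
    return 2.0 * (ll1 - ll0)

# Simulated example (hypothetical data, n matches the study's sample size only
# for illustration): the group term in the logit induces uniform DIF.
rng = np.random.default_rng(0)
n = 999
ability = rng.normal(size=n)
group = rng.integers(0, 2, size=n)             # two examinee groups, coded 0/1
logit = 1.2 * ability - 0.8 * group            # -0.8 disadvantages group 1
item = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

chi2 = lr_dif_chi2(item, ability, group)
flagged = chi2 > 3.84                          # chi-square critical value, 1 df, alpha = .05
print(f"chi-square = {chi2:.2f}, flagged for DIF: {flagged}")
```

In practice the matching criterion is usually the observed total score rather than a latent-ability value, and nonnegligible DIF is judged by an effect-size criterion (e.g. a change in Nagelkerke R²) alongside the significance test; the sketch keeps only the likelihood-ratio step.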