In addition, item purification was performed for each method, and the results were compared to determine its effect. These comparisons can provide evidence for determining the best models for detecting DIF items. Results indicated that the 2PL IRT model fitted the data best for both Lord's chi-square method and Raju's signed area method. Although the number of items flagged as DIF differed across methods, 2 of the 22 dichotomous items in the test were flagged consistently by all methods; both were more likely to be answered correctly by males after controlling for overall ability.
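As an illustration of the signed area idea under the 2PL model, the following minimal sketch compares the closed-form signed area between two item characteristic curves (the difference in difficulty parameters, a standard result for logistic curves with no guessing parameter) with a numerical integration. It is not the analysis code used in this study: the function names, the choice of reference and focal groups, and the parameter values are all hypothetical, and the significance test that accompanies the method is omitted.

```python
import math


def icc_2pl(theta, a, b):
    """2PL item characteristic curve: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def signed_area_closed_form(b_ref, b_focal):
    """Signed area between two 2PL curves over the whole ability scale.

    With no guessing parameter, the integral of (P_ref - P_focal)
    reduces to b_focal - b_ref, whatever the discriminations are.
    """
    return b_focal - b_ref


def signed_area_numeric(a_ref, b_ref, a_focal, b_focal,
                        lo=-40.0, hi=40.0, n=80000):
    """Trapezoidal approximation of the integral of (P_ref - P_focal)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        theta = lo + i * h
        diff = icc_2pl(theta, a_ref, b_ref) - icc_2pl(theta, a_focal, b_focal)
        weight = 0.5 if i in (0, n) else 1.0  # half weight at the endpoints
        total += weight * diff
    return total * h


if __name__ == "__main__":
    # Hypothetical item parameters, for illustration only.
    print(signed_area_closed_form(0.0, 0.5))
    print(signed_area_numeric(1.2, 0.0, 0.9, 0.5))
```

In this sketch a positive signed area means the item is harder for the focal group than for the reference group at matched ability, which is the pattern described for the two items that favored males if females are treated as the focal group (an assumption made here only for illustration).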