The generalizability aspect of construct validity, as proposed by Messick (1989), requires that a test measure the same trait across different samples from the same population. Differential item functioning (DIF) analysis is a key component of the fairness evaluation of educational tests. The university entrance exam for candidates seeking admission to master's English programs (MEUEE) at Iranian state universities is a very high-stakes test whose fairness merits systematic investigation. The current study explored gender and major DIF in the general English (GE) section of the MEUEE using multiple-indicators multiple-causes (MIMIC) structural equation modelling. The data of all test takers (n = 21,642) who took the GE section of the MEUEE in 2012 were analyzed with Mplus. Both statistical and practical significance were considered in deciding whether an item should be flagged for DIF. The results indicated that 12 items were flagged for DIF in terms of statistical significance; however, only 5 of these also showed practical significance. The items flagged for DIF alert test developers and users to potential sources of construct-irrelevant variance in the test scores, which may call into question comparisons of test takers' performance, especially when the test is used for selection purposes.
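
A generic MIMIC formulation of uniform DIF (a sketch of the standard model, not necessarily the authors' exact Mplus specification) regresses both the latent trait and each item on the grouping covariate:

$$
y_i^{*} = \lambda_i \eta + \beta_i z + \varepsilon_i, \qquad \eta = \gamma z + \zeta,
$$

where $y_i^{*}$ is the latent response underlying item $i$, $\eta$ is the latent GE proficiency, and $z$ is the grouping covariate (e.g., gender or major, dummy-coded). The path $\gamma$ captures group impact on the latent trait itself, while a nonzero direct effect $\beta_i$ flags item $i$ for uniform DIF, i.e., a group difference on the item not explained by the trait.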