Testing
Shohreh Bahrami Qalenoee; Jamileh Rahemi
Abstract
Over the past decades, writing assessment research has been concentrating on alternative methods with a social-oriented view of assessment, including dynamic assessment (DA). Given the lack of research juxtaposing the interventionist and interactionist DA frameworks in the area of narrative writing, this study sought to compare the effectiveness of Brown’s graduated prompts model and Poehner’s model in the development of one-paragraph narrative essays in terms of grammatical accuracy. The study followed a quasi-experimental design, with 15 Iranian EFL learners selected via convenience sampling from among the female students of a language institute in Tehran. The participants were then randomly divided into three groups: an interventionist group, in which mediation was based on Brown’s model in the sandwich format; an interactionist group, where mediation was done using Poehner’s model in the cake format; and a non-dynamic assessment (NDA) control group with no mediation involved. The research consisted of three pilot sessions and eleven sessions in the main phase. To analyze the data, both descriptive and non-parametric inferential statistics were run. The results confirmed the superiority of both DA approaches over NDA, whereas no significant difference was observed between the two DA groups in their general performance on narrative tasks. However, the analysis of the number and types of mediational moves required over the DA sessions indicated the superiority of the interactionist model over the interventionist framework in the development of grammatical accuracy in narrative paragraphs. The study offers theoretical and pedagogical implications for educators, curriculum designers, and L2 teachers.
Testing
Mohammad Reza Khodashenas; Hossein Khodabakhshzadeh; Purya Baghaei; Khalil Motallebzadeh
Abstract
Language assessment literacy has been addressed in a wealth of research. However, although many studies have attempted to measure teachers’ assessment literacy, a gap remains, which prompted us to investigate the area from the perspective of EFL teachers' assessment literacy needs. To this end, and in line with the changes in classroom assessment over the past decades, this study set out to develop and validate an inventory of Teachers’ Assessment Literacy Needs (TALNs). In the first stage, a set of items was generated through an extensive review of the relevant studies. In the quantitative phase, the developed inventory was administered to 159 English-as-a-foreign-language teachers selected through convenience sampling. An inventory construction and validation framework consisting of exploratory analyses was used to examine the construct validity of the proposed inventory. The results indicated that the inventory is best explained by four components: knowledge of language assessment literacy, consequences of language assessment literacy, processes of language assessment literacy, and teachers’ expectations of Continuing Professional Development (CPD) programs. Fulcher’s (2012) assessment literacy framework was drawn on as the analytic model guiding the study. The TALNs inventory developed here is intended to help practitioners and researchers investigate teachers’ needs in assessment literacy.
Testing
Ali Mohammadi Darabad; Gholam-Reza Abbasian; Bahram Mowlaie; Ali Asghar Rostami Abusaeedi
Abstract
Classroom performance assessment has gained prominence in parallel with the growing multiplicity of purposes that assessment serves. Among these, the major controversy, which was the motive behind this study, is the incorporation of L1-based elicitation as a valid measure of second language (L2) performance assessment. To shed empirical light on this issue, this explanatory sequential mixed-methods study recruited 87 Iranian intermediate EFL learners, whose L2 classroom performance was assessed through L1-based elicitation techniques. To validate this mechanism, a multi-method mono-trait model (namely, Pearson correlation, structural equation modeling, exploratory and confirmatory factor analysis, composite reliability, and convergent validity), as suggested by Henning and grounded in Messick’s unitary concept of validity, was applied. The results from these multiple sources of evidence lend support to the common consensus that L1-based elicitation techniques are valid measures of L2 performance assessment. The findings thus support the educational implications of L1-based mechanisms in both L2 instruction and assessment.
Testing
Sayyed Mohammad Alavi; Hossein Karami; Mohammad Hossein Kouhpaeenejad
Abstract
Measurement has been ubiquitous in all areas of education for at least a century. Various methods have been suggested to examine the fairness of educational tests, especially in high-stakes contexts. The present study adopted the newly proposed ecological approach to differential item functioning (DIF) to investigate the fairness of the Iranian nationwide university entrance exam. To this end, actual data from an administration of the test were obtained and analyzed through both traditional logistic regression and latent class analysis (LCA) techniques. The initial DIF analysis through logistic regression revealed that 19 items (out of 70) showed either uniform or non-uniform DIF. Further examination of the sample through LCA showed that the sample was not homogeneous: class enumeration revealed that three classes could be identified. DIF analysis for the separate latent classes showed serious differences in the number of DIF items identified in each class, ranging from zero items in latent class 3 to 43 items in latent class 2. The inclusion of covariates in the model also showed that latent class membership could be significantly predicted from high school GPA, field of study, and acceptance quota. It is argued that the fairness of the test might be called into question. The implications of the findings for the validity of the test are discussed in detail.
Testing
Mahmood Dehqan; Seyyedeh Raheleh Asadian Sorkhi
Abstract
Teacher assessment literacy plays a pivotal role in teacher education programs; however, there seems to be a lack of either assessment literacy or its implementation. Using an online assessment course covering both theoretical and practical issues, this mixed-methods study examined 16 teachers’ (8 in-service and 8 pre-service) assessment literacy and the extent to which they implement this knowledge. The quantitative part explored participants’ assessment literacy, while the qualitative phase examined the validation of the quantitative results as well as the implementation of assessment literacy in practice. Data were collected via valid and reliable questionnaires, one adapted from Mertler (2003) and two others developed by the researchers, along with a practical assessment project. The results indicated that although in-service teachers were more assessment literate at entry owing to their experience, they showed a lower degree of assessment literacy than pre-service teachers by the end of the course. The qualitative analysis revealed a lack of preference among teachers for using assessment literacy in their classroom practice. The study suggests including both theoretical and practical dimensions of assessment literacy in teacher education programs and proposes an in-depth investigation into the difficulties that hinder teachers from putting their theoretical assessment knowledge into practice.
Testing
Seyyed Mohammad Alavi; Mahboube Shahsavar; Mohammad Hossein Norouzi
Abstract
Computerized Dynamic Assessment (CDA), inspired by Brown and colleagues’ graduated prompt approach, is grounded in Vygotsky’s Socio-Cultural Theory (SCT) of mind and its concept of the zone of proximal development (ZPD). It emerged in response to the challenge of implementing DA in large classes and to meet the psychometric requirements of assessment. To this end, the present study attempted to design a computerized dynamic assessment tool to diagnose learners’ development of pragmatic competence, specifically their knowledge of the speech acts of apology and request. Sixty BSc students of engineering, aged 18-24, participated in the study. They had different proficiency levels: pre-intermediate, intermediate, and upper-intermediate. In the course of CDA, they were given 30 multiple-choice discourse completion tests of apology and request and were required to choose what they would say in each specific situation. The participants received pre-established mediational hints for each unacceptable response, arranged from the most implicit to the most explicit. Finally, to diagnose learners’ development, their test performance, including their actual score, mediated score, and learning potential score (LPS), was instantly displayed. A paired-samples t-test showed development in learners’ mediated scores. The results of a univariate analysis of variance showed no interaction between mediation and proficiency level. Teachers can use this supplementary dynamic assessment tool to diagnose learners’ development of pragmatic competence.
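The learning potential score mentioned above is commonly computed with Kozulin and Garb’s (2002) formula, LPS = (2 × mediated − actual) / max score; a minimal sketch (the example numbers are hypothetical, not from the study):

```python
def learning_potential_score(actual, mediated, max_score):
    """Kozulin and Garb's (2002) LPS: weights the mediated (post-mediation)
    score against the unaided (actual) score, normalised by the maximum
    possible score. Values near or above 1 suggest strong responsiveness
    to mediation; values well below 1 suggest little gain from hints."""
    return (2 * mediated - actual) / max_score

# Hypothetical learner: 14/30 unaided, 24/30 after graduated prompts
print(round(learning_potential_score(14, 24, 30), 2))  # → 1.13
```

Because the LPS combines the two scores rather than reporting only the mediated result, it lets a CDA tool distinguish learners with identical unaided scores but different degrees of responsiveness to the graduated hints.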
Testing
Hamdollah Ravand; Gholamreza Rohani; Tahereh Firoozi
Abstract
The generalizability aspect of construct validity, as proposed by Messick (1989), requires that a test measure the same trait across different samples from the same population. Differential item functioning (DIF) analysis is a key component in the fairness evaluation of educational tests. The university entrance exam for candidates seeking admission into master's English programs (MEUEE) at Iranian state universities is a very high-stakes test whose fairness is a promising line of research. The current study explored gender and major DIF in the general English (GE) section of the MEUEE using multiple-indicators multiple-causes (MIMIC) structural equation modeling. The data of all test takers (n = 21,642) who took the GE section of the MEUEE in 2012 were analyzed with Mplus. To determine whether an item should be flagged for DIF, both practical and statistical significance were considered. The results indicated that 12 items were flagged for DIF in terms of statistical significance; however, only 5 of them also reached practical significance. The items flagged for DIF alert test developers and users to potential sources of construct-irrelevant variance in the test scores, which may call into question comparisons of test takers’ performance, especially when the tests are used for selection purposes.
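In the MIMIC approach used above, uniform DIF appears as a direct effect of the grouping covariate on an item response, over and above the covariate's effect on the latent trait. In schematic form (the symbols here are generic, not taken from the study):

```latex
% Structural part: latent GE ability regressed on the covariate g (e.g. gender)
\eta = \gamma g + \zeta

% Measurement part: each latent item response y_j^* loads on the trait,
% with a possible direct path \beta_j from the covariate
y_j^* = \lambda_j \eta + \beta_j g + \varepsilon_j

% Item j is flagged for uniform DIF when \beta_j differs significantly
% (and, for practical significance, substantially) from zero
```

A nonzero γ captures a genuine group difference in ability (impact), whereas a nonzero β_j captures item-level bias conditional on ability, which is the distinction that separates DIF from legitimate group differences.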
Testing
Alireza Ahmadi
Abstract
The studies conducted so far on the effectiveness of resolution methods, including the discussion method, in resolving discrepancies in rating have yielded mixed results. What has gone unnoticed in the literature is the potential of discussion to be used as a training tool rather than a resolution method. The present study addresses this research gap by exploring data on the rating behaviors of 5 Iranian raters of English. Qualitative analysis of the data indicated that the discussion method can serve the function of training raters: it helped raters rate more easily, quickly, and confidently; furthermore, it improved their understanding and application of the rating criteria and enabled them to justify their scoring decisions. Many-facet Rasch analysis also supported the beneficial effects of discussion in terms of improvement in raters’ severity, consistency in scoring, and use of the scale categories. The findings provide insight into the potential of discussion as a training tool, especially in EFL contexts where lack of access to expert raters can be an obstacle to holding training programs. The author argues for future studies to focus on how discussion may function depending on the rating scale used.
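The many-facet Rasch analysis referred to above rests on a model that adds a rater-severity facet to the rating scale model; in Linacre's standard formulation (generic symbols, not tied to this study's design):

```latex
% Log-odds of examinee n receiving category k rather than k-1
% from rater j on criterion i
\log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \alpha_j - \tau_k

% \theta_n : examinee ability        \delta_i : criterion difficulty
% \alpha_j : rater severity          \tau_k   : threshold of category k
```

Within this model, training effects can be read off the estimated facets: convergence of the α_j values indicates reduced severity differences, rater fit statistics index scoring consistency, and the spacing of the τ_k thresholds reflects how the scale categories are being used.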