Part I
Understanding Construct Irrelevant Variance
Listen to the following podcast.
Host: Welcome to today’s episode of our podcast. Our topic for today is construct irrelevant variance in achievement testing. I am joined by Dr. Joanna, an experienced language teacher. Dr. Joanna, welcome! It’s good to see you.
Dr. Joanna: Thank you. It’s great to be here.
Host: Let’s start with the basics. What exactly is construct irrelevant variance, and why is it important in assessment and especially in language testing?
Dr. Joanna: Construct irrelevant variance, or CIV for short, refers to any factors that influence test scores but are not related to the actual ability we’re trying to measure. When we create a test, especially in language learning, we’re trying to measure something very specific. This specific thing is called a “construct.” In language testing, constructs could include things like reading comprehension, grammatical accuracy, listening skills, and so on. So paying attention to construct irrelevant variance is crucial in language testing because it can lead to inaccurate assessments of a student’s true language proficiency.
Host: Can you give us some examples of CIV in language testing?
Dr. Joanna: Certainly. Say you’re assessing your students’ ability to listen to and understand a spoken dialogue in English. You design a listening test where students have to listen to a conversation between two people discussing a complex scientific topic. Now, if the students are unfamiliar with the scientific terms used, they might not do well on the test—not because they can’t understand English, but because the specialized vocabulary throws them off. This scientific jargon is a form of construct irrelevant variance. The test isn’t just measuring their listening comprehension; it’s also inadvertently measuring their knowledge of science, which isn’t what you intended. Also, in a listening test, the accent of the speaker might be unfamiliar to some test-takers, putting them at a disadvantage. In a writing test, students who are more familiar with computers might perform better on a typed essay compared to those who are less comfortable with technology.
Another common example would be test anxiety. A student might have excellent language skills, but if they’re extremely anxious during the test, their performance might not reflect their true ability. The testing environment can also play a role: if it’s too noisy or uncomfortable, it might negatively impact students’ performance.
Host: What impact could CIV have on students?
Dr. Joanna: Well, if a test includes too much CIV, it can lead to inaccurate conclusions about a student’s true abilities. If a student does poorly on a test because of irrelevant factors, they might be unfairly judged as less capable than they really are. This can affect their confidence, their motivation, and even their future opportunities.
For example, consider a student who excels at speaking and listening but struggles with writing because the test is full of tricky spelling rules or because the student is dyslexic. If the test focuses too much on spelling, it might unfairly lower the student’s score, even though their oral skills are strong.
CIV is also a concern when it comes to equity in education. Students from different backgrounds might be unfairly disadvantaged. Here’s why. Large-scale international research like the Programme for International Student Assessment (PISA) and the Trends in International Mathematics and Science Study (TIMSS) indicates that pupils who speak a language at home other than the language of schooling perform significantly lower. Based on these results, generalizations are often made that stigmatize minority language speakers and rest on invalid test results. One such generalization is that speaking another language at home actually causes the lower achievement (Baker 2006), a conclusion that has no empirical support. This is especially problematic because the achievement of multilingual students in content-related areas is commonly measured by tests that were designed for monolinguals. Testing research has shown that a content test (e.g. mathematics) administered to a second-language learner in the dominant language is unlikely to portray what the pupil actually knows and can do, because language proficiency will influence the results.
Host: So how can teachers and test developers minimize CIV in their assessments?
Dr. Joanna: There are several strategies we can employ. First, we need to be aware of potential sources of CIV and design our tests to minimize them. This means that we need to design our tests with the construct clearly in mind. When we’re creating questions, we should ask ourselves: “Is this question really testing what I want it to test?” For instance, if you’re testing reading comprehension, try to ensure that the vocabulary used is appropriate for the level and doesn’t overshadow the main skill you’re assessing. Another effective strategy is pilot testing. By running a test with a small group of students before it’s officially used, you can identify any elements that might be introducing CIV. You might discover that a question is confusing or that an unexpected skill is being tested, allowing you to make adjustments before the test goes live.
Host: That makes sense. Are there any specific techniques for identifying CIV in existing tests?
Dr. Joanna: Yes, we often use statistical analyses, such as differential item functioning (DIF) analysis, to detect unexpected patterns in test results. For example, if we notice that students from a particular background consistently perform poorly on certain items, even when their overall language proficiency is high, it might indicate a construct irrelevant factor at play.
Host: It sounds like addressing CIV is an ongoing process. How often should language tests be reviewed and updated?
Dr. Joanna: Ideally, tests should be regularly reviewed and updated. This could be annually or even more frequently, depending on the stakes of the test and the resources available. It’s also important to gather feedback from test-takers and analyze test results to identify potential issues. Don’t underestimate the value of this step: if a significant number of students struggled with a particular question, investigate why. It might be a sign that CIV was at play.
Host: As we wrap up, what advice would you give to language teachers who are developing their own classroom assessments?
Dr. Joanna: My main advice would be to always keep the purpose of the test in mind. What specific language skills are you trying to assess? Then, critically examine each item or task to ensure it’s truly measuring that skill and not being influenced by irrelevant factors. And don’t be afraid to revise and improve your assessments over time.
Host: Thank you, Dr. Joanna, for sharing your insights on construct irrelevant variance. This has been a fascinating discussion that I’m sure will be valuable for our listeners involved in language teaching and assessment.
Dr. Joanna: It’s been my pleasure! Thank you for having me.