Test Review
Test of English for International Communication (TOEIC):
Listening & Reading
Background Information
Test purpose
The TOEIC (Test of English for International Communication) test is an English-language proficiency test for people whose native language is not English. It measures the everyday English skills of people working in an international environment. The scores indicate how well people can communicate in English with others in business, commerce, and industry. The test does not require specialized knowledge or vocabulary beyond that of a person who uses English in everyday work activities.
Length and administration
The TOEIC is a norm-referenced test that, in its current format, comprises a Listening Section and a Reading Section. The test lasts two hours. Although the Listening Section is considerably shorter than the Reading Section (45 minutes compared to 75 minutes), both sections contain 100 questions and have the same range of possible scores (5 to 495 points), so the total score falls between 10 and 990 points. All 200 questions on the test are multiple-choice, and the test is entirely in English (ETS, 2006a).
Scores
The Listening and Reading sections are scored separately. Each question is worth one raw point. Although each section contains 100 questions, the maximum scaled score for each is 495, and a raw score of 0 out of 100 still yields a scaled score of 5. ETS does not publish the procedure it uses to convert raw scores to scaled scores. The total score is the sum of the Listening and Reading scaled scores and ranges from 10 to 990.
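The arithmetic of the total score can be illustrated with a minimal sketch. It only checks the published score ranges and adds the two scaled scores; it does not attempt the raw-to-scaled conversion, which ETS does not disclose. The function and constant names are hypothetical and chosen for illustration only.

```python
# Published range of each section's scaled score (5 to 495).
SECTION_MIN = 5
SECTION_MAX = 495


def total_score(listening_scaled, reading_scaled):
    """Return the TOEIC total as the sum of two scaled section scores.

    Both inputs must already be scaled scores; the conversion from raw
    scores is not public and is deliberately not modelled here.
    """
    for name, score in (("listening", listening_scaled),
                        ("reading", reading_scaled)):
        if not SECTION_MIN <= score <= SECTION_MAX:
            raise ValueError(
                f"{name} scaled score {score} is outside "
                f"{SECTION_MIN}-{SECTION_MAX}"
            )
    return listening_scaled + reading_scaled


if __name__ == "__main__":
    print(total_score(5, 5))      # lowest possible total
    print(total_score(495, 495))  # highest possible total
```

The lowest and highest section scores reproduce the 10 to 990 range stated above.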
The educational and/or social context
Large business organizations worldwide have relied on the TOEIC test to evaluate English proficiency based on a test taker’s listening and reading skills. Corporations and government agencies rely on the TOEIC test as an objective way to measure English proficiency for recruiting and promoting employees. Colleges, universities, and English-language programs use the TOEIC test to assess their incoming students’ English-language skill levels and to gauge their progress.
Test Design and Procedures
The content and format of each part of the test
The TOEIC consists of four sections: listening, reading, speaking, and writing. (This review covers the Listening and Reading sections.)
Part I: Listening Section
Examinees listen to a variety of questions and short conversations recorded in English, then answer questions based on what they heard. The test format includes:
Photographs 20 Questions
Question-Response 30 Questions
Short Conversations 30 Questions
Short Talks 20 Questions
Part II: Reading Section
Test-takers read a variety of materials and respond at their own pace to questions based on the content. The test format includes:
Incomplete Sentences 40 Questions
Error Recognition 20 Questions
Reading Comprehension 40 Questions
The Quality of the Test
Reliability
Reliability is defined as the proportion of observed score variance that is due to true score variance. It is an indicator of the extent to which test scores will be consistent across different conditions of administration and/or administration of alternate forms of a test. The reliability of the TOEIC Listening and Reading test is reported as an internal consistency measure using the KR-20 reliability index. The KR-20 reliability index assesses the extent to which all items measure the same construct. The more homogeneous the test items, the more consistently the test takers will perform. The reliability of the TOEIC Listening and Reading section scores across all forms from ETS norming samples has been approximately 0.90 or higher.
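For readers unfamiliar with the index, the standard Kuder-Richardson formula 20 is KR-20 = (k / (k - 1)) * (1 - sum(p_j * q_j) / variance of total scores), where k is the number of items, p_j is the proportion of test takers answering item j correctly, and q_j = 1 - p_j. The sketch below applies that textbook formula to a small invented response matrix. It is not ETS's actual procedure or data, and it assumes the population variance of total scores (some texts use the sample variance instead).

```python
def kr20(responses):
    """Compute the KR-20 reliability index for dichotomous items.

    responses: list of rows, one per test taker, each a list of 0/1
    item scores (1 = correct). All rows must have the same length.
    """
    n_people = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")

    # Proportion of test takers answering each item correctly.
    p = [sum(row[j] for row in responses) / n_people for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)

    # Population variance of the total (number-correct) scores.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum_pq / var_total)


if __name__ == "__main__":
    # Hypothetical data: five test takers, four items.
    data = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(round(kr20(data), 2))
```

With this invented matrix the item variances sum to 0.8 and the total-score variance is 2, so the index is (4/3) * (1 - 0.4) = 0.8. Values closer to 1 indicate that the items behave more homogeneously.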
Validity
A common criticism of standard language proficiency tests is that they fail to assess the communicative competence of the test taker, and are thus invalid as measures of proficiency in the true sense of the word (H.D. Brown, 2001: 387). Several scholars (Douglas, 2000; Cunningham, 2002; O’Sullivan, 2006) have questioned the validity of the TOEIC test on these grounds. It fails to assess two of the four related aspects in Canale and Swain’s model of communicative competence: sociolinguistic and strategic competence (H.D. Brown, 2007: 219-220).
Face validity
In the modern, consumer-driven world, image is everything. Judging by its pervasiveness within Japanese society and its presence in the global language testing sphere, the TOEIC test appears to enjoy a remarkably high level of face validity. The product has been extremely well marketed. Ihara and Tsuroka (in Rebuck, 2003: 24) argue that marketing initially only to companies, and firmly establishing the test as “the Company English Test”, was a masterstroke that later allowed it to be promoted to individuals and universities.
In 1991, Ministry of Education reforms gave universities greater freedom to decide their own curriculum and graduation conditions. Universities were also allowed to accept Ministry-recognised qualifications as university credits. Initially, only national examinations (kokka shiken) were accredited, but in 1999 the Ministry declared that “TOEIC and other tests which had received wide recognition by society” deserved to be accredited too (Rebuck, 2003: 30).
Whether the TOEIC’s exalted status in Japan led to its accreditation, or whether that status was a consequence of Ministry support in the first place, is a matter for debate.
The TOEIC’s face validity is remarkably high among companies, universities and individuals. The group least satisfied with the TOEIC test appears to be English teachers.
Construct validity
There are clearly some problems in establishing construct validity for this test. The TOEIC test claims to assess overall English communication skills, yet it does so by testing only listening and reading skills. This would imply that the TOEIC is constructed upon the theory that an individual’s productive language abilities are proportional to his/her receptive abilities (Miyata, 2004: 61). Hughes (2003: 31) suggests that it is unnecessary to define construct validity for direct tests of some commonsense constructs, such as ‘reading ability’ or ‘writing ability’, but:
Once we try to measure such an ability indirectly, however, we can no longer take for granted what we are doing. We need to look to a theory of writing ability for guidance as to the form an indirect test should take, its content and techniques.
(Hughes, 2003: 31)
Has ETS designed its test based on a flawed construct, or in the words of Moritoshi (2001: 9) has it merely “tended to skirt around the issue”? There is the distinct possibility that in spite of its advertising claims, the test designer has defined the construct of ability in a far narrower sense (i.e. the capacity to study for and pass a specific test), manipulating the construct for “scholastic, economic, and social stratification purposes” (Ross, 2008: 8). Interestingly, the test administrator seems to believe that an element of concurrent validity is sufficient to claim construct validation (Chauncey Group International Ltd., 1998: III-1).
Content validity
Given the uncertain nature of the construct validity of the TOEIC test, it is unsurprising to learn that its content validity has also been called into question. Oller asserts that content validity ensures that the examinee has to “perform tasks which are genuinely the same or fundamentally similar to tasks one normally performs in exhibiting the skill or ability the test purports to measure” (Oller, 1979: 51). If we accept this, and ETS’s claim that the TOEIC measures English communication skills in a business context, there appears to be little content validity. Without content validity the test is unlikely to be accurate, and likely to have a harmful backwash effect (Hughes, 2003: 27). Douglas (2000: 236) also states that it is “unlikely that the reading tasks engage the test takers in any genuinely communicative behavior or in genuinely specific purpose use”, which raises the question of whether the TOEIC should even be considered a genuine test of Language for Specific Purposes.
Washback
Sometimes referred to as ‘washback’, backwash is “the effect of testing on teaching and learning”, whether good or bad (Hughes, 2003: 1). As a consequence of its popularity in Japan, the TOEIC test must have a backwash effect on numerous classrooms, students and teachers. Unquestionably, some of that backwash is positive: students being motivated to study more in order to attain a higher score should be considered a benefit. However, negative backwash is also clearly in evidence.
In their desperation to help students attain higher TOEIC scores prior to graduation and entering the employment market, more and more universities throughout Japan are replacing English proficiency classes with TOEIC preparation classes. Communicative competence is patently not the primary objective of these classes, and instruction time is focused on learning discrete grammar items and mastering test-taking strategies (Miller, 2003).
In a commercial environment, teachers are likely to have to deal with conflicting pressures, between giving the students what the teacher perceives they need, and giving the clie