One important aspect of practicality that testing researchers have pointed out is that a test ought to have what Oller (1979:52) called instructional value, that is, "it ought to be possible to use the test to enhance the delivery of instruction in student populations." Testing and teaching are interrelated, as we shall see later in this chapter. Teachers need to be able to make clear and useful interpretations of test data in order to understand their students better. A test that is too complex or too sophisticated may not be of practical use to the teacher.
Reliability
A reliable test is a test that is consistent and dependable. Sources of unreliability may lie in the test itself or in the scoring of the test, known respectively as test reliability and rater (or scorer) reliability. If you give the same test to the same subject or matched subjects on two different occasions, the test itself should yield similar results; it should have test reliability. A test of skating ability, for example, should be reasonably consistent from one day to the next. However, if one skating test is conducted on bumpy ice and another on smooth ice, the reliability of the test (of one aspect of the test, at least) is suspect. I once witnessed the administration of a test of aural comprehension in which a tape recorder played items for comprehension, but because of street noise outside the testing room, some students in the room were prevented from hearing the tape accurately. That was a clear case of unreliability. Sometimes a test yields unreliable results because of factors beyond the control of the test writer, such as illness, a "bad day," or no sleep the night before.
Scorer reliability is the consistency of scoring by two or more scorers. If very subjective techniques are employed in the scoring of a test, one would not expect to find high scorer reliability. A test of authenticity of pronunciation in which the scorer is to assign a number between one and five might be unreliable if the scoring directions are not clear. If scoring directions are clear and specific as to the exact details the judge should attend to, then such scoring can become reasonably consistent and dependable. In tests of writing skills, scorer reliability is not easy to achieve, since writing proficiency involves numerous traits that are difficult to define. But as Brown and Bailey (1984) pointed out, the careful specification of an analytical scoring instrument can increase scorer reliability.
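One way to make scorer reliability concrete is to compare the marks two scorers give to the same set of performances. The sketch below is only an illustration: the ratings are invented, the function names are hypothetical, and Cohen's kappa is one common agreement statistic chosen for the example, not a method named in the text above. It computes the proportion of exact agreement and then corrects that figure for the agreement expected by chance.

```python
from collections import Counter


def exact_agreement(scores_a, scores_b):
    """Proportion of performances on which both scorers gave the same mark."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("Both scorers must rate the same non-empty set of performances.")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def cohen_kappa(scores_a, scores_b):
    """Agreement between two scorers, corrected for chance agreement."""
    n = len(scores_a)
    observed = exact_agreement(scores_a, scores_b)
    count_a = Counter(scores_a)
    count_b = Counter(scores_b)
    # Chance agreement: probability that both scorers pick the same category
    # if each simply followed his or her own overall distribution of marks.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both scorers used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented data: two scorers rate eight learners' pronunciation from 1 to 5.
scorer_a = [3, 4, 2, 5, 3, 4, 1, 3]
scorer_b = [3, 4, 3, 5, 3, 4, 2, 3]

print(exact_agreement(scorer_a, scorer_b))
print(round(cohen_kappa(scorer_a, scorer_b), 3))
```

With these invented ratings the scorers agree exactly on six of eight learners; kappa is lower than the raw agreement because some of that agreement would be expected by chance alone. Clearer scoring directions should, in principle, push both figures upward.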
Validity
By far the most complex criterion of a good test is validity, the degree to which the test actually measures what it is intended to measure. A valid test of reading ability is one that actually measures reading ability and not 20/20 vision, previous knowledge in a subject, or some other variable of questionable relevance. To measure writing ability, one might conceivably ask students to write as many words as they can in 15 minutes, then simply count the words for the final score. Such a test would be practical and reliable; the test would be easy to administer, and the scoring quite dependable. But it would hardly constitute a valid test of writing ability unless some consideration were given to the communication and organization of ideas, among other factors. Some have felt that standard language proficiency tests, with their context-reduced, CALP-oriented language and limited stretches of discourse, are not valid measures of language "proficiency" since they do not appear to tap into the communicative competence of the learner. There is good reasoning behind such criticism (Duran 1985); nevertheless, what such proficiency tests lack in validity, they gain in practicality and reliability. We will return to the question of large-scale proficiency testing in a later section of this chapter.
How does one establish the validity of a test? Statistical correlation with other related measures is a standard method. But ultimately, validity can only be established by observation and theoretical justification. There is no final, absolute, and objective measure of validity. We have to ask questions that give us convincing evidence that a test accurately and sufficiently measures the testee for the particular purpose, or objective, or criterion, of the test.
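The statistical correlation mentioned above can be illustrated with a short calculation. The following is a minimal sketch, assuming a Pearson product-moment correlation between scores on a new test and scores on an established related measure; the data and the function name are invented for the example. A high coefficient is one piece of evidence only and does not replace the observation and theoretical justification the text calls for.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two score lists of equal length with at least two scores each.")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    squares_x = sum((a - mean_x) ** 2 for a in x)
    squares_y = sum((b - mean_y) ** 2 for b in y)
    if squares_x == 0 or squares_y == 0:
        raise ValueError("Correlation is undefined when one set of scores does not vary.")
    return cross / math.sqrt(squares_x * squares_y)


# Invented data: six learners' scores on a new reading test and on an
# established reading measure taken by the same learners.
new_test = [52, 61, 70, 45, 88, 76]
established_measure = [48, 65, 72, 50, 90, 70]

print(round(pearson_r(new_test, established_measure), 2))
```

The same calculation applied to two administrations of one test to the same learners would give a rough index of test reliability rather than validity.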