ASSESSING SPEAKING

From a pragmatic view of language
performance, listening and speaking are almost always closely interrelated.
While it is possible to isolate some listening performance types (see Chapter
6), it is very difficult to isolate oral production tasks that do not directly involve
the interaction of aural comprehension.
Only in limited contexts of speaking (monologues, speeches, or telling stories
and reading aloud) can we assess oral language without the aural participation
of an interlocutor.
While speaking is a productive skill
that can be directly and empirically observed, those observations are
invariably colored by the accuracy and effectiveness of a test taker's
listening skill, which necessarily compromises the reliability and validity of
an oral production test. How do you know for certain that a speaking score is exclusively
a measure of oral production without the potentially frequent clarifications of
an interlocutor? This interaction of speaking and listening challenges the
designer of an oral production test to tease apart, as much as possible, the
factors accounted for by aural intake.
Another challenge is the design of
elicitation techniques. Because most speaking is the product of creative
construction of linguistic strings, the speaker makes choices of lexicon,
structure, and discourse. If your goal is to have test-takers demonstrate certain
spoken grammatical categories, for example, the stimulus you design must elicit
those grammatical categories in ways that prohibit the test-taker from avoiding
or paraphrasing and thereby dodging production of the target form.
As tasks become more and more open-ended, the freedom of choice given to
test-takers creates a challenge in
scoring procedures. In receptive performance, the elicitation stimulus can be
structured to anticipate predetermined responses and only those responses. In
productive performance, the oral and written stimulus must be specific enough
to elicit output within an expected range of performance such that scoring or
rating procedures apply appropriately. For example, in a picture-series task,
the objective of which is to elicit a story in a sequence of events,
test-takers could opt for a variety of plausible ways to tell the story, all of
which might be equally accurate. How can such disparate responses be evaluated?
One solution is to assign not one but several scores for each response, each
score representing one of several traits (pronunciation, fluency, vocabulary
use, grammar, comprehensibility, etc.).
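The multiple-trait solution just described can be sketched as a simple scoring routine. The trait names follow the list above, but the 0-5 rating range and the equal weighting are illustrative assumptions, not part of any published rubric:

```python
# Sketch of analytic (multi-trait) scoring for one spoken response.
# Trait names follow the text; the 0-5 range and equal weights are assumptions.

TRAITS = ["pronunciation", "fluency", "vocabulary_use", "grammar", "comprehensibility"]

def score_response(ratings):
    """Combine per-trait ratings (each 0-5) into a profile and a composite.

    Returns the full trait profile plus an unweighted mean, so that
    disparate but equally plausible responses can still be compared.
    """
    missing = [t for t in TRAITS if t not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    composite = sum(ratings[t] for t in TRAITS) / len(TRAITS)
    return {"profile": {t: ratings[t] for t in TRAITS}, "composite": composite}

result = score_response({
    "pronunciation": 4, "fluency": 3, "vocabulary_use": 4,
    "grammar": 3, "comprehensibility": 5,
})
print(result["composite"])  # 3.8
```

Reporting the profile alongside the composite preserves the point of multi-trait scoring: two test-takers with the same composite may differ sharply on individual traits.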
All of these issues will be
addressed in this chapter as we review types of spoken language and micro- and
macroskills of speaking, then outline numerous tasks for assessing speaking.
(a) Basic Types Of Speaking
We cited four categories of listening performance assessment tasks. A
similar taxonomy emerges for oral production.
1. Imitative. At
one end of a continuum of types of speaking performance is the ability to simply
parrot back (imitate) a word or phrase or possibly a sentence. While this is a
purely phonetic level of oral production, a number of prosodic, lexical, and
grammatical properties of language may be included in the criterion performance.
We are interested only in what is traditionally labeled "pronunciation";
no inferences are made about the test-taker's ability to understand or convey
meaning or to participate in an interactive conversation. The only role of listening
here is in the short stretch of language that must be imitated.
2. Intensive. A second type of speaking frequently
employed in assessment contexts is the production of short stretches of oral language
designed to demonstrate competence in a narrow band of grammatical, phrasal,
lexical, or phonological relationships (such as prosodic elements-intonation,
stress, rhythm, juncture). The speaker must be aware of semantic properties in
order to be able to respond, but interaction with an interlocutor or test
administrator is minimal at best. Examples of intensive assessment tasks include
directed response tasks, picture-cued tasks including simple sequences, and
translation up to the simple sentence level.
3. Responsive. Responsive assessment tasks include interaction
and test comprehension but at the somewhat limited level of very short
conversations, standard greetings and small talk, simple requests and comments,
and the like. The stimulus is almost always a spoken prompt (in order to preserve
authenticity), with perhaps only one or two follow-up questions or retorts:
A. Mary: Excuse me, do you have the
time?
Doug: Yeah. Nine-fifteen.
B. Tim: What is the most urgent
environmental problem today?
Sheila: I would say massive
deforestation.
C. Jeff: Hey, Stef, how's it going?
Stef: Not bad, and yourself?
Jeff: I'm good.
Stef: Cool. Okay, gotta go.
4. Interactive. The difference between responsive and interactive
speaking is the length and complexity of the interaction, which sometimes
includes multiple exchanges and/or multiple participants. Interaction can take
the two forms of transactional language, which has the purpose of exchanging
specific information, or interpersonal exchanges, which have the purpose of
maintaining social relationships. (In the three dialogues cited above, A and B were
transactional, and C was interpersonal.) In interpersonal exchanges, oral
production can become pragmatically complex with the need to speak in a casual
register and use colloquial language, ellipsis, slang, humor, and other
sociolinguistic conventions.
5. Extensive (monologue). Extensive oral production tasks include
speeches, oral presentations, and story-telling, during which the opportunity
for oral interaction from listeners is either highly limited (perhaps to
nonverbal responses) or ruled out altogether. Language style is frequently more
deliberative (planning is involved) and formal for extensive tasks, but we
cannot rule out certain informal monologues such as casually delivered speech
(for example, my vacation in the mountains, a recipe for outstanding pasta
primavera, recounting the plot of a novel or movie).
(b) Micro- And Macroskills Of
Speaking
A list of listening micro- and macroskills enumerated various components of
listening that make up criteria for assessment. A similar list of speaking
skills can be drawn up for the same purpose: to serve as taxonomy of skills
from which you will select one or several that will become the objective(s) of
an assessment task. The microskills refer to producing the smaller chunks of
language such as phonemes, morphemes, words, collocations and phrasal units.
The macroskills imply the speaker's focus on the larger elements: fluency,
discourse, function, style, cohesion, nonverbal communication, and strategic
options. The micro- and macroskills total roughly 16 different objectives to
assess in speaking.
Micro- and macroskills of oral production
Microskills
1. Produce differences among English phonemes and allophonic variants.
2. Produce chunks of language of different lengths.
3. Produce English stress patterns, words in stressed and unstressed positions, rhythmic structure, and intonation contours.
4. Produce reduced forms of words and phrases.
5. Use an adequate number of lexical units (words) to accomplish pragmatic purposes.
6. Produce fluent speech at different rates of delivery.
7. Monitor one's own oral production and use various strategic devices (pauses, fillers, self-corrections, backtracking) to enhance the clarity of the message.
8. Use grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement, pluralization), word order, patterns, rules, and elliptical forms.
9. Produce speech in natural constituents: in appropriate phrases, pause groups, and sentence constituents.
10. Express a particular meaning in different grammatical forms.
11. Use cohesive devices in spoken discourse.
Macroskills
12. Appropriately accomplish communicative functions according to situations, participants, and goals.
13. Use appropriate styles, registers, implicature, redundancies, pragmatic conventions, conversation rules, floor-keeping and -yielding, interrupting, and other sociolinguistic features in face-to-face conversations.
14. Convey links and connections between events and communicate such relations as focal and peripheral ideas, events and feelings, new information and given information, generalization and exemplification.
15. Convey facial features, kinesics, body language, and other nonverbal cues along with verbal language.
16. Develop and use a battery of speaking strategies, such as emphasizing key words, rephrasing, providing a context for interpreting the meaning of words, appealing for help, and accurately assessing how well your interlocutor is understanding you.
As you consider designing tasks for assessing spoken language, these skills
can act as a checklist of objectives. While the macroskills have the appearance
of being more complex than the microskills, both contain ingredients of
difficulty, depending on the stage and context of the test-taker.
There is such an array of oral production tasks that a complete treatment
is almost impossible within the confines of one chapter in this book. Below is a
consideration of the most common techniques with brief allusions to related
tasks. As already noted in the introduction to this chapter, consider three
important issues as you set out to design tasks:
1. No speaking
task is capable of isolating the single skill of oral production. Concurrent
involvement of the additional performance of aural comprehension, and possibly
reading, is usually necessary.
2. Eliciting the
specific criterion you have designated for a task can be tricky because beyond
the word level, spoken language offers a number of productive options to
test-takers. Make sure your elicitation prompt achieves its aims as closely as
possible.
3. Because of the
above two characteristics of oral production assessment, it is important to
carefully specify scoring procedures for a response so that ultimately you
achieve as high a reliability index as possible.
(c) Designing Assessment Tasks: Imitative
Speaking
You may be surprised to see the inclusion of simple phonological imitation
in a consideration of assessment of oral production. After all, endless
repeating of words, phrases, and sentences was the province of the
long-since-discarded Audiolingual Method, and in an era of communicative
language teaching, many believe that non-meaningful imitation of sounds is
fruitless. Such opinions have faded in recent years as we discovered that an
overemphasis on fluency can sometimes lead to the decline of accuracy in
speech. And so we have been paying more attention to pronunciation, especially
suprasegmentals, in an attempt to help learners be more comprehensible.
An occasional phonologically focused repetition task is warranted as long as
repetition tasks are not allowed to occupy a dominant role in an overall oral
production assessment, and as long as you artfully avoid a negative washback
effect. Such tasks range from word level to sentence level, usually with each
item focusing on a specific phonological criterion. In a simple repetition
task, test-takers repeat the stimulus, whether it is a pair of words, a
sentence, or perhaps a question (to test for intonation production).
Word repetition task
Test-takers hear: Repeat after me:
beat [pause] bit [pause]
bat [pause] vat [pause] etc.
I bought a boat yesterday.
The glow of the candle is growing.
etc.
When did they go on vacation?
Do you like coffee? etc.
Test-takers repeat the stimulus.
A variation on such a task prompts test-takers with a brief written
stimulus which they are to read aloud. (In the section below on intensive
speaking, some tasks are described in which test-takers read aloud longer
texts.) Scoring specifications must be clear in order to avoid reliability
breakdowns. A common form of scoring simply indicates a two- or three-point
system for each response.
Scoring scale for repetition tasks
2 acceptable pronunciation
1 comprehensible, partially correct
pronunciation
0 silence, seriously incorrect
pronunciation
The longer the stretch of language, the more possibility for error and
therefore the more difficult it becomes to assign a point system to the text.
In such a case, it may be imperative to score only the criterion of the task.
For example, in the sentence "When did they go on vacation?" since
the criterion is falling intonation for wh-questions, points should be awarded
regardless of any mispronunciation.
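A minimal sketch of how the two-point scale above might be applied across a set of repetition items, assuming each item is judged only on its single designated criterion (the item scores in the example are hypothetical rater judgments):

```python
# Sketch: applying the 0-2 repetition scale item by item.
# Each item is scored only on its designated criterion (e.g., falling
# intonation for a wh-question), so unrelated mispronunciations are ignored.

SCALE = {
    2: "acceptable pronunciation",
    1: "comprehensible, partially correct pronunciation",
    0: "silence, seriously incorrect pronunciation",
}

def total_score(item_scores):
    """Sum per-item scores, validating that each is 0, 1, or 2."""
    for s in item_scores:
        if s not in SCALE:
            raise ValueError(f"invalid score: {s}")
    return sum(item_scores)

# Hypothetical rater judgments for four repetition items:
print(total_score([2, 1, 2, 0]))  # 5
```

Keeping the scale as an explicit mapping makes the descriptor for each point value visible to raters and helps guard against out-of-range scores, which supports the reliability concern raised above.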
PHONEPASS TEST
An example of a popular test that uses imitative (as well as intensive)
production tasks is PhonePass, a widely used, commercially available speaking
test in many countries. Among a number of speaking tasks on the test,
repetition of sentences (of 8 to 12 words) occupies a prominent role. It is
remarkable that research on the PhonePass test has supported the construct validity
of its repetition tasks not just for a test-taker's phonological ability but
also for discourse and overall oral production ability (Townshend et al., 1998;
Bernstein et al., 2000; Cascallar & Bernstein, 2000).
The PhonePass test elicits computer-assisted oral production over a
telephone. Test-takers read aloud, repeat sentences, say words, and
answer questions. With a downloadable test sheet as a reference, test-takers
are directed to telephone a designated number and listen for directions. The
test has five sections.
PhonePass test specifications
Part A:
Test-takers read aloud selected sentences from among those printed on the
test sheet. Examples:
1. Traffic is a
huge problem in Southern California.
2. The endless
city has no coherent mass transit system.
3. Sharing rides
was going to be the solution to rush-hour traffic.
4. Most people
still want to drive their own cars, though.
Part B:
Test-takers repeat sentences dictated over the phone. Examples: "Leave
town on the next train."
(d) Designing Assessment Tasks:
Intensive Speaking
At the intensive level, test-takers are prompted to produce short stretches
of discourse (no more than a sentence) through which they demonstrate
linguistic ability at a specified level of language. Many tasks are “cued”
tasks in that they lead the test-taker into a narrow band of possibilities.
Parts C and D of the PhonePass test fulfill the criteria of intensive tasks
as they elicit certain expected forms of language. Antonyms like high
and low or happy and sad are prompted so that the automated scoring
mechanism anticipates only one word. The either/or task of Part D fulfills the
same criterion.
Intensive tasks may also be described as limited response tasks (Madsen, 1983),
or mechanical tasks (Underhill, 1987), or what classroom pedagogy would label
as controlled responses.
Directed Response Tasks
In this type of task, the test administrator elicits a particular grammatical
form or a transformation of a sentence. Such tasks are clearly mechanical and
not communicative, but they do require minimal processing of meaning in order
to produce the correct grammatical output.
Directed response
Test-takers hear: Tell me he went
home.
Tell me that you like rock music.
Tell me that you aren't interested in tennis.
Tell him to come to my office at noon.
Remind him what time it is.
Read-Aloud Tasks
Intensive reading-aloud tasks include reading beyond the sentence level up
to a paragraph or two. This technique is easily administered by selecting a
passage that incorporates test specs and by recording the test-taker's output;
the scoring is relatively easy because all of the test-taker's oral production
is controlled. Because of the results of research on the PhonePass test,
reading aloud may actually be a surprisingly strong indicator of overall oral production
ability. For many decades, foreign language programs have used reading passages
to analyze oral production. Prator's (1972) Manual of American English
Pronunciation included a
“diagnostic passage” of about 150 words that students could read aloud into a tape
recorder. Teachers listening to the recording would then rate students on a
number of phonological factors (vowels, diphthongs, consonants, consonant
clusters, stress, and intonation) by completing a two-page diagnostic checklist
on which all errors or questionable items were noted. These checklists
ostensibly offered direction to the teacher for emphases in the course to come.
An earlier form of the Test of Spoken English (TSE®; see below) incorporated
one read-aloud passage of about 120 to 130 words with a rating scale for
pronunciation and fluency. The following passage is typical:
Read-aloud stimulus, paragraph length
Despite the decrease in size (and, some would say, quality) of our cultural
world, there still remain strong differences between the usual British and
American writing styles. The question is, how do you get your message across?
English prose conveys its most novel ideas as if they were timeless truths,
while American writing exaggerates; if you believe half of what is said, that's
enough. The former uses understatement; the latter, overstatement. There are
also disadvantages to each characteristic approach. Readers who are used to
being screamed at may not listen when someone chooses to whisper politely. At
the same time, the individual who is used to a quiet manner may reject a series
of loud imperatives.
The scoring scale for this passage provided a four-point scale for pronunciation
and for fluency, as shown in the box below.
Test of Spoken English scoring scale (1987, p. 10)
Pronunciation:
Points:
0.0-0.4 Frequent phonemic errors and foreign stress and intonation patterns
that cause the speaker to be unintelligible.
0.5-1.4 Frequent phonemic errors and foreign stress and intonation patterns
that cause the speaker to be occasionally unintelligible.
1.5-2.4 Some
consistent phonemic errors and foreign stress and intonation patterns, but the speaker is
intelligible.
2.5-3.0 Occasional
non-native pronunciation errors, but the speaker is always intelligible.
Fluency:
Points:
0.0-0.4 Speech is
so halting and fragmentary or has such a non-native flow that intelligibility is virtually impossible.
0.5-1.4 Numerous
non-native pauses and/or a non-native flow that interferes with intelligibility.
1.5-2.4 Some non-native pauses but with a more nearly native flow, so that
the pauses do not interfere with intelligibility.
2.5-3.0 Speech is smooth and effortless, closely approximating that of a native speaker.
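The point bands in a scale like the one above amount to a simple range lookup. A sketch of that lookup follows, with boundaries taken from the pronunciation table; the abbreviated descriptor strings are my own condensations, not the official wording:

```python
# Sketch: mapping a TSE-style pronunciation score (0.0-3.0) to its band.
# Band boundaries follow the table above; descriptors are abbreviated here.

PRONUNCIATION_BANDS = [
    (0.0, 0.4, "frequent errors; speaker unintelligible"),
    (0.5, 1.4, "frequent errors; occasionally unintelligible"),
    (1.5, 2.4, "some consistent errors, but intelligible"),
    (2.5, 3.0, "occasional errors; always intelligible"),
]

def band_for(score, bands=PRONUNCIATION_BANDS):
    """Return the descriptor whose [low, high] range contains the score."""
    for low, high, descriptor in bands:
        if low <= score <= high:
            return descriptor
    raise ValueError(f"score {score} falls between bands")

print(band_for(2.0))  # some consistent errors, but intelligible
```

Note that the published bands leave gaps (e.g., between 0.4 and 0.5), so a rater's score must land exactly within one band; the sketch raises an error otherwise rather than guessing.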
Such a rating list does not indicate how to gauge intelligibility, which is
mentioned in both lists. Such slippery terms remind us that oral production
scoring, even with the controls that reading aloud offers, is still an inexact
science. Underhill (1987, pp. 77-78) suggested some variations on the task of
simply reading a short passage.
• reading a scripted
dialogue, with someone else reading the other part
• reading
sentences containing minimal pairs, for example:
Try not to heat/hit the pan too much.
The doctor gave me a bill/pill.
• reading
information from a table or chart
If reading aloud shows certain practical advantages (predictable output,
practicality, reliability in scoring), there are several drawbacks to using
this technique for assessing oral production. Reading aloud is somewhat
inauthentic in that we seldom read anything aloud to someone else in the real
world, with the exception of a parent reading to a child, occasionally sharing
a written story with someone, or giving a scripted oral presentation. Also,
reading aloud calls on certain
specialized oral abilities that may not indicate one's pragmatic ability
to communicate orally in face-to-face contexts. You should therefore employ
this technique with some caution, and certainly supplement it as an assessment task
with other, more communicative procedures.
Sentence/Dialogue Completion Tasks and Oral Questionnaires
Another technique for targeting intensive aspects of language requires
test-takers to read dialogue in which one speaker's lines have been omitted.
Test-takers are first given time to read through the dialogue to get its gist
and to think about appropriate lines to fill in. Then as the tape, teacher, or
test administrator produces one part orally, the test-taker responds. Here's an
example.
Dialogue completion task
Test-takers read (and then hear):
In a department store:
Salesperson: May I help you?
Customer: ................................
Salesperson: Okay, what size do you
wear?
Customer: ................................
Salesperson: Hmmm. How about this
green sweater here?
Customer: ................................
Salesperson: Oh. Well, if you don't
like green, what color would you like?
Customer: ................................
Salesperson: How about this
one?
Customer: ................................
Salesperson: Great!
Customer: ................................
Salesperson: It's on sale today for
$39.95.
Customer: ................................
Salesperson: Sure, we take Visa,
MasterCard, and American Express.
Customer: ................................
Test-takers respond with appropriate lines.
An advantage of this technique lies
in its moderate control of the output of the test-taker. While individual
variations in responses are accepted, the technique taps into a learner's
ability to discern expectancies in a conversation and to produce
sociolinguistically correct language. One disadvantage of this technique is its
reliance on literacy and an ability to transfer easily from written to spoken English.
Another disadvantage is the contrived, inauthentic nature of this task:
Couldn't the same criterion performance be elicited in a live interview in
which an impromptu role-play technique is used?
Perhaps more useful is a whole host of shorter dialogues of two or three
lines, each of which aims to elicit a specified target. In the following
examples, somewhat unrelated items attempt to elicit the past tense, future
tense, yes/no question formation, and asking for the time. Again, test-takers
see the stimulus in written form.
In designing such questions for test-takers, it's important to make sure
that you know why you are asking the
question. Are you simply trying to elicit strings of language output to gain a
general sense of the test-taker's discourse competence? Are you combining discourse
and grammatical competence in the same question? Is each question just one in a
whole set of related questions? Responsive questions may take the following
forms:
Questions eliciting open-ended responses
Test-takers hear:
1. What do you think about the
weather today?
2. What do you like about the
English language?
3. Why did you choose your academic
major?
4. What kind of strategies have you used to help you learn English?
5. a. Have you ever been to the
United States before?
b. What other countries have you visited?
c. Why did you go there? What did you like best about it?
d. If you could go back, what would you like to do or
see?
e. What country would you like to visit next, and why?
Test-takers respond with a few sentences at most.
Notice that question #5 has five situationally linked questions that may
vary slightly depending on the test-taker's response to a previous question.
Oral interaction with a test administrator often involves the latter forming
all the questions. The flip side of this usual concept of question-and-answer
tasks is to elicit questions from the test-taker. To assess the test-taker's
ability to produce questions, prompts such as these can be used:
Elicitation of questions from the test-taker
Test-takers hear:
• Do you have any questions for me?
• Ask me about my family or job interests.
• If you could
interview the president or prime minister of your country, what would you ask
the person?
Test-takers respond with questions.
A potentially tricky form of oral production assessment involves more than
one test-taker with an interviewer, which is discussed later in this chapter.
With two students in an interview context, both test-takers can ask questions
of each other.
Directed response tasks
Test-takers see:
Interviewer : What did you do last weekend?
Test-taker : ...................................?
Interviewer : What will you do after you graduate from
this program?
Test-taker : ...................................?
Interviewer : I was in Japan for two weeks.
Test-taker : ...................................?
Interviewer : It’s ten-thirty.
One could contend that performance on these items is responsive rather than
intensive. True, the discourse involves responses, but there is a degree of
control here that predisposes the test-taker to respond with certain expected
forms. Such arguments underscore the fine lines of distinction between and
among the selected five categories.
It could also be argued that such techniques are nothing more than a
written form of questions that might otherwise (and more appropriately) be part
of a standard oral interview. True, but the advantage that the written form
offers is to provide a little more time for the test-taker to anticipate an
answer, and it begins to remove the potential ambiguity created by aural
misunderstanding. It helps to unlock the almost ubiquitous link between
listening and speaking performance.
Underhill (1987) describes yet another technique that is useful for
controlling the test-taker’s output: form-filling, or what I might rename
"oral questionnaire." Here the test-taker sees a questionnaire that
asks for certain categories of information (personal data, academic
information, job experience, etc.) and supplies the information orally.
(e) Designing Assessment Tasks: Responsive Speaking
Assessment of responsive tasks involves brief interactions with an interlocutor,
differing from intensive tasks in the increased creativity given to the
test-taker and from interactive tasks by
the somewhat limited length of utterances.
Question and Answer
Question-and-answer tasks can consist of one or two questions from an
interviewer, or they can make up a portion of a whole battery of questions and
prompts in an oral interview. They can vary from simple questions like
"What is this called in English?" to complex questions like "What are
the steps governments should take, if any, to stem the rate of deforestation in
tropical countries?" The first question is intensive in its purpose; it is
a display question intended to elicit a predetermined correct response. We have
already looked at some of these types of questions in the previous section.
Questions at the responsive level tend to be genuine referential questions in
which the test-taker is given more opportunity to produce meaningful language
in response.
(f) Designing
Assessment Tasks: Interactive Speaking
The final two categories of oral production assessment (interactive and
extensive speaking) include tasks that involve relatively long stretches of
interactive discourse (interviews, role plays, discussions, games) and tasks of
equally long duration but that involve less interaction (speeches, telling
longer stories, and extended explanations and translations). The obvious
difference between the two sets of tasks is the degree of interaction with an interlocutor.
Also, interactive tasks are what some would describe as interpersonal, while
the final category includes more transactional speech events.
Interview
When "oral production assessment" is mentioned, the first thing
that comes to mind is an oral interview: a test administrator and a test-taker
sit down in a direct face-to-face exchange and proceed through a protocol of
questions and directives. The interview, which may be tape-recorded for
re-listening, is then scored on one or more parameters such as accuracy in
pronunciation and/or grammar, vocabulary usage, fluency,
sociolinguistic/pragmatic appropriateness, task accomplishment, and even
comprehension.
Interviews can vary in length from perhaps five to forty-five minutes,
depending on their purpose and context. Placement interviews, designed to get a
quick spoken sample from a student in order to verify placement into a course,
may need only five minutes if the interviewer is trained to evaluate the
output accurately. Longer comprehensive interviews such as the OPI (see the next
section) are designed to cover predetermined oral production contexts and may
require the better part of an hour.
Every effective interview contains a number of mandatory stages. Two
decades ago, Michael Canale (1984) proposed a framework for oral proficiency
testing that has withstood the test of time. He suggested that test-takers will
perform at their best if they are led through four stages:
1. Warm-up. In a
minute or so of preliminary small talk, the interviewer directs mutual
introductions, helps the test-taker become comfortable with the situation,
apprises the test-taker of the format, and allays anxieties. No scoring of this
phase takes place.
2. Level check.
Through a series of preplanned questions, the interviewer stimulates the
test-taker to respond using expected or predicted forms and functions. If, for
example, from previous test information, grades, or other data, the test-taker
has been judged to be a "Level 2" (see below) speaker, the
interviewer's prompts will attempt to confirm this assumption. The responses
may take very simple or very complex form, depending on the entry level of the learner.
Questions are usually designed to elicit grammatical categories (such as past
tense or subject-verb agreement), discourse structure (a sequence of events),
vocabulary usage, and/or sociolinguistic factors (politeness conventions, formal/informal
language). This stage could also give the interviewer a picture of the
test-taker's extroversion, readiness to speak, and confidence, all of which may
be of significant consequence in the interview's results. Linguistic target
criteria are scored in this phase. If this stage is lengthy, a tape-recording
of the interview is important.
3. Probe. Probe questions and prompts challenge
test-takers to go to the heights of their ability, to extend beyond the limits
of the interviewer's expectation through increasingly difficult questions. Probe
questions may be complex in their framing and/or complex in their cognitive and
linguistic demand. Through probe items, the interviewer discovers the ceiling
or limitation of the test-taker's proficiency. This need not be a separate
stage entirely, but might be a set of questions that are interspersed into the
previous stage. At the lower levels of proficiency, probe items may simply
demand a higher range of vocabulary or grammar from the test-taker than predicted.
At the higher levels, probe items will typically ask the test-taker to give an
opinion or a value judgment, to discuss his or her field of specialization, to
recount a narrative, or to respond to questions that are worded in complex
form. Responses to probe questions may be scored, or they may be ignored if the
test-taker displays an inability to handle such complexity.
4. Wind-down. This final phase of the interview is
simply a short period of time during which the interviewer encourages the test-taker
to relax with some easy questions, sets the test-taker's mind at ease, and
provides information about when and where to obtain the results of the
interview. This part is not scored.
(g) Designing Assessment Tasks: Extensive Speaking
Extensive speaking tasks
involve complex, relatively lengthy stretches
of discourse. They are frequently variations on monologues, usually with
minimal verbal interaction.
Oral Presentations
In the academic and professional
arenas, it would not be uncommon to be called on to present a report, a paper,
a marketing plan, a sales idea, a design of a new product, or a method. A summary
of oral assessment techniques would therefore be incomplete without some
consideration of extensive speaking tasks. Once again the rules for effective
assessment must be invoked: (a) specify the criterion, (b) set appropriate
tasks, (c) elicit optimal output, and (d) establish practical, reliable scoring
procedures. And once again scoring is the key assessment challenge.
For oral presentations, a checklist
or grid is a common means of scoring or evaluation. Holistic scores are
tempting to use for their apparent practicality, but they may obscure the
variability of performance across several subcategories, especially the two
major components of content and delivery. Following is an example of a checklist
for a prepared oral presentation at the intermediate or advanced level of
English.
Retelling a Story, News Event
In this type of task, test-takers
hear or read a story or news event that they are asked to retell. This differs
from the paraphrasing task discussed above (pages 161-162) in that it is a
longer stretch of discourse and a different genre. The objectives in assigning
such a task vary from listening comprehension of the original to production of
a number of oral discourse features (communicating sequences and relationships
of events, stress and emphasis patterns, "expression" in the case of a dramatic
story), fluency, and interaction with the hearer. Scoring should of course meet
the intended criteria.
Translation (of Extended Prose)
Translation of words, phrases, or
short sentences was mentioned under the category of intensive speaking. Here,
longer texts are presented for the test-taker to read in the native language
and then translate into English. Those texts could come in many forms: a dialogue,
directions for assembly of a product, a synopsis of a story or play or movie,
directions on how to find something on a map, and other genres. The advantage
directions on how to find something on a map, and other genres. The advantage
of translation is in the control of the content, vocabulary, and, to some
extent, the grammatical and discourse features. The disadvantage is that
translation of longer texts is a highly specialized skill for which some
individuals obtain post-baccalaureate degrees! To judge a nonspecialist's oral language
ability on such a skill may be completely invalid, especially if the
test-taker has not engaged in translation at this level. Criteria for scoring
should therefore take into account not only the purpose in stimulating a translation
but the possibility of errors that are unrelated to oral production
ability.
One consequence of our being
articulate mammals is an extraordinarily complex system of vocal communication
that has evolved over the millennia of human existence. This chapter has offered
a relatively sweeping overview of some of the ways we have learned to assess
our wonderful ability to produce sounds, words, and sentences, and to string
them together to make meaningful texts. This chapter's limited number of
assessment techniques may encourage your imagination to explore a potentially
limitless number of possibilities for assessing oral production.