
Direct Written Corrective Feedback, Learner Differences, and the Acquisition of Second Language Article Use for Generic and Specific Plural Reference
CHARIS STEFANOU
Lancaster University
Linguistics and English Language
County South
Lancaster LA1 4YT
United Kingdom
Email: hstefanou.ling@gmail.com

ANDREA RÉVÉSZ
University College London
UCL Institute of Education
20 Bedford Way
London WC1H 0AL
United Kingdom
Email: a.revesz@ioe.ac.uk

This article reports on a classroom-based study that investigated the effectiveness of direct written corrective feedback in relation to learner differences in grammatical sensitivity and knowledge of metalanguage. The study employed a pretest–posttest–delayed posttest design with two treatment sessions.
Eighty-nine Greek English as a foreign language (EFL) learners were randomly assigned to three groups: direct feedback only, direct feedback plus metalinguistic comments, and comparison. The linguistic target
was article use for specific and generic plural reference. A text summary and a truth value judgment test
were employed to measure any development in learners' ability to use articles. The results revealed an
advantage for receiving direct feedback over no feedback, but provided no clear evidence for the benefit
of supplying metalinguistic information. Additionally, participants with greater grammatical sensitivity
and knowledge of metalanguage proved more likely to achieve gains in the direct feedback only group.
Keywords: written corrective feedback; individual differences; article use

The Modern Language Journal, 99, 2, (2015)
DOI: 10.1111/modl.12212
0026-7902/15/263–282 $1.50/0
© 2015 The Modern Language Journal

The role of written corrective feedback (WCF) has received considerable interest among instructed second language acquisition
(SLA) researchers for the past two decades. Interest in the topic has partly been driven by
Truscott's (1996) controversial claim that WCF,
a widely used pedagogical tool, is ineffective and
potentially harmful to second language (L2)
learners. Contrary to Truscott's supposition,
recent years have seen an accumulation of empirical evidence attesting that WCF can be useful
and effective in promoting L2 development
(e.g., Bitchener, 2008; Bitchener & Knoch, 2008;
Ellis et al., 2008; Ferris & Roberts, 2001; Sheen,
2007, 2010; Sheen, Wright, & Moldawa, 2009; Van
Beuningen, De Jong, & Kuiken, 2012). As a result,
researchers are now focusing on investigating
the factors actually influencing the efficacy of
WCF, among them its consistency in focusing
on particular linguistic targets (e.g., Ferris &
Roberts, 2001), the number of linguistic targets
(e.g., Ellis et al., 2008; Farrokhi & Sattarpour,
2012), the amount of metalinguistic information
accompanying the feedback (e.g., Bitchener,
2008; Bitchener & Knoch, 2008; Sheen, 2007,
2010), the availability of the correct construction
(e.g., Storch & Wigglesworth, 2010; Suh, 2010),
and learner differences in cognitive abilities, such
as inductive language learning capacity (Sheen,
2007).

One aim of the current study is to extend existing research by further exploring the extent to
which metalinguistic information may influence
the effectiveness of WCF. The additional aims
and novel aspects of our research include examining whether WCF can facilitate the acquisition
of a new linguistic target (article use for generic
and specific plural reference) and whether the
link between WCF type and SLA may be moderated by grammatical sensitivity and knowledge of
metalanguage.
BACKGROUND
Written Corrective Feedback
Written corrective feedback refers to the
information provided to L2 learners about the ill-formedness of their written production (Loewen,
2012). The vast majority of SLA researchers (e.g.,
Ferris, 1999, 2002) agree that there are a number
of potential benefits to supplying WCF in L2
classrooms. For example, in line with the widely
accepted view that acquisition requires some
focus on form, WCF can function as such a device
and draw learners' attention to L2 constructions
(e.g., Ellis, 2005), thereby helping them to notice
gaps in their current L2 knowledge (Schmidt,
1990). WCF may additionally engage learners in
guided learning and problem solving and, as a
result, promote "the type of reflection that is more
likely to foster long-term acquisition" (Bitchener
& Knoch, 2008, p. 415).
Contrary to these arguments, a few researchers
have raised concerns regarding the use of WCF on
L2 writing, Truscott's (1996, 2007) counterclaims
being the most influential among them. Truscott
objected to the use of WCF on the grounds that the
way it is practiced in the majority of language classrooms disregards well-established understandings
from SLA research, including that (a) L2 development is a gradual and intricate process, which entails more than just the sudden discovery of rules
and simple knowledge transfer from teachers to
students, (b) there is little probability that a single form of feedback will promote the acquisition
of features from various linguistic domains such as
lexis, morphology, and syntax, (c) WCF is likely to
have no value for promoting implicit knowledge
and only has the capacity to assist in developing a
limited degree of explicit knowledge, which may
be helpful for revision purposes but not for genuine L2 improvement, and (d) teachers are not
equipped to provide feedback that is adjusted to
the developmental needs of their learners given
the absence of well-documented developmental
patterns in SLA. Truscott not only questioned
the efficacy of feedback but went as far as to suggest that the provision of corrective feedback may
be counterproductive and constitute a source of
anxiety and stress for students, who may ultimately
avoid using complex features for fear of receiving
feedback.
In response to calls from both proponents and
opponents of WCF for improved research designs,
by now a large number of tightly controlled studies have accrued, examining mainly two dimensions: direct/indirect and focused/unfocused
feedback. The distinction between direct and indirect feedback concerns whether the correct
construction is supplied or not. Direct feedback
entails the correct construction, whereas indirect
WCF only indicates the presence of an error, leaving the responsibility of correction to learners.
Focused feedback consistently targets a single or
a limited number of problematic linguistic features, while unfocused or comprehensive feedback addresses all errors in the learners' writing irrespective of their nature. Previous research
comparing the effectiveness of feedback along
these two dimensions has yielded mixed findings
(see Van Beuningen et al., 2012, for a review).
Nonetheless, focused direct feedback, the object
of this investigation, has consistently been shown
to enhance grammatical accuracy (e.g., Bitchener, 2008; Bitchener & Knoch, 2008, 2010; Farrokhi & Sattarpour, 2012; Sheen et al., 2009). Relatively little is known, however, about the effects
of different kinds of focused direct feedback on
SLA.
The handful of studies that have explored the
learning potential of different types of focused
direct feedback have primarily been concerned
with determining the utility of enhancing direct
feedback with metalinguistic information. Sheen
(2007) looked into whether direct written corrections with or without metalinguistic comments
have a greater capacity to promote the acquisition
of article use for first mention and anaphoric
reference. She found that the participants, adult
intermediate ESL learners from various first language (L1) backgrounds, benefited to a greater
extent from direct feedback with the addition of
metalinguistic information. Using the same linguistic target, Bitchener and Knoch (2008, 2009;
Bitchener, 2008) examined, in a series of studies,
the extent to which learners' accuracy can be fostered by four types of WCF conditions: (a) direct
written corrections with written and oral metalinguistic comments, (b) direct written corrections
with written metalinguistic comments, (c) direct
written corrections only, and (d) no feedback.
All three studies converged on the same finding
that the provision of direct feedback was more
beneficial than receiving no feedback, regardless
of whether the feedback occurred on its own
or was complemented with metalinguistic comments. More recently, Shintani and Ellis (2013)
also compared the potential benefits afforded
by direct corrective feedback and metalinguistic
comments. Unlike in previous research, however,
metalinguistic comments did not complement
corrections, but were provided on their own.
The researchers found that, while metalinguistic
comments facilitated the acquisition of explicit
knowledge of the target feature (English indefinite article), direct feedback did not lead to gains.
Drawing on the participants' stimulated recall
comments, Shintani and Ellis concluded that
metalinguistic information may have enabled
learners to develop awareness of the target rule,
which they were able to apply in the revision
process. Clearly, the results to date are mixed
regarding the impact of supplementing direct focused feedback with metalinguistic information.
One goal of the present study is to help clarify
the value of providing metalinguistic comments.
Additionally, we intend to begin investigating
whether existing findings on direct focused feedback may be extended to other linguistic constructions. As discussed earlier, several studies
have focused on the same linguistic rule: article
use for first mention and anaphoric reference.
Here, we examined the extent to which various
types of direct WCF may facilitate the acquisition
of another aspect of English article use: generic
and specific plural reference. Additionally, a key
contribution of our research lies in exploring the
moderating effects of two learner factors on WCF,
addressing recent calls to explore how learners
with differential cognitive abilities may benefit
from different kinds of WCF (Bitchener & Ferris,
2012; Kormos, 2012).
Learner Differences
In the present study, learner differences or learner
factors are used as cover terms to encompass differential capacities among learners in grammatical sensitivity and knowledge of metalanguage.
Grammatical sensitivity is a cognitive ability that
has traditionally been conceptualized as a component of language learning aptitude (e.g., Carroll, 1981; Skehan, 2002). It refers to the ability to recognize the different syntactic patterns
and grammatical functions of words in a given
sentence structure, irrespective of knowledge of
grammatical terminology. The choice of grammatical sensitivity as a potential mediating variable was driven by the assumption that this capacity would be particularly relevant to learners' ability to benefit from WCF targeting a grammatical
phenomenon.
Knowledge of metalanguage refers to the
ability to use subject-specific terminology to articulate metalinguistic rules. Tests of metalanguage
typically ask participants to identify examples of
grammatical terms in L1 and/or L2 sentences
(Alderson, Clapham, & Steel, 1997; Berry, 2009;
Elder, 2009) and/or to give stand-alone examples
of grammatical terms (Berry, 2009). Knowledge of
metalanguage, which was operationalized here as
learners' knowledge of the appropriate terminology to describe structures and forms in their L2,
was presumed to be especially important to the
capability to learn from metalinguistic comments.
We expected that familiarity with metalanguage
would assist learners in understanding and
making use of the metalinguistic explanations
offered, thus leading to larger instructional gains.
Although the importance of investigating
aptitude–treatment interactions has increasingly
been emphasized in the L2 literature (e.g., DeKeyser, 2012; Robinson, 2002, 2005; Vatz et al., 2013),
empirical research exploring potential
links between learners' cognitive abilities and
propensity to benefit from particular instructional treatments remains relatively limited (see
Vatz et al., 2013, for a recent summary). To
date, only the previously mentioned study by
Sheen (2007) looked into the relationship between WCF and learner differences in cognitive
abilities. In particular, she examined whether
inductive language learning ability may moderate
the efficacy of direct written feedback. The results
revealed greater benefits for learners with high
inductive language learning ability under both
feedback conditions investigated, direct feedback
only and direct feedback with metalinguistic
comments. Sheen additionally found that high
language analytic ability conferred a greater advantage when the written feedback was accompanied
by metalinguistic information.
To the best of our knowledge, grammatical sensitivity and knowledge of metalanguage have not
yet been studied in the context of WCF research.
Nevertheless, there is some evidence indicating
that learners who differ along these factors may
respond differently to particular instructional
techniques. Among other individual difference
factors, Erlam (2005) examined the moderating
effects of grammatical sensitivity on the effectiveness of three types of instruction: inductive,
structured input, and deductive. She found that,
in L2 French classrooms, students with higher
grammatical sensitivity benefited more from inductive and structured input training than from deductive instruction. Similar findings emerged from
a recent study of oral corrective feedback by Li
(2013), who observed that grammatical sensitivity
was positively linked to learning from implicit
feedback (operationalized as recasts) but not to
learning from explicit feedback. Trofimovich,
Ammar, and Gatbonton (2007) also reported a
positive correlation between learners' grammatical sensitivity and their ability to benefit from recasts.
Sachs (2010), however, in exploring the effects
of computer-mediated feedback on L2 development, found that grammatical sensitivity predicted gains under the no feedback and more explicit feedback conditions (right/wrong feedback
plus metalinguistic information in the form of
tree diagrams) in her research, whereas grammatical sensitivity was not significantly related to gains
in the more implicit condition (right/wrong feedback). For the no feedback group, metalinguistic
knowledge also emerged as a predictor of learning. Based on these studies, no clear patterns can
be discerned regarding the moderating effects
of grammatical sensitivity and metalinguistic
knowledge on L2 instruction. More research is
needed to clarify the role of these factors under
various instructional conditions, and the present
study takes up this research direction.
METHOD
Research Questions
In light of the research needs outlined earlier,
the following research questions were formed:
RQ1. a. Does direct written corrective feedback help improve Greek EFL learners' use of articles for specific and generic plural reference?
b. If yes, which type is more beneficial: direct written feedback only or direct written feedback with metalinguistic information?
RQ2. a. Are there any relationships between learning benefits from direct written corrective feedback and learner differences in grammatical sensitivity or knowledge of metalanguage?
b. If yes, does the strength of the relationships differ by type of feedback, direct written feedback only or direct written feedback with metalinguistic information?

Design
The study followed a pretest–posttest–delayed
posttest design, with two treatment sessions
between the pretest and posttest. The participants
(N = 89) were randomly assigned to one of three
groups: direct feedback only (n = 30; henceforth,
direct feedback only), direct feedback plus metalinguistic comments (n = 30; henceforth, direct
metalinguistic), or the comparison group (n = 29).
The participants' ability to use articles for specific and generic plural reference was measured in
two assessment tasks: a text summary and a truth
value judgment (TVJ) test. During the two treatment sessions, the two direct feedback groups carried out additional versions of the text summary
task and received feedback in response to article errors according to their group assignment.
The comparison group completed the same tasks,
but received feedback only on spelling errors. A
words-in-sentences test and a test of metalanguage
were also administered to all participants, in order to measure their grammatical sensitivity and
knowledge of metalanguage, respectively.
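The design allocates 89 learners to three groups of 30, 30, and 29. The study does not report how the randomization was carried out, so the following Python sketch shows one common approach under that caveat: a seeded shuffle of participant IDs split into consecutive slices of fixed size. The helper `assign_groups`, the group labels, and the seed value are hypothetical and are not the authors' procedure.

```python
import random


def assign_groups(participant_ids, group_sizes, seed=None):
    """Randomly allocate participants to groups of fixed size.

    Hypothetical helper: a shuffle-and-split scheme assumed for illustration,
    not the allocation procedure reported in the study.
    """
    ids = list(participant_ids)
    if sum(group_sizes.values()) != len(ids):
        raise ValueError("group sizes must add up to the number of participants")
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    rng.shuffle(ids)
    assignment = {}
    start = 0
    for group, size in group_sizes.items():
        # Consecutive slices of the shuffled list form the groups.
        for pid in ids[start:start + size]:
            assignment[pid] = group
        start += size
    return assignment


# Group sizes as reported in the Design section; the labels are illustrative.
SIZES = {"direct_feedback_only": 30, "direct_metalinguistic": 30, "comparison": 29}
groups = assign_groups(range(1, 90), SIZES, seed=2015)
```

Seeding a local `random.Random` instance, rather than the global generator, keeps the allocation reproducible without affecting other random draws.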
Participants
The 89 participants were EFL students who
were in their first year in the same public high
school in Cyprus. All of them were native speakers
of Greek and had been studying English for 6 to
7 years. Fifty-one were female and 38 were male.
They were all 16 years of age. Using the Oxford
Placement Test (Dave, 2004), intermediate-level
participants were selected for the study.1 The rationale for including participants with intermediate proficiency was that they were likely to have
at least some knowledge of the English article system, but were unlikely to have already mastered
the rules associated with generic article use. Additionally, using participants with intermediate proficiency was thought to increase the comparability
of our research to previous WCF studies, most of
which also involved participants of intermediate
level proficiency in English.
Linguistic Target
In the present study, the written feedback targeted article use with specific and generic plural referents. Languages with articles vary as to
whether they allow both definite and bare plural
referents, and the two languages involved in this
study, English and Greek, exemplify this distinction. English, the target language, permits both
definite and bare plural referents and assigns a
different meaning to each. As illustrated in the
following examples, definite plural noun phrases
(1a) carry a specific meaning like demonstrative
plurals (1c); that is, they describe referents that
are known to both the speaker and the hearer. By
contrast, the only reading available for bare plurals is that of generic reference (1b).
(1) Plural Reference in English
a. Definite plural
The parrots are colorful. [specific reference: parrots are known to both interlocutors]
b. Bare plural
Parrots are colorful. [generic reference: all parrots in general]
c. Demonstrative plural
These parrots are colorful. [specific reference: parrots are known to both interlocutors]
In Greek, the participants' L1, plural referents
can only be definite, and both generic and specific readings can be assigned to them. In other
words, a definite plural noun phrase can describe
either some specific referents or a species in general, as exemplified in (2):
(2) Definite Plural Reference in Greek
Οι παπαγάλοι είναι πολύχρωμοι.
the (pl.) parrots are colorful
'(The) parrots are colorful.' [specific reference: parrots known to both interlocutors OR generic reference: all parrots in general]
In sum, English employs articles to map two
different meanings (generic and specific reference) onto two different forms (bare and definite
plurals), whereas Greek marks both meanings
onto a single form (definite plural). Thus, the
interpretation of Greek definite plurals as either
specific or generic is dependent on the context.
This type of cross-linguistic difference has been
demonstrated to pose a challenge for Spanish
learners of English, whose L1, in line with most
Romance languages, behaves like Greek with
regard to plural referents (see Ionin & Montrul,
2010; Snape, García Mayo, & Gürel, 2009). Importantly, Greek EFL learners themselves have
been found to show difficulty in using plural
generics (Stefanou, 2010). Therefore, it appears
worthwhile to investigate the extent to which
written corrective feedback techniques may help
learners master this construction.
Treatment
The two treatment sessions took place during
the participants' normally scheduled classes, in
which one of the researchers acted as the teacher.
First, the participants completed two versions
of the text summary task. As part of the text
summary task, they were required first to read a
short text in Greek. Half of the text introduced
an animal species using generic reference, and
the other half described a specific pair of animals
of the same species. Next, without consulting
the text, the participants were asked to provide
short descriptions of eight pictures in English,
each of which corresponded to part of the information presented earlier in the text. Four
of the pictures targeted generic reference (e.g.,
Bears sleep in the winter; henceforth, generic text
summary), and the other four were designed to
elicit specific referents (e.g., The bears in my town
came from Northern Europe; henceforth, specific
text summary). Thus, the two treatment sessions,
overall, intended to elicit the use of 16 generic
and 16 specific referents (both treatment sessions included two treatment tasks, with each
treatment task designed to elicit four generic and
four specific uses). Three different versions of
the text summary task were also used as part of
the assessment. The task had no time limit, and
participants could seek assistance with unfamiliar
vocabulary. Except for the L1 reading component, this task format was aligned well with the
activities that the students would normally carry
out during their English classes.
Once all students had finished the task, their
task sheets were collected for marking. In the
two experimental groups, all article errors with
generic and specific plural referents were corrected by one of the researchers using direct
WCF. Other error types, including different
errors in article use, were ignored. For the direct
feedback only group, the feedback took the
form of insertions of the definite article when
the context required specific instead of generic
plural reference, or deletions of the definite
article when the use of bare generics rather
than definite specific plurals would have been
appropriate. In the direct metalinguistic group,
the direct corrections were complemented with
relevant metalinguistic information, which was
handwritten at the top of each task sheet in
English. This information made reference to the
actual content of the text summary task (lions in
the example that follows), and read as:
Use the + plural noun (e.g., The lions . . .)
to describe some particular animals.
=
Use + plural noun (e.g., Lions . . .)
to describe all animals in general.
In the comparison group, article errors with specific and generic referents were ignored and
corrections of spelling errors were provided
instead.
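The correction rule used in the two experimental groups reduces to two operations on the plural noun phrase: insert the definite article where specific reference was required, and delete it where a bare generic was required. The Python sketch below illustrates that rule under two assumptions: each response has already been paired with the reference type its picture prompt targeted, and the target plural noun phrase opens the response. `direct_correction` is a hypothetical helper, not part of the study's materials.

```python
def direct_correction(response: str, intended_reference: str) -> str:
    """Apply direct WCF to a plural noun phrase at the start of a response.

    intended_reference is "specific" or "generic" (assumed to be known from
    the picture prompt). Returns the corrected response, or the response
    unchanged if its article use already matches the intended reference.
    """
    if intended_reference not in ("specific", "generic"):
        raise ValueError("intended_reference must be 'specific' or 'generic'")
    text = response.strip()
    words = text.split()
    if not words:
        return response
    # Assumption: a first word "the" marks a definite plural noun phrase.
    has_definite = len(words) > 1 and words[0].lower() == "the"

    if intended_reference == "specific" and not has_definite:
        # Insertion: specific reference requires the definite plural.
        return "The " + text[0].lower() + text[1:]
    if intended_reference == "generic" and has_definite:
        # Deletion: generic reference requires the bare plural.
        rest = " ".join(words[1:])
        return rest[0].upper() + rest[1:]
    return response  # no article error with this referent, so no feedback


print(direct_correction("Bears sleep in the winter", "specific"))
print(direct_correction("The bears sleep in the winter", "generic"))
```

Responses that already use the appropriate form come back unchanged, mirroring the fact that only errors received feedback. Other error types, including other article errors, fall outside the function, just as they were ignored during the treatment.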
In the next session, the students received back
their text summaries with the WCF and were given
five minutes to look over their errors and the respective corrections, with the advice to attend to
the feedback carefully because, as they were told,
they would later have to complete a similar task.
Piloting suggested that 5 minutes allowed sufficient time for learners to examine the feedback.
No further comments were provided by the researcher, and the students were not asked to revise
their writing, as in Bitchener and Knoch (2009), Ellis et al. (2008), and Sheen (2010). This methodological choice was in line with normal feedback
practice in this context, since the students are
usually not asked to revise their work based on
teacher feedback. It was also deemed more appropriate given the availability of the correct construction in the feedback provided, which would
have made revisions resemble passive copying on
the part of the students. As Polio (2012) notes,
"it is obvious that a writer can look at direct corrections and copy them onto a new piece of writing" (p. 377); what is key to the success of WCF
is drawing learner attention to the target of the
feedback provided. Referring to Sachs and Polio
(2007), she suggests this might be achieved by asking students to take time to look over their corrections before revising. The revision component,
however, does not seem necessary to trigger noticing, as evidenced in some existing studies of WCF
(Ellis et al., 2008; Sheen, 2010) and the findings
obtained here.

Assessment Tasks
Two testing tasks were designed to assess article use with specific and generic plural referents:
a text summary and a truth value judgment test.
Both tasks entailed two types of itemsitems targeting article use for specific reference and items
designed to elicit article use for generic reference. Our rationale for using two different types
of assessment tasks was to assess any effects of the
treatment on the participants' productive and receptive knowledge of the targeted constructions.
Three parallel versions of the testing tasks were
developed and counterbalanced across the three
testing sessions in a split-block design. Version
A of the assessment tasks can be found in the
Appendix.
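The text does not spell out the split-block arrangement. One standard way to counterbalance three parallel versions across three testing sessions is a cyclic 3 × 3 Latin square: participants are divided into three blocks, each block meets every version once, and each version appears exactly once per session. The Python sketch below shows that scheme. The block structure, `latin_square`, and `schedule_for_block` are assumptions for illustration, not the authors' documented allocation.

```python
VERSIONS = ["A", "B", "C"]
SESSIONS = ["pretest", "immediate posttest", "delayed posttest"]


def latin_square(items):
    """Cyclic Latin square: row i is the item list rotated left by i positions."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def schedule_for_block(block_index):
    """Map each testing session to the version a (hypothetical) block receives."""
    square = latin_square(VERSIONS)
    return dict(zip(SESSIONS, square[block_index % len(square)]))


for block in range(len(VERSIONS)):
    print(f"Block {block + 1}:", schedule_for_block(block))
```

Each row and each column of the square contains every version exactly once. Any difficulty differences between versions are therefore spread evenly over sessions rather than being confounded with time of testing.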
Text Summary Test. This test had exactly the
same format as the treatment task; it aimed at
assessing the participants' productive knowledge
of generic and specific article use. In line with
its specifications, it succeeded in eliciting the use
of plural referents 88.3% of the time in both the
generic and specific sections of the assessment.
To assess the internal consistency reliability of the
three versions of the test, Cronbach's alpha was
calculated by aggregating the scores of the three
versions of the text summary across the three
testing sessions. This was found to be high for all
three versions of the generic (α > .77) as well as
the specific (α > .88) parts of the assessment. The
total score was four points for both the generic
and specific components of the test. Participants
received one point for each correct response.
We decided against utilizing obligatory occasion
analysis given that the use of the target construction was found to be essential in describing each
picture prompt based on native speaker baseline
data.
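For reference, Cronbach's alpha for k items is α = k/(k − 1) × (1 − Σσᵢ²/σ_X²), where σᵢ² is the variance of item i and σ_X² is the variance of participants' total scores. The Python sketch below implements this standard formula with the standard library only. The toy 0/1 matrix is invented for illustration and is not study data. The sketch also does not attempt to reproduce the authors' aggregation across versions and sessions.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a participants x items matrix of item scores.

    Population variances are used throughout; using sample variances for
    both terms gives the same result because the n/(n - 1) factors cancel.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (not from the study): four participants, four dichotomous items.
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(toy_scores), 3))  # 0.867
```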
Truth Value Judgment Test. The aim of the truth
value judgment test was to probe the students' receptive knowledge of plural generic and specific
noun phrases; it was modeled on the instrument
used in Ionin and Montrul's (2009) study investigating article use by Spanish learners of English.
In keeping with Ionin and Montrul's task specifications, each item involved a short story of about
20–40 words, which juxtaposed a specific with a
generic reading. Based on the story, participants
were asked to judge the truth value of a subsequent statement. Each version of the test included
18 items: 12 target and 6 distractor items. The 12
target items were constructed using four stories,
each repeated three times: once with a definite
plural in the subsequent statement, once with a
bare plural, and a third time with a demonstrative
plural. The item with the demonstrative plural
statement served as the control, since anaphoric
reference is the only possible reading for both English and Greek demonstrative plurals. In two stories, the target statement was true with a definite
plural, false with a bare plural, and true with a
demonstrative plural. For the other two stories,
the values were reversed. An example of a story
and a corresponding statement with a definite
plural (the libraries) is the following:

(3) Example of a Truth Value Judgment Item
Most libraries are full of millions of printed books.
But there are three strange libraries. They don't
have printed books, they only have millions of electronic books.
The libraries have electronic books.    TRUE    FALSE



The six distractor items also consisted of a 20–40-word story, followed by a true/false statement.
The stories juxtaposed interpretations of passive
and active voice. The six items altogether included three stories, each appearing twice: once
with a true target statement and a second time
with a false one.
As indicated by Web VocabProfiler v3 (Cobb,
n.d.), the vast majority of the vocabulary items in
the test were among the first thousand most frequent words in the English language, and a few
were among the second thousand most frequently
used words. In addition, students were allowed to
ask for the meaning of any unknown words in order to ensure that lack of vocabulary knowledge
did not interfere with test performance. The
internal consistency reliability coefficients, computed separately for the items included in the different versions of the assessment, were at acceptable or good levels for both the generic and specific TVJ test versions (α > .70 and α > .78, respectively). For
the generic as well as the specific items, the maximum total score was 4 points.
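The logic of the TVJ answer key follows from the readings in (1). In English, definite and demonstrative plurals allow only a specific reading and bare plurals only a generic one. A statement is therefore true exactly when the reading of its noun phrase matches the reading under which the story makes it true. The Python sketch below builds a 12-item target set from four stories and scores responses dichotomously. The story identifiers are hypothetical. Counting bare-plural items toward the generic score, definite-plural items toward the specific score, and demonstrative items as controls is an assumption consistent with the 4-point maxima reported above.

```python
# Reading each English plural form allows, following example (1).
READING = {"definite": "specific", "demonstrative": "specific", "bare": "generic"}
# Assumed mapping from NP form to the score it contributes to.
SCORE_BUCKET = {"bare": "generic", "definite": "specific", "demonstrative": "control"}


def build_items(stories):
    """stories maps a story id to the reading under which its statement is true.

    Each story appears three times, once per plural form, so four stories
    yield the 12 target items of one test version.
    """
    return [
        (story, form, READING[form] == true_reading)  # (story, form, correct answer)
        for story, true_reading in stories.items()
        for form in ("definite", "bare", "demonstrative")
    ]


def score(items, responses):
    """Dichotomous scoring: 1 point when the TRUE/FALSE judgment matches the key."""
    totals = {"generic": 0, "specific": 0, "control": 0}
    for story, form, key in items:
        judgment = responses.get((story, form))  # True, False, or None if skipped
        totals[SCORE_BUCKET[form]] += int(judgment == key)
    return totals


# Hypothetical stories: two true under the specific reading (like the libraries
# example) and two true under the generic reading.
STORIES = {"libraries": "specific", "story_2": "specific",
           "story_3": "generic", "story_4": "generic"}
ITEMS = build_items(STORIES)
```

For the libraries story, this key gives TRUE for *the libraries*, FALSE for *libraries*, and TRUE for *these libraries*. Suppose a learner carries over the Greek pattern and reads every definite plural generically. That learner would answer all four definite items the wrong way and lose points only on the specific score.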
Measures of Learner Differences
Participants were administered two tests, a
words-in-sentences test and a test of metalanguage, in order to obtain information about their
grammatical sensitivity and knowledge of metalanguage, respectively.
Words in Sentences Test. The words-in-sentences
test was an adapted version of the corresponding
component of the Modern Language Aptitude Test (MLAT) and aimed to assess the
participants' grammatical sensitivity, or ability to
understand the functions of words in sentences.
Fifteen Greek sentences were utilized as stimuli.
Each key sentence included an underlined word,
and was followed by a second sentence in which
five words were underlined. The participants had
to choose which of the five words in the second
sentence filled the same grammatical role as the
underlined word in the key sentence. The internal consistency reliability for the test was good
(α = .774).
Test of Metalanguage. The test of metalanguage
was an adaptation of an instrument originally developed for this purpose by Bloor (1986) and
more recently used by Alderson et al. (1997). This
test was deemed appropriate for use in the present
study, since receptive knowledge of metalanguage
was hypothesized to be relevant in instructional
conditions that prompted learners to interpret
metalinguistic feedback. The participants were
presented with a Greek sentence and asked to
identify words and phrases that corresponded to
a list of 10 grammatical terms (e.g., adjective, preposition), some of which were potentially relevant
to describing rules about the target construction
(e.g., definite article, noun). The same sentence
used by Bloor and Alderson et al., translated into
Greek, was used as the source for the examples:
"Materials are delivered to the factory by a supplier, who
usually has no technical knowledge, but who happens
to have the right contacts." The internal consistency
reliability for the test was good (α = .845).

Procedure
As Figure 1 illustrates, the learners were initially screened using the grammar part of the
Oxford Placement Test (Dave, 2004). On the
second day of the study, the pretest was administered, followed by the first treatment task. A
few days later within the same week, the participants received WCF in response to errors they
made on the first treatment task. After studying
the feedback for 5 minutes, they completed the
second treatment task. The second week of the
study started with the participants looking over
the WCF, which addressed errors in their performance on the second treatment task. They were
again given 5 minutes to process the feedback,
after which the immediate posttest was administered,
as in previous research by Bitchener and Knoch
(2009). In the third week, the two measures of
learner factors were administered. Finally, in the
fourth week, the delayed posttest was completed,
2 weeks after the second treatment session. The
pretest, posttest, and delayed posttest lasted approximately 40 minutes, and the participants on
average took about 20 minutes to carry out each
treatment task. The words-in-sentences and metalanguage tests were completed within the time
limit of a normally scheduled 45-minute class.
During the period of the study, the teachers of the
participating classes were asked not to provide
any input on article use in their English lessons in
an attempt to control for exposure to the target
construction outside of the experiment.

Data Analyses
Scoring. Both the assessment tasks and the
measures of learner differences were marked
dichotomously. One point was awarded for each
correct answer, and zero points were given for
incorrect responses. For the two assessment tasks,
separate scores were calculated for the generic
and the specific reference items.
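A minimal Python sketch of the scoring scheme just described may help make it concrete. It assumes a hypothetical item structure (an item identifier, a `reference` label of "generic" or "specific", and an answer key); the actual test items and keys are not reproduced in this section, so the four items below are invented placeholders, not study materials. Gain scores are computed as the later score minus the pretest score, separately for each item type.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One assessment item (hypothetical structure, for illustration only)."""
    item_id: str
    reference: str  # "generic" or "specific"
    key: str        # correct answer, e.g. "zero" (bare plural) or "the"


def score(responses: dict[str, str], items: list[Item]) -> dict[str, int]:
    """Dichotomous scoring: 1 point for each correct answer, 0 otherwise
    (including unanswered items), totalled separately per item type."""
    totals = {"generic": 0, "specific": 0}
    for item in items:
        if responses.get(item.item_id) == item.key:
            totals[item.reference] += 1
    return totals


def gain(earlier: dict[str, int], later: dict[str, int]) -> dict[str, int]:
    """Gain score per item type: later test score minus earlier (pretest) score."""
    return {ref: later[ref] - earlier[ref] for ref in earlier}


# Invented items and responses (not study data)
items = [
    Item("g1", "generic", "zero"), Item("g2", "generic", "zero"),
    Item("s1", "specific", "the"), Item("s2", "specific", "the"),
]
pre = score({"g1": "zero", "g2": "the", "s1": "zero", "s2": "zero"}, items)
post = score({"g1": "zero", "g2": "zero", "s1": "the", "s2": "zero"}, items)
print(pre, post, gain(pre, post))
# {'generic': 1, 'specific': 0} {'generic': 2, 'specific': 1} {'generic': 1, 'specific': 1}
```

Keeping the generic and specific totals separate from the start mirrors the design choice in the text, where each reference type is analysed as its own outcome.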


The Modern Language Journal 99 (2015)

FIGURE 1
Study Design

Statistical Analyses. As a first step, descriptive
statistics were calculated for each group's performance on the two assessment tasks of the pretest,
posttest, and delayed posttest and on the two learner
difference measures. To address the
first research question and examine the effects
of direct WCF on learner gains in article use, a
series of ANOVAs was conducted. First, one-way
ANOVAs were run on the pretest scores for each
assessment task to detect any initial
group differences. Next, the data were submitted
to a series of mixed-model ANOVAs. Since
Mauchly's test indicated that the assumption of sphericity was violated, the Greenhouse–Geisser
correction was applied to the degrees of freedom.
Where appropriate, post hoc independent samples t-tests were performed. To measure effect
sizes, we computed partial eta-squared (ηp²) for
the ANOVAs and Cohen's d for the t-tests. Following Cohen (1988), ηp² values of .01, .06, and .14
and d values of .20, .50, and .80 were considered
small, medium, and large, respectively. The second research
question was addressed by computing two-tailed Spearman
bivariate correlations. First, correlations were calculated between the participants'
pretest scores and the learner difference factors
to detect any relationships between
the variables at the time of the pretest. Next,
we correlated the learners' pretest–posttest and
pretest–delayed posttest gain scores on the assessment tasks with their scores on the measures
of learner differences. Following Cohen (1992),
ρ² values of .01, .09, and .25 were interpreted
as small, medium, and large effect sizes. For all
analyses, the alpha level for significance was set
at .01 to adjust for multiple comparisons. All
statistical analyses were conducted using the IBM
Statistical Package for the Social Sciences (SPSS), version 19.
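The effect-size benchmarks and the alpha level stated above can be encoded directly, which makes it easy to check how any reported value should be labelled. The sketch below is illustrative Python, not part of the authors' SPSS workflow; treating a value that falls exactly on a cutoff as belonging to the higher category is an assumption, since the text does not specify the boundary convention.

```python
from __future__ import annotations

# Benchmarks stated in the Statistical Analyses paragraph: (small, medium, large)
ETA_P2 = (0.01, 0.06, 0.14)  # partial eta-squared, Cohen (1988)
COHEN_D = (0.20, 0.50, 0.80)  # Cohen's d, Cohen (1988)
RHO_SQ = (0.01, 0.09, 0.25)   # squared Spearman correlation, Cohen (1992)
ALPHA = 0.01                  # significance threshold adopted for all analyses


def effect_label(value: float, cutoffs: tuple[float, float, float]) -> str:
    """Verbal label for an effect size; the sign is ignored (e.g., negative d)."""
    small, medium, large = cutoffs
    magnitude = abs(value)
    if magnitude >= large:
        return "large"
    if magnitude >= medium:
        return "medium"
    if magnitude >= small:
        return "small"
    return "below small"


def is_significant(p: float, alpha: float = ALPHA) -> bool:
    """True when p falls below the adopted alpha level."""
    return p < alpha


print(effect_label(0.07, ETA_P2), effect_label(1.41, COHEN_D), effect_label(0.12, RHO_SQ))
# medium large medium
```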
RESULTS
RQ1: Effects of Direct Written Corrective Feedback on Article Use
Table 1 presents the descriptive statistics for
the learners' scores on the two assessment tasks
of the pretest, posttest, and delayed posttest. As
Table 1 and Figures 2–5 show, the comparison
group improved slightly from the pretest to the
posttest in generic article use on both tests but

3.33 (1.18)
2.90 (1.47)
3.10 (1.47)
2.30 (1.68)
1.13 (1.55)
1.13 (1.63)
3.13 (1.43)
1.87 (1.55)
2.93 (1.41)
1.67 (1.49)
.97 (1.07)
.73 (.94)
Note. The total scores were 4 points for each test.

1.17 (1.60)
1.31 (1.51)
1.72 (1.51)
1.27 (1.36)

1.48 (1.64)
.76 (1.02)

3.83 (.59)
3.57 (.94)
3.67 (.92)
3.03 (1.38)
2.90 (1.27)
3.00 (1.23)
3.60 (.89)
3.47 (.97)
3.50 (1.00)
3.40 (1.10)
3.40 (.89)
3.17 (.98)
3.14 (1.30)
2.72 (1.36)
3.38 (1.11)
3.38 (1.05)
2.69 (1.34)
2.93 (1.43)

Generic
Text summary
TVJ
Specific
Text summary
TVJ

Del M (SD)
Post M (SD)
Pre M (SD)
Post M (SD)
Pre M (SD)

Post M (SD)

Del M (SD)

Pre M (SD)

Direct Only (N = 30)


Comparison (N = 29)

TABLE 1
Pretest, Posttest, and Delayed Posttest Scores on the Testing Tasks Across the Groups

Del M (SD)

Direct Metalinguistic (N = 30)


declined in accurate article use with plural
specific referents. The same trends were observed
in the pretest–delayed posttest gain scores of the
comparison group, except for a slight decrease in
the generic TVJ scores. Compared to the comparison group, both feedback groups exhibited
considerable pretest–posttest improvement in the
generic as well as the specific sections of both assessment tasks, and maintained their improved
scores on the delayed posttest. The only exception to this pattern was the direct metalinguistic
group's posttest performance on the generic part
of the TVJ test, where the participants retained,
rather than increased, their pretest scores.
Spearman correlational analyses, run separately for the participants' pretest, posttest, and
delayed posttest scores, found four significant correlations between the participants' performance
on the assessment tasks: between (a) the generic
and specific parts of the text summary pretest, (b)
the generic and specific components of the TVJ
pretest, (c) the specific part of the text summary
posttest and the generic part of the TVJ posttest, and
(d) the specific parts of the text summary and
TVJ delayed posttests. The strength of these correlations was in the small to medium range
(.09 < ρ² < .16, p < .01). The remaining 14 correlations were not significant. Overall,
these results suggest that the two tests did not tap exactly the same constructs.
Turning to the inferential statistics conducted
to test group differences, one-way ANOVAs found
no significant differences in the pretest scores of
the three groups (comparison, direct only, direct
metalinguistic) on any of the assessments: generic
text summary, F(2, 86) = 2.83, p = .07, ηp² = .06;
specific text summary, F(2, 86) = 2.41, p = .10,
ηp² = .05; generic TVJ, F(2, 86) = .29, p = .75,
ηp² < .01; specific TVJ, F(2, 86) = 1.30, p = .28,
ηp² = .03. Having established that the groups did not
differ significantly at the pretest, we ran an overall mixed-model ANOVA
for each assessment task, with time as the within-subjects factor and group
as the between-subjects factor. Except for the generic part of the text
summary test, F(3.43, 86) = 2.26, p = .08, ηp² = .05,
the interaction between time and group was
statistically significant. The effect size was large
for the specific text summary test, F(3.64, 86) =
12.13, p < .01, ηp² = .22, and the specific TVJ test,
F(3.49, 86) = 6.30, p < .01, ηp² = .13, but medium
for the generic TVJ test, F(3.39, 86) = 3.09, p = .02,
ηp² = .07.
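Partial eta-squared can be recovered from a reported F ratio as ηp² = (F × df_effect) / (F × df_effect + df_error). For the one-way pretest ANOVAs this reproduces the reported values directly. For the Greenhouse–Geisser-corrected mixed-model interactions, the second df printed in, for example, F(3.64, 86) appears to be N − g (89 − 3) rather than the corrected within-subjects error df. The Python sketch below therefore assumes that ε can be recovered from the reported effect df and applied to the error df as well, giving ε(k − 1)(N − g) with k = 3 testing times. Under that assumption the reported ηp² values are reproduced; the function names are hypothetical.

```python
def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta-squared from an F ratio and its degrees of freedom:
    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)."""
    return (f * df_effect) / (f * df_effect + df_error)


def gg_interaction_eta(f: float, df_effect_gg: float, n_subjects: int,
                       n_groups: int, n_times: int = 3) -> float:
    """Time x group interaction in a mixed ANOVA after Greenhouse-Geisser correction.

    Assumption: the reported effect df equals epsilon * (n_times - 1) * (n_groups - 1),
    so epsilon can be recovered from it and applied to the within-subjects
    error df, epsilon * (n_times - 1) * (n_subjects - n_groups)."""
    epsilon = df_effect_gg / ((n_times - 1) * (n_groups - 1))
    df_error = epsilon * (n_times - 1) * (n_subjects - n_groups)
    return partial_eta_squared(f, df_effect_gg, df_error)


# One-way ANOVA on pretest scores, generic text summary: F(2, 86) = 2.83
print(round(partial_eta_squared(2.83, 2, 86), 2))        # 0.06
# Mixed ANOVA interaction, specific text summary: F(3.64, 86) = 12.13, N = 89, 3 groups
print(round(gg_interaction_eta(12.13, 3.64, 89, 3), 2))   # 0.22
```

Plugging the printed denominator (86) directly into the first formula would instead give about .34 for the specific text summary interaction, which is why the choice of error df matters when checking these values.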
Post hoc pairwise comparisons revealed significant differences, with medium to large effect sizes,
between the comparison and direct feedback only
groups and between the comparison and metalinguistic groups on the specific component of
both the text summary test (direct only vs.
comparison: F(1.59, 57) = 21.34, p < .01, ηp² =
.27; metalinguistic vs. comparison: F(1.91, 57) =
15.54, p < .01, ηp² = .21) and the TVJ test (direct only
vs. comparison: F(1.84, 57) = 8.66, p < .01, ηp² =
.13; metalinguistic vs. comparison: F(1.74, 57) =
10.64, p < .01, ηp² = .16). However, no significant
differences were detected between the performance
of the two feedback groups (text summary test:
F(1.78, 58) < .01, p = .99, ηp² < .01; TVJ test:
F(1.67, 58) = 1.84, p = .43, ηp² = .01). As shown in
Table 2, independent samples t-tests confirmed
that the direct only and metalinguistic groups
achieved significantly higher pretest–posttest
gains than the comparison group on the specific
part of the text summary test, and displayed
greater pretest–delayed posttest gains on the specific part of both the text summary and TVJ tests.
The effect sizes were in the large range. For the
generic TVJ test, post hoc mixed-model ANOVAs
found a significant difference between the performance of the comparison and metalinguistic
groups, F(1.71, 57) = 5.36, p < .01, ηp² = .09.
However, independent samples t-tests revealed
that this was due to a posttest–delayed posttest
difference, t(57) = 4.20, p < .01, d = 1.07,
rather than to differences in pretest–posttest gains,
t(57) = 1.04, p = .30, d = .27, or pretest–delayed
posttest gains, t(57) = 1.90, p = .06, d = .49.

FIGURE 2
Generic Text Summary: Performance Across Groups

FIGURE 3
Specific Text Summary: Performance Across Groups

FIGURE 4
Generic TVJ: Performance Across Groups
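For two independent groups, Cohen's d can be approximated from the t statistic as d = t·√(1/n₁ + 1/n₂). The Python sketch below applies this to the generic TVJ comparison reported above, assuming group sizes of 29 (comparison) and 30 (direct metalinguistic) as listed in Table 1. It reproduces the reported .27 and .49 but gives 1.09 rather than 1.07 for the posttest–delayed posttest difference, which suggests d may have been computed from group means and SDs instead; the conversion should be read as an approximation only.

```python
import math


def d_from_t(t: float, n1: int, n2: int) -> float:
    """Approximate Cohen's d for two independent groups from the t statistic:
    d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# Comparison (n = 29) vs. direct metalinguistic (n = 30), generic TVJ gain scores
for label, t in [("pre-post", 1.04), ("pre-delayed", 1.90), ("post-delayed", 4.20)]:
    print(label, round(d_from_t(t, 29, 30), 2))
# pre-post 0.27
# pre-delayed 0.49
# post-delayed 1.09
```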
In sum, the two direct feedback groups demonstrated significantly greater pretest–posttest and
pretest–delayed posttest development than the
comparison group in article use for specific plural
reference on both the text summary and TVJ tests,
but no significant differences were found between
these groups on the generic component of the
tests. Nor was any difference detected between the
gains of the direct only and metalinguistic groups
on any of the assessments.

FIGURE 5
Specific TVJ: Performance Across Groups
TABLE 2
Results of t-tests Comparing the Comparison Group With the Direct Only and Direct Metalinguistic Groups in Specific Reference Contexts

Comparison–direct only
Specific text summary, pretest–posttest gain: t(57) = 5.44, p < .01*, d = 1.41
Specific text summary, pretest–delayed posttest gain: t(57) = 4.73, p < .01*, d = 1.34
Specific TVJ, pretest–posttest gain: t(57) = 2.12, p = .04, d = .55
Specific TVJ, pretest–delayed posttest gain: t(57) = 3.87, p < .01*, d = 1.00

Comparison–direct metalinguistic
Specific text summary, pretest–posttest gain: t(57) = 4.69, p < .01*, d = 1.17
Specific text summary, pretest–delayed posttest gain: t(57) = 4.48, p < .01*, d = 1.20
Specific TVJ, pretest–posttest gain: t(57) = 2.00, p = .05, d = .52
Specific TVJ, pretest–delayed posttest gain: t(57) = 4.50, p < .01*, d = 1.17

Note. *p < .01.

RQ2: Moderating Effects of Learner Factors on the Effectiveness of Direct WCF
Table 3 displays the descriptive statistics of the
learners' performance on the words-in-sentences
test and the test of metalanguage for the two
feedback groups. Independent samples t-tests revealed no significant group differences in grammatical sensitivity, t(57) = 1.54, p = .13, d = .40,
or knowledge of metalanguage, t(57) = 1.32,
p = .19, d = .34. A Spearman correlational analysis found a medium-sized relationship between the
two learner difference measures (ρ² = .12, n = 60,
p < .01).
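Because Table 3 reports the means and SDs of both feedback groups, the t-tests and effect sizes above can be recomputed from summary statistics. The Python sketch below assumes a pooled-variance (Student's) t-test and Cohen's d based on the pooled SD, with n = 30 per group as listed in Table 3. With these inputs it reproduces t = 1.54, d = .40 and t = 1.32, d = .34; note that two groups of 30 yield df = 58 under this formula, whereas the text reports t(57).

```python
import math


def pooled_t_and_d(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t (pooled variance) and Cohen's d,
    both computed from group means, SDs, and sizes."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    diff = m1 - m2
    t = diff / (sd_pooled * math.sqrt(1 / n1 + 1 / n2))
    d = diff / sd_pooled
    return t, d, df


# Table 3: direct only vs. direct metalinguistic, M and SD, n = 30 per group
for test, a, b in [("grammatical sensitivity", (11.90, 3.19), (10.67, 3.01)),
                   ("metalanguage", (8.50, 2.16), (7.63, 2.88))]:
    t, d, df = pooled_t_and_d(a[0], a[1], 30, b[0], b[1], 30)
    print(f"{test}: t({df}) = {t:.2f}, d = {d:.2f}")
# grammatical sensitivity: t(58) = 1.54, d = 0.40
# metalanguage: t(58) = 1.32, d = 0.34
```

With equal group sizes the pooled and Welch versions of t coincide, so the choice of variance assumption does not affect these particular t values.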
To assess whether the impact of direct WCF on
article use, with or without metalinguistic comments, was related to learner differences in grammatical sensitivity and knowledge of metalanguage (RQ2), a series of Spearman correlations
was calculated. As shown in Table 4, there were
no significant correlations between the pretest
scores and the learner factors for any of the four
testing tasks in the two experimental groups, indicating no significant initial relationships between
the learner factors and the pretest assessments.
Table 5 shows the results of the correlations
computed between the measures of learner differences and the pretest–posttest and pretest–delayed posttest gain scores of the two experimental groups on the two assessment tasks. Four significant correlations were identified, all of which
concerned the gain scores of the direct feedback only group on the specific text summary
test. Both the learners' pretest–posttest and pretest–delayed posttest gain scores were found to have
strong positive links to grammatical sensitivity, as
assessed by the words-in-sentences test (ρ² = .36
and ρ² = .36, respectively), and to knowledge of metalanguage, as assessed by the test of metalanguage (ρ² = .25 and ρ² = .24, respectively).
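Spearman's ρ is the Pearson correlation computed on ranks, with tied scores assigned the mean of the ranks they occupy; ties are common here because dichotomously scored tests yield many identical totals. The from-scratch Python sketch below illustrates the computation on invented numbers (not study data); it is not the SPSS routine the authors used. Note also that the text reports ρ², so the underlying coefficients are |ρ| = √.36 = .60 and √.25 = .50, with the sign recoverable only from context (here, positive).

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical gain scores and learner-factor scores (not study data)
gains = [1, 2, 2, 3]
aptitude = [1, 2, 3, 4]
rho = spearman_rho(gains, aptitude)
print(round(rho, 3), round(rho ** 2, 3))  # 0.949 0.9
```

The guard against constant input is relevant to the point made later about ceiling effects: when nearly everyone gains the same amount, the rank variance collapses and a correlation becomes uninformative or undefined.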
DISCUSSION
The first research question asked whether
the provision of direct WCF can improve Greek
EFL learners' article use for generic and specific
plural reference. Our statistical analyses revealed
that the two feedback groups outperformed
the comparison group on the
specific reference component of both the text
summary and the TVJ tests, but they showed no
significantly greater gains on the generic parts
of the assessments. The differences between
the experimental groups and the comparison group
in the specific sections provide evidence that
direct WCF can help Greek EFL learners increase the accuracy
with which they supply articles
to mark specific plural reference. The present
study, therefore, corroborates the findings of

TABLE 3
Performance on Learner Difference Measures Across the Experimental Groups

Test: Direct Only (N = 30), M (SD); Direct Metalinguistic (N = 30), M (SD)
Grammatical sensitivity: 11.90 (3.19); 10.67 (3.01)
Metalanguage: 8.50 (2.16); 7.63 (2.88)

Note. The total scores were 15 and 10 points for the tests of grammatical sensitivity and metalanguage, respectively.



TABLE 4
Correlations Between Measures of Learner Differences and Pretest Scores of the Direct Only and Direct
Metalinguistic Groups
Metalanguage

Testing Task
Generic
Text sum.
TVJ
Specific
Text sum.
TVJ

Grammatical Sensitivity

Direct Only

Direct Meta

Direct Only

Direct Meta

.44
.24

.02
.20

.15
.13

.44
.49

.04
.12

.83
.52

.19
.04

.31
.83

.21
.05

.26
.80

.22
.16

.24
.41

.13
.08

.48
.69

<.01
.31

.97
.10

Note. Direct meta = direct metalinguistic.

previous effects-of-instruction research indicating
that focused direct WCF can lead to improved
post-treatment performance (e.g., Bitchener,
2008; Bitchener & Knoch, 2008; Ellis et al., 2008;
Ferris & Roberts, 2001; Sheen, 2007, 2010; Sheen
et al., 2009; Van Beuningen et al., 2012), contrary
to Truscott's (1996) claims.
Two issues are worth discussing in relation to
the overall effectiveness
of direct WCF in the present study. First, a logical question is why the participants demonstrated
greater gains in the specific, as compared to the
generic, parts of the testing tasks. A possible explanation is that the generic components of the assessments yielded relatively high
scores for all three groups on the pretest, resulting
in a ceiling effect. In other words, because the
participants' scores for generic article use were already reasonably high on the pretest, there was
little room for the treatment to produce substantial gains.
It is worth noting, however, that the learners' correct use of the zero article for generic reference
on the pretest may not have been governed by the correct underlying rule. It is possible that the
seemingly correct suppliance of bare plurals in
generic contexts reflected the oft-cited
tendency among L2 learners to omit the definite
article (e.g., García Mayo, 2008; Liu & Gleason,
2002; Snape et al., 2009; Trenkic, 2007). It would
be interesting to test this hypothesis
in a follow-up study by tapping the reasoning that
underlies learners' article choices with the help
TABLE 5
Correlations Between Measures of Learner Differences and Gain Scores of the Direct Only and Direct
Metalinguistic Groups
Metalanguage
Direct Only
Testing Task
Generic
Text sum.
TVJ
Specific
Text sum.
TVJ

Gain Score

Prepost
Predel. post
Prepost
Predel. post

.08
.01
.08
.10

Prepost
Predel. post
Prepost
Predel. post

.50
.49
<.01
.02

Grammatical Sensitivity

Direct Meta

Direct Only

Direct Meta

.67
.95
.68
.60

.20
.18
.21
.18

.29
.33
.26
.34

.20
.16
.43
.34

.30
.40
.02
.06

.16
.08
.17
.04

.39
.67
.38
.82

<.01*
<.01*
.99
.93

.25
.18
.37
.07

.19
.35
.04
.73

.60
.60
.04
.15

<.01*
<.01*
.82
.42

.34
.32
.11
.11

.06
.09
.55
.58

Note. Direct Meta = direct metalinguistic, *p < .01, prepost = pretestposttest, predel. post = pretestdelayed
posttest.

of introspective methods (e.g., stimulated recall
protocols).
A second issue that deserves attention regarding the overall effectiveness of WCF
is that the differences in gains between
the experimental and comparison groups had
a much larger effect size on the text summary
test than on the TVJ test. The principle of
Transfer Appropriate Processing (TAP) may offer
an explanation for this finding. The fundamental
tenet of TAP is that "we can better transfer and remember what we have learned if the cognitive processes that are active during learning are similar to
those that are active during retrieval" (Lightbown,
2008, p. 27). One implication of TAP is that when
learning and testing conditions match in an effects-of-instruction experiment, participants will be better able to retrieve
what they learned during the instructional
treatment and use it in the assessment. Applying
this principle to the present study, the participants
may have demonstrated higher gains on the text
summary test because, unlike the TVJ test, the text
summary required them to use articles under conditions very similar to those they had
encountered during the treatment.
Having established the positive effects of direct
WCF on L2 article use, we examined the extent
to which the learners' development differed
depending on whether they received direct feedback only or direct feedback supplemented with
metalinguistic information. The statistical analyses comparing the pretest–posttest
and pretest–delayed posttest gains of the two
experimental groups on the two assessment tasks
yielded no significant differences. Our results thus
largely mirror those reported by Bitchener
(2008) and Bitchener and Knoch (2008), who
found no benefit of complementing direct
feedback with metalinguistic comments, and
run contrary to the findings of Sheen (2007), who
detected superior gains in article use on all
three of her assessment tasks when metalinguistic
information was also available to learners.
A possible explanation for the conflicting findings might lie in the nature of the treatment tasks
employed. In Bitchener and Knoch's research,
the participants completed picture description
tasks, much as in the present study,
where participants were asked to provide short
descriptions of pictures based on a descriptive
text they had previously heard. The descriptive
task in the current study elicited a list of sentences
rather than a cohesive text, and it is likely that
the written output produced by Bitchener and
Knoch's participants was similar in nature. The
relative simplicity of this learner output
in terms of discourse features might have made
the target of the direct WCF more salient and
transparent in these studies, so participants
might have had relatively little difficulty
deducing the relevant article rules on their
own without the added help of metalinguistic
information. Sheen, in contrast, asked her
participants to reproduce narratives as part of the
treatment. This task probably led to more
cohesive and complex texts, which might
have imposed greater demands on the learners
when processing the feedback, thereby making it
more difficult for them to discern the target rule
in the absence of metalinguistic comments.
The second research question asked to what
extent the effectiveness of direct WCF on
article errors is moderated by learner differences
in grammatical sensitivity and knowledge of metalanguage. An additional subquestion asked whether
the strength of any relationships
differs depending on whether learners received
direct feedback only or direct feedback plus metalinguistic comments. Spearman correlations
between the learner difference
measures and the combined gain scores of both
experimental groups on the two assessment tasks
revealed three medium-sized links, all involving
gain scores on the specific section of the text
summary task. Grammatical sensitivity correlated
with the learners' pretest–posttest
and pretest–delayed posttest gain scores, and
knowledge of metalanguage with their pretest–posttest
gains. In other words, the participants
who demonstrated greater grammatical sensitivity
and familiarity with metalinguistic terminology
benefited more from the feedback
provided.
The same type of correlational analyses, conducted for the two experimental groups separately, yielded the same correlations, with large effect sizes,
for the direct feedback only group. No significant
correlations were detected for the direct metalinguistic group. Overall, these results suggest
that, while participants with greater grammatical sensitivity and knowledge of metalanguage
were more likely to learn from the WCF in the
direct feedback only group, the participants' performance was not affected by these learner differences
when metalinguistic information was also made
available.
An issue worthy of discussion is that the two
learner factors in focus, grammatical sensitivity
and knowledge of metalanguage, were linked only
to the participants' gains in the direct feedback
only group, not to the extent of development
exhibited by the direct metalinguistic group. A
possible explanation for the lack of moderating
effects in the metalinguistic group may be that the additional metalinguistic information neutralized any advantage that higher grammatical sensitivity
or knowledge of metalanguage might otherwise have afforded. In other words,
the information supplied in the metalinguistic
comments probably enabled the participants to
compensate for a potentially weaker ability to
recognize grammatical functions and to use grammatical terminology. A similar argument was put
forward by Erlam (2005) to explain why grammatical sensitivity was found to facilitate L2 learning under the more implicit, but not the more
explicit, instructional conditions in her study of
aptitude–treatment interactions.
Finally, a question that arises is why all the significant relationships between the learner factors and
gain scores were detected on the task components
that targeted article use with specific referents.
Again, this finding may be accounted for by the
high scores the participants achieved on the
generic items. As already mentioned, the participants produced relatively high scores
on the pretest tasks assessing their ability to mark
generic reference (although this might
not have reflected accurate underlying
knowledge). This strong initial performance left
little room for pretest–posttest development, leading to
correspondingly low variance in the gain scores
for generic items. This limited variance,
in turn, is likely to have reduced the chances
of detecting significant correlations between the
measures of learner differences and gains on
the generic sections of the assessment tasks. In
contrast, the larger variance of the learners' gain
scores on the specific items made it more likely
that any moderating effects of the learner factors
would surface.
CONCLUSION
The present study set out to examine the extent
to which direct WCF can help Greek EFL learners improve their article use for generic and specific plural reference. Two feedback types were
compared: direct written feedback only and direct written feedback plus metalinguistic information. Two learner factors were also investigated
in relation to the participants' ability to learn
from WCF: grammatical sensitivity and knowledge
of metalanguage. The novel aspects of the research included its new linguistic target and its focus on learner difference factors that had not
yet been investigated (grammatical sensitivity and
familiarity with metalinguistic terms). Corroborating the results of previous empirical research
and contrary to Truscott's (1996) claims, direct WCF on article use was found
to be superior to the comparison condition. Interestingly, however, supplementing direct WCF
with metalinguistic comments afforded little additional benefit to learners. Furthermore, participants with greater grammatical sensitivity and
familiarity with metalinguistic terminology appeared to improve more only when direct feedback
was supplied in response to article errors without metalinguistic comments. No significant links emerged between the measures of
learner differences and the gains of the experimental group that received direct feedback and
metalinguistic comments. This pattern of findings
is in line with Erlam's (2005) proposal that conditions rich in input may have the capacity to neutralize learner differences in cognitive abilities.
Finally, some limitations of the study need to be
acknowledged and considered in future research.
First, the text summary task constituted a focused
pedagogic task designed to elicit specific and generic article use. As a consequence, it
lacked situational authenticity (Ellis, 2003) and
involved article use in relatively controlled contexts only. An important avenue for future research would therefore be to
extend the research questions posed here to
tasks that more closely resemble real-life language use
and elicit less controlled application of the target rule. Second, we focused on only one aspect
of article use, so it cannot be assumed that
our results would transfer to other linguistic targets (Xu, 2009). This limits the generalizability of the findings and the claims
we can make about feedback efficacy. A replication of this study with complex linguistic constructions would therefore be especially desirable, given that little is known about the impact of WCF on complex L2 features. A third, related limitation is that we only sought evidence about
the participants' development in producing and
comprehending the linguistic target. No attempt
was made to examine the potential impact of feedback on the linguistic complexity of learners' production, as in Hartshorn et al. (2010) and Van Beuningen et al. (2012). Given that the text summary test elicited a list of simple sentences rather
than a text, the learners' output was not amenable
to analysis in terms of linguistic complexity measures. Using less controlled tasks in future research, as suggested above, would help resolve
this issue. Fourth, our investigation included only
two learner factors. Thus, as suggested by Kormos (2012), further studies exploring the moderating effects of other learner factors, such as working memory capacity and motivation, are also warranted. Future research should also take context-specific factors into consideration, since learners' schooling environment, together with their teachers' guidance and assessment practices, may shape their capacity to benefit from pedagogical interventions. Finally, follow-up research would profit from using introspective methods to uncover how learners engage with
different types of WCF and whether this engagement is
influenced by learner differences in cognitive
abilities.

ACKNOWLEDGMENTS
We would like to thank the editor and the three
anonymous reviewers for their helpful suggestions on
this article. Any errors, of course, are our own. This research was supported by the Language Learning Dissertation Grant awarded to Charis Stefanou.

NOTE
1 According to the manual of the Oxford Placement
Test, scores ranging between 120 and 149 correspond
to an intermediate level of proficiency, that is, the B1 (Threshold)
and B2 (Vantage) levels of the Common European Framework
of Reference. In keeping with these guidelines, learners were
selected for participation in the study if their test scores
fell within this range.

REFERENCES
Alderson, J. C., Clapham, C., & Steel, D. (1997). Metalinguistic knowledge, language aptitude, and language proficiency. Language Teaching Research, 1, 93–121.
Berry, R. (2009). EFL majors' knowledge of metalinguistic terminology: A comparative study. Language Awareness, 18, 113–128.
Bitchener, J. (2008). Evidence in support of written corrective feedback. Journal of Second Language Writing, 17, 102–118.
Bitchener, J., & Ferris, D. (2012). Written corrective feedback in second language acquisition and writing. New York: Routledge.
Bitchener, J., & Knoch, U. (2008). The value of written corrective feedback for migrant and international students. Language Teaching Research, 12, 409–431.
Bitchener, J., & Knoch, U. (2009). The contribution of written corrective feedback to language development: A ten month investigation. Applied Linguistics, 31, 193–214.
Bloor, T. (1986). What do language students know about grammar? British Journal of Language Teaching, 24, 157–162.
Carroll, J. B. (1981). Twenty-five years of research in foreign language aptitude. In K. C. Diller (Ed.), Individual differences and universals in language learning aptitude (pp. 83–118). Rowley, MA: Newbury House.
Cobb, T. (n.d.). Web Vocabprofile. Accessed 16 July 2014 at http://www.lextutor.ca/vp/eng/
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Dave, A. (2004). Oxford Placement Test 1. Oxford: Oxford University Press.
DeKeyser, R. (2012). Interactions between individual differences, treatments, and structures in SLA. Language Learning, 62, 189–200.
Elder, C. (2009). Validating a test of metalinguistic knowledge. In R. Ellis, S. Loewen, C. Elder, R. Erlam, J. Philp, & H. Reinders (Eds.), Implicit and explicit knowledge in second language learning, testing and teaching (pp. 113–138). Bristol, UK: Multilingual Matters.
Ellis, R. (2003). Task-based language learning and teaching. Oxford: Oxford University Press.
Ellis, R. (2005). Principles of instructed language learning. System, 33, 209–224.
Ellis, R., Sheen, Y., Murakami, M., & Takashima, H. (2008). The effects of focused and unfocused written corrective feedback in an English as a foreign language context. System, 36, 353–371.
Erlam, R. (2005). Language aptitude and its relationship to instructional effectiveness in second language acquisition. Language Teaching Research, 9, 147–171.
Farrokhi, F., & Sattarpour, S. (2012). The effects of direct written corrective feedback on improvement of grammatical accuracy of high-proficiency L2 learners. World Journal of Education, 2, 49–57.
Ferris, D. R. (1999). The case for grammar correction in L2 writing classes: A response to Truscott (1996). Journal of Second Language Writing, 8, 1–10.
Ferris, D. R. (2002). Treatment of error in second language writing classes. Ann Arbor, MI: The University of Michigan Press.
Ferris, D. R., & Roberts, B. (2001). Error feedback in L2 writing classes: How explicit does it need to be? Journal of Second Language Writing, 10, 161–184.
García Mayo, M. (2008). The acquisition of four nongeneric uses of the article the by Spanish EFL learners. System, 36, 550–565.
Hartshorn, K. J., Evans, N. W., Merrill, P. F., Sudweeks, R. R., Strong-Krause, D., & Anderson, N. J. (2010). Effects of dynamic corrective feedback on ESL writing accuracy. TESOL Quarterly, 44, 84–106.

Charis Stefanou and Andrea Rvsz


Ionin, T., & Montrul, S. (2009). Article use and
generic reference: Parallels between L1- and L2acquisition. In M. Garca Mayo & R. Hawkins
(Eds.), Second language acquisition of articles: Empirical findings and theoretical implications (pp. 147
171). Philadelphia/Amsterdam: John Benjamins.
Ionin, T., & Montrul, S. (2010). The role of L1 transfer
in the interpretation of articles with definite plurals in L2 English. Language Learning, 60, 877925.
Kormos, J. (2012). The role of individual differences in
L2 writing. Journal of Second Language Writing, 21,
390403.
Li, S. (2013). The interactions between the effects of
implicit and explicit feedback and individual differences in language analytic ability and working
memory. Modern Language Journal, 97, 634654.
Lightbown, P. M. (2008). Transfer appropriate processing as a model for classroom second language acquisition. In Z.H. Han (Ed.), Understanding second
language process (pp. 2744). Clevedon, UK: Multilingual Matters.
Liu, D., & Gleason, J. L. (2002). Acquisition of the article the by nonnative speakers of English. Studies in
Second Language Acquisition, 24, 126.
Loewen, S. (2012). The role of feedback. In S. M. Gass &
A. Mackey (Eds.), The Routledge handbook of second
language acquisition (pp. 2440). London: Routledge.
Polio, C. (2012). The relevance of second language acquisition theory to the written error correction debate. Journal of Second Language Writing, 21, 375
389.
Robinson, P. (2002). Learning conditions, aptitude complexes and SLA: A framework for research and pedagogy. In P. Robinson (Ed.), Individual differences and instructed language learning (pp. 113–135). Philadelphia/Amsterdam: John Benjamins.
Robinson, P. (2005). Aptitude and second language acquisition. Annual Review of Applied Linguistics, 25, 46–73.
Sachs, R. (2010). Individual differences and the effectiveness
of visual feedback on reflexive binding in L2 Japanese.
(Unpublished doctoral dissertation). Georgetown
University, Washington, DC.
Sachs, R., & Polio, C. (2007). Learners' uses of two types of written feedback on a L2 writing revision task. Studies in Second Language Acquisition, 29, 67–100.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied Linguistics, 11, 129–158.
Sheen, Y. (2007). The effect of focused written corrective feedback and language aptitude on ESL learners' acquisition of articles. TESOL Quarterly, 41, 255–281.
Sheen, Y. (2010). Differential effects of oral and written corrective feedback in the ESL classroom. Studies in Second Language Acquisition, 32, 203–234.
Sheen, Y., Wright, D., & Moldawa, A. (2009). Differential effects of focused and unfocused written correction on the accurate use of grammatical forms by ESL learners. System, 37, 556–569.
Shintani, N., & Ellis, R. (2013). The comparative effect of direct written corrective feedback and metalinguistic explanation on learners' explicit and implicit knowledge of the English indefinite article. Journal of Second Language Writing, 22, 286–306.
Skehan, P. (2002). Theorizing and updating aptitude. In P. Robinson (Ed.), Individual differences and instructed language learning (pp. 69–94). Philadelphia/Amsterdam: John Benjamins.
Snape, N., García Mayo, M., & Gürel, A. (2009). Spanish, Turkish, Japanese and Chinese L2 learners' acquisition of generic reference. In M. Bowles et al. (Eds.), Proceedings of the 10th Generative Approaches to Second Language Acquisition Conference (pp. 1–8). Somerville, MA: Cascadilla Proceedings Project.
Stefanou, C. (2010). The use of the English article system by Greek learners of English. (Unpublished master's thesis). Lancaster University, Lancaster, UK.
Storch, N., & Wigglesworth, G. (2010). Learners' processing, uptake and retention of corrective feedback on writing. Studies in Second Language Acquisition, 32, 303–334.
Suh, B. R. (2010). Written feedback in second language acquisition: Exploring the roles of type of feedback, linguistic targets, awareness and concurrent verbalization.
(Unpublished doctoral dissertation). Georgetown
University, Washington, DC.
Trenkic, D. (2007). Variability in second language article production: Beyond the representational deficit vs. processing constraints debate. Second Language Research, 23, 289–327.
Trofimovich, P., Ammar, A., & Gatbonton, E. (2007). How effective are recasts? The role of attention, memory, and analytical ability. In A. Mackey (Ed.), Conversational interaction in second language acquisition: A series of empirical studies (pp. 171–195). Oxford: Oxford University Press.
Truscott, J. (1996). The case against grammar correction in L2 writing classes. Language Learning, 46, 327–369.
Truscott, J. (2007). The effect of error correction on learners' ability to write accurately. Journal of Second Language Writing, 16, 255–272.
Van Beuningen, C. G., de Jong, N. H., & Kuiken, F. (2012). Evidence on the effectiveness of comprehensive error correction in second language writing. Language Learning, 62, 1–41.
Vatz, K., Tare, M., Jackson, S. R., & Doughty, C. J. (2013). Aptitude-treatment interactions in second language acquisition: Findings and methodology. In G. Granena & M. Long (Eds.), Sensitive periods, language aptitude, and ultimate L2 attainment (pp. 273–292). Philadelphia/Amsterdam: John Benjamins.
Xu, C. (2009). Overgeneralization from a narrow focus: A response to Ellis et al. (2008) and Bitchener (2008). Journal of Second Language Writing, 18, 270–275.


The Modern Language Journal 99 (2015)

Appendix
Version A of Assessment Tasks
Task 1: Text Summary
As part of a school visit to the zoo you have to write a short description about some animals. First you have 3 minutes
to read the given text. Then the text will be replaced with some pictures. You have to write a summary of the text in
English using the pictures to help you remember what it was about.
Text 1:
[Greek text; English translation below]



(Translation)
Lions are the king of the jungle. Lions usually sleep during the day and hunt during the night. Lions
are usually used in circuses.
The lions in our town zoo were brought from Africa. The lions play with a ball all day. Last week the lions
had two babies. Next month the lions will be transferred to France.
Task 2: Truth Value Judgment
Read the following stories and decide if the sentence given below each story is True or False by circling
the corresponding choice. Your decision should be based on the story.
Story 1:
Most cinemas have several rows with seats. But there are three strange cinemas. They don't have seats,
they only have several sofas.
The cinemas have several sofas. TRUE FALSE
Story 2:
Most hotels are very noisy places because of the hundreds of people who live there. But two hotels are
always very quiet. There is a rule that forbids guests to make loud noise.
Hotels are very quiet. TRUE FALSE
Story 3:
Yesterday I heard a very funny story that happened in a school. A little boy was being chased by a little
girl around the school yard because he had stolen her favorite doll.
A little girl was chasing a little boy. TRUE FALSE
Story 4:
In our History class we were taught that most castles were made of big pieces of stone. But the teacher
said that there were two castles that were different. They were made of wood instead of stones.
These castles were made of stones. TRUE FALSE
Story 5:
Most hotels are very noisy places because of the hundreds of people who live there. But two hotels are
always very quiet. There is a rule that forbids guests to make loud noise.
The hotels are very noisy. TRUE FALSE
Story 6:
Yesterday I was at the park and I saw something unusual. A dog was being followed by two squirrels all
around the park.
A dog was following two squirrels. TRUE FALSE
Story 7:
Most cinemas have several rows with seats. But there are three strange cinemas. They don't have seats,
they only have several sofas.
Cinemas have several rows with seats. TRUE FALSE
Story 8:
In ancient Greece most temples didn't have guards to protect them. But two temples were very special.
They were very rich and so they had guards to protect them.
These temples had guards to protect them. TRUE FALSE


Story 9:
Last night I saw a film about strange animal stories. There was a case of a sheep which was being protected
by a cow while it was injured in the farm.
A cow was protecting a sheep. TRUE FALSE
Story 10:
In ancient Greece most temples didn't have guards to protect them. But two temples were very special.
They were very rich and so they had guards to protect them.
Temples had guards to protect them. TRUE FALSE
Story 11:
In our History class we were taught that most castles were made of big pieces of stone. But the teacher
said that there were two castles that were different. They were made of wood instead of stones.
The castles were made of wood. TRUE FALSE
Story 12:
Last night I saw a film about strange animal stories. There was a case of a sheep which was being protected
by a cow while it was injured in the farm.
A sheep was protecting a cow. TRUE FALSE
Story 13:
Most hotels are very noisy places because of the hundreds of people who live there. But two hotels are
always very quiet. There is a rule that forbids guests to make loud noise.
These hotels are very quiet. TRUE FALSE
Story 14:
In ancient Greece most temples didn't have guards to protect them. But two temples were very special.
They were very rich and so they had guards to protect them.
The temples didn't have guards to protect them. TRUE FALSE
Story 15:
Yesterday I heard a very funny story that happened in a school. A little boy was being chased by a little
girl around the school yard because he had stolen her favorite doll.
A little boy was chasing a little girl. TRUE FALSE
Story 16:
Most cinemas have several rows with seats. But there are three strange cinemas. They don't have seats,
they only have several sofas.
These cinemas have several rows with seats. TRUE FALSE
Story 17:
In our History class we were taught that most castles were made of big pieces of stone. But the teacher
said that there were two castles that were different. They were made of wood instead of stones.
Castles were made of stones. TRUE FALSE
Story 18:
Yesterday I was at the park and I saw something unusual. A dog was being followed by two squirrels all
around the park.
Two squirrels were following a dog. TRUE FALSE
