

INTERNATIONAL JOURNAL OF SCHOOL & EDUCATIONAL PSYCHOLOGY
2016, VOL. 4, NO. 3, 137–145
http://dx.doi.org/10.1080/21683603.2016.1192852

POINT-COUNTERPOINT: RESPONSE

Cross-Battery Assessment? XBA PSW? A case of mistaken identity: A commentary on Kranzler and colleagues' "Classification agreement analysis of Cross-Battery Assessment in the identification of specific learning disorders in children and youth"

Dawn P. Flanagan (Department of Psychology, St. John's University, Queens, New York, USA) and W. Joel Schneider (Department of Psychology, Illinois State University, Normal, Illinois, USA)

CONTACT: Dawn P. Flanagan, flanagad@stjohns.edu, School Psychology Program, Department of Psychology, St. John's University, 8000 Utopia Parkway, Queens, NY 11439, USA.
Note from the Editor: This is the second of a three article series.
© 2016 International School Psychology Association

When education works, it works to the benefit of all. It creates productive, innovative citizens eager to contribute to a well-functioning democracy. In contrast, educational failure has lifelong consequences, with some individuals experiencing decades of preventable hardship. Like Kranzler, Floyd, Benson, Zaboski, and Thibodaux (this issue), we have dedicated our lives to finding better ways of identifying children who are falling behind in school and need additional help. With finite resources, every diagnostic error is a misallocation of time, effort, and money with predictably damaging consequences for individuals, families, and society as a whole.

Science thrives when all ideas are subject to scrutiny, including (and especially) those that are longstanding and influential. Hard-hitting criticism of current practices (or even gentle, thoughtful critiques) may be difficult for advocates of current practices to hear, but free and open exchange of opinion is essential for progress. Flanagan and colleagues have presented a set of systematic procedures for identifying specific learning disabilities (SLDs; Flanagan, Ortiz, & Alfonso, 2013). Because the procedures have been widely adopted by school psychologists, they should be carefully scrutinized and rigorously evaluated. To the degree that the procedures are suboptimal, they should be adjusted, reformed, or set aside for something better. This does not necessarily mean that they were missteps. Some previously popular but now abandoned ideas were essential stepping stones on the path to progress. Indeed, the procedures that Flanagan and colleagues currently advocate have undergone many revisions since they were first proposed, often in response to critical feedback. Revisions that take into account valid criticism are quite likely needed to optimize SLD identification.

Kranzler et al. (this issue) present evidence that they believe casts doubt on the validity of Flanagan and colleagues' procedures. Because of our great respect for these scholars' previous work, we were extremely puzzled by the conceptual errors in their arguments. We found their evidence to be neither persuasive nor, to speak plainly, at all relevant to the question of the accuracy of SLD identification. Unfortunately, although their critique is nominally directed at Flanagan and colleagues' methods, in truth it is a critique of a set of procedures that are endorsed by no one, most emphatically not by Flanagan and colleagues. As will be shown, the differences between what Kranzler et al. (this issue, p. 125) call "XBA PSW" and what Flanagan and colleagues advocate are stark, and in many important ways are diametrically opposed.

Aspirations and conflicts of interest

It should be disclosed that one of us (DPF) is the primary architect of the ideas and procedures under scrutiny, and, despite earnest affirmations of her commitment to scientific neutrality, has a financial stake in these matters in the form of book sales, software sales, and paid workshops. The second author (WJS) is an independent scholar with no personal, professional, or financial interest in the outcome of this commentary. WJS has declared no allegiance to the procedures advocated by Flanagan and colleagues and has a history of suggesting alternative procedures of his own (e.g., Schneider, 2013), a practice he intends to continue. We join together here because both of us simply want the field to advance. We believe it is better to be accurate in the long run than to have only the appearance of being accurate right now. However, if Flanagan and colleagues' methods need to change, the motivating arguments for change must be valid. For reasons we articulate below, we do not believe that the evidence presented by Kranzler et al. provides a sound rationale for change at this time.

Preliminary clarification of terms: XBA ≠ PSW

Before addressing the heart of the matter before us, a little housecleaning is in order: We need to disentangle Cross-Battery Assessment (XBA) from SLD identification. Until recently, Flanagan and colleagues referred to their method of SLD identification as simply "an operational definition of SLD" or "a CHC-based operational definition of SLD." Their operational definition was first presented nearly 15 years ago (Flanagan, Ortiz, Alfonso, & Mascolo, 2002) and has been revised and updated several times over the past decade to ensure that it was consistent with contemporary theory and research (e.g., Flanagan, Ortiz, Alfonso, & Mascolo, 2006; Flanagan, Ortiz, & Alfonso, 2007, 2013; Ortiz, Flanagan, & Alfonso, 2015). However, because their operational definition was often called by others "XBA," rather than being conceived of as a method that was separate from yet compatible with XBA, Flanagan and colleagues (2013) gave it a name—the Dual Discrepancy/Consistency operational definition of SLD (or the DD/C method for short).

Flanagan and colleagues developed a software program to assist with the quantitative analyses necessary to determine whether a particular pattern of strengths and weaknesses in cognitive and academic performance supports SLD based on DD/C criteria. The program that Kranzler et al. referred to in their study (although did not use in the manner it was intended to be used) is called the "Cross-Battery Pattern of Strengths and Weaknesses Analyzer" (PSW-A v1.0; Flanagan et al., 2013), hence the use of the term "XBA PSW" in their study. The revised version of the PSW-A program is included in the Cross-Battery Assessment Software System (X-BASS; Ortiz et al., 2015), wherein the output page for the PSW analysis is clearly labeled: Dual-Discrepancy/Consistency Model: Summary of PSW analyses for SLD. One does not need to cross batteries or use the XBA approach to apply DD/C criteria. However, since most single batteries do not measure the breadth of cognitive and academic constructs considered necessary to inform SLD identification decisions, crossing batteries is typically necessary; accordingly, there is a close connection between XBA and the DD/C method.

XBA is a method for combining tests from different batteries and predates DD/C by several years (Flanagan & McGrew, 1997; Flanagan & Ortiz, 2001). The XBA approach is grounded mainly in Cattell-Horn-Carroll (CHC) theory and research (McGrew, 2005, 2009; Schneider & McGrew, 2012). Unlike other "flexible battery" practices, rigorous procedures and methods accompany XBA to ensure that any assessment that expands beyond the confines of a single battery is both psychometrically and theoretically defensible. To assist in XBA and in interpretation of cross-battery data, X-BASS was developed (Ortiz et al., 2015). X-BASS is an integration and substantial revision of the software programs that accompanied the second and third editions of Essentials of Cross-Battery Assessment (Flanagan et al., 2007, 2013).

Although XBA can be used in the context of SLD identification, it has many other applications. Given that DD/C and XBA have been described in the same works and that they are often used together, it is nevertheless easy to see how they can be conflated. However, XBA and DD/C are independent and conceptually distinct procedures. If somehow well-grounded critiques caused one of them to fall, the other would still be left standing, fully intact.

Given that the use of the term DD/C is fairly recent, it is understandable that Kranzler et al. referred to DD/C as "XBA PSW." Nevertheless, it is a misleading term. Kranzler et al.'s study has no bearing on XBA per se: Multiple batteries were not crossed, nor were multiple scores combined. That is, the authors used a single battery in their study, the WJ III. Single-battery assessment is not Cross-Battery Assessment. Still, it can be conceded that XBA, more broadly conceived, is a procedure for combining scores, even if scores come from the same battery. However, this is not what Kranzler et al. did. Even though multiple measures of relevant constructs from the WJ III were available to the authors, they chose to use single measures (subtests) instead. XBA, narrowly or broadly conceived, makes no appearance whatsoever in Kranzler et al.'s study. Thus, it should be clear that whatever the results of Kranzler et al.'s study, they do not have any bearing on the validity of the XBA procedures for combining scores in a theoretically and psychometrically defensible manner.

Limitations of Kranzler et al.'s methods and flaws in their reasoning

Our primary concern about the methods used by Kranzler et al. is that the DD/C operational definition of SLD (which Kranzler et al. refer to as "XBA PSW") was not applied accurately. While some of these inaccuracies are probably unavoidable, the most consequential inaccuracies are "unforced errors" on the part of Kranzler et al. We will start with comparatively minor concerns and proceed to more serious errors.


Regrettable but probably necessary methodological limitations

How accurate is DD/C when there is no background information available?

It would be impossible for Kranzler et al. to evaluate the DD/C method in its entirety using only data from the WJ III standardization sample. The DD/C method is intended to be used in conjunction with multiple data sources, including prereferral intervention information, parent and teacher reports, classroom observations, results of an evaluation of exclusionary factors, and so forth. These additional data sources are necessary to assist practitioners in interpreting the results of the DD/C analysis. In fact, when all components of the DD/C analysis are supported, meaning that the pattern suggests SLD, the program output (PSW-A and X-BASS) states, "This pattern of results does not automatically confirm the presence of SLD. This pattern must be considered within the context of the entire case history of the individual. In addition, other data gathered through multiple methods need to be considered." Similarly, when one or more components of the DD/C analysis are not supported, meaning that the pattern does not suggest SLD, the program output states, "This pattern of results does not automatically rule out the presence of SLD."

Thus, any study on the classification agreement of the DD/C operational definition that relies solely on a quantitative analysis without consideration of other data sources will yield many false positives and false negatives. It is quite likely that some, if not many, of the children in the standardization sample with otherwise SLD-like scores would fail to meet criteria for SLD if more information were available. In Kranzler et al.'s analysis, such children are counted as diagnostic errors, errors that would not have been made by practitioners following DD/C procedures accurately. To be fair, several times Kranzler et al. acknowledge that such cases exist. However, their analyses do not reflect this acknowledgement, nor are their conclusions softened by recognition that their analyses are a worst-case scenario with respect to exclusionary criteria. That is, because exclusionary factors were not considered in their study, not a single member of the WJ III standardization sample met exclusionary criteria, and thus every so-called "error" counted against DD/C.

How accurate would DD/C be if it were implemented by robots?

When a diagnostic procedure is proposed, it is legitimate to ask how accurate it would be if it were unflinchingly followed to the letter, whether it made sense to do so or not. Such a thought experiment might seem silly, but it helps us know how robust a procedure is in relation to the complexities and vicissitudes of human experience. Kranzler et al.'s study essentially answers the following question: If practitioner judgment were replaced with inflexibly programmed robots who examined the scores from children plucked at random from the general population and did not consider whether the scores looked unusual, did not consider behavioral observations, educational history, or exclusionary factors, did not conduct follow-up testing, did not listen to parent and teacher reports, and did not pay attention to clinical details, how accurate would the scores be in and of themselves in identifying who has an SLD? Despite the silly way we have framed the question, it is actually an important and fair question to ask. If we can identify the situations in which our hypothetical robots fail to arrive at the correct diagnosis, the procedures can be amended and improved. However, whenever asking such questions, researchers should be clear that the answers have limited scope in how we evaluate the overall validity of the method.

None of the PSW models currently in use are intended to be applied in a robotic fashion. Also, any quantitative data (such as those derived from a PSW analysis) that are not considered within the context of exclusionary factors are inconsistent with statutory and regulatory requirements. That said, requiring that scholars evaluate the accuracy of DD/C only when it is performed exactly right to the last detail is an unrealistic standard. The balanced approach is to allow for such departures from standard practice but to clearly label them as such, tempering one's conclusions as needed. We believe that Kranzler et al.'s negative conclusions about DD/C should have been described clearly as limited in scope vis-à-vis the necessary limitations of their study's design.

Methodological shortcomings that are both regrettable and unnecessary

Single measures make for poor measurement

Dismantling a flawed version of an opposing view is largely unproductive, and possibly destructive if it misleads people to dismiss a perfectly valid stronger version of the view. To conduct a thoroughly convincing takedown of an argument, it is much more persuasive to start by restating the best possible version of the argument and then prove that it is incorrect. To provide a convincing demonstration that DD/C is error-prone, it is necessary to show that DD/C, as recommended by Flanagan and colleagues, is error-prone. Showing that an inferior version of DD/C is error-prone, especially one explicitly discouraged by Flanagan and colleagues, simply proves that Flanagan and colleagues were right in discouraging its use.

Kranzler et al. were not forced by necessity to use single measures instead of multiple measures, as recommended by Flanagan and colleagues. The WJ III standardization sample they used includes literally thousands of cases with reasonably complete data. Kranzler et al. could easily have used the multiple-measure broad ability scores from the WJ III (Gf, Gc, Gv, Ga, Glr, Gsm, and Gs). Indeed, the WJ III has enough supplementary scores that, with a few lines of extra programming, they could have simulated the decision process of practitioners who need to conduct follow-up testing when subtest scores in the same domain are widely discrepant, meaning that the broad ability score is not cohesive. However, Kranzler et al. chose to use single measures so that their analyses would supposedly be more informative for real-world practices in which saving time is an important consideration. Unfortunately, this choice makes their study largely uninformative for the evaluation of DD/C.

Flanagan and colleagues recommend using multiple measures of a construct for a good reason: Single measures are generally not sufficiently reliable for DD/C, nor do they have adequate content validity. That is, they do not cover the broad range of abilities implied by the broad constructs in CHC theory. Again, even if Kranzler et al. had succeeded in showing that a single-measure version of DD/C is not sufficiently accurate, this would merely have demonstrated that Flanagan and colleagues were correct in discouraging the use of single measures in DD/C. However, this is not where things stand. Although Kranzler et al. claimed they have shown that a single-measure version of DD/C is often inaccurate, their analyses answer an entirely different question, one that has little bearing on the accuracy of DD/C.

Continuous variables, needlessly dichotomized

Cognitive and academic abilities are continuous variables. When continuous variables are used to make dichotomous decisions, as is often the case in SLD identification, then cut scores must reluctantly be used. The inescapable downside to using continuous scores to make dichotomous decisions is that there is little justice for those who are very close to the wrong side of the cut score, and the whole procedure is likely to seem arbitrary and unfair to them.

In the beginning of their paper, Kranzler et al. stated that their aim is to evaluate what they believe is one of DD/C's assumptions: Specific cognitive deficits cause academic deficits, even in the absence of general cognitive deficits. This question is not about the accuracy of DD/C but about the validity of what Kranzler et al. believe is one of its assumptions. To evaluate this assumption, dichotomizing continuous data is not needed. Furthermore, doing so underestimates the effects of cognitive abilities on academic abilities, thus needlessly stacking the deck against the viability of the so-called assumption of DD/C.

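The size of this attenuation is easy to demonstrate with a minimal simulation sketch (Python). The true correlation of .50 and the cut score of 90 below are our own illustrative assumptions, not values taken from Kranzler et al.:

    import numpy as np

    # Illustrative assumption: cognitive and academic standard scores
    # (M = 100, SD = 15) with a true correlation of .50.
    rng = np.random.default_rng(0)
    n, r = 1_000_000, 0.50
    cog_z = rng.normal(size=n)
    aca_z = r * cog_z + np.sqrt(1 - r**2) * rng.normal(size=n)
    cog, aca = 100 + 15 * cog_z, 100 + 15 * aca_z

    # Correlation between the continuous scores: recovers ~.50.
    print(np.corrcoef(cog, aca)[0, 1])

    # Dichotomize both scores at 90 (deficit vs. no deficit) and correlate
    # the resulting 0/1 variables: the association drops to roughly .31.
    print(np.corrcoef(cog < 90, aca < 90)[0, 1])

A researcher who looked only at the dichotomized variables would conclude that the effect is considerably weaker than it actually is.
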
Unforced errors and fatal flaws

Here we come to the heart of our critique of Kranzler et al.'s study. The DD/C method must be evaluated in terms of the claims that it actually makes. However, the methods used by Kranzler et al. make DD/C responsible for predictions it does not and should not make. Not only do Kranzler et al. start with false premises, the reasoning they apply to their results is not valid, which leads them to unsound conclusions about the accuracy of DD/C.

Kranzler et al. misinterpret what Flanagan and colleagues mean by the term causal

First, Kranzler et al. correctly state that the DD/C approach to SLD identification "is based on the postulate that deficits in cognitive abilities in the presence of otherwise average general intelligence are causally related to academic achievement weaknesses" (p. 127). Then, also correctly, they state that, "If the model is properly specified, the purported relations between specific cognitive abilities and specific areas of academic achievement should hold" (p. 127). However, their reasoning takes a wrong turn here: "If specific broad cognitive abilities are causally related to academic deficits, as Flanagan et al. (2013) presume, then any sample of individuals with significant cognitive deficits will, according to their theory, necessarily have corresponding weaknesses in related academic areas" (p. 133). In the context of the entire paper, they appear to take the term causal to mean deterministic, meaning that if we know the causal inputs, the outcomes can be predicted perfectly. No one, most certainly not Flanagan and colleagues, has made such a claim about the causal relationships between cognitive and academic abilities. We suspect that Kranzler et al. would deny that this was their intention, yet this is precisely what their analyses (both statistical and conceptual) imply.

Kranzler et al.'s analyses demonstrate how well dichotomized cognitive scores (deficit/no deficit) can predict dichotomized academic scores (deficit/no deficit). Kranzler et al. imply that DD/C should be faulted for the fact that cognitive abilities imperfectly predict academic abilities. That is, whenever there is a mismatch, it is a diagnostic error on the part of DD/C. In such an analysis, a high positive and negative predictive value could only have been achieved simultaneously if cognitive abilities explained a much, much higher percentage of variance in academic abilities than they actually do. Yes, cognitive abilities are causally related to academic abilities, but the causal relationship is of moderate size, and only probabilistic, not deterministic. No one, not least Flanagan and colleagues, has claimed otherwise.

                                Specific Cognitive Deficits
                                Absent              Present
    Academic      Absent       True Negative       False Positive
    Deficits      Present      False Negative      True Positive

Figure 1. Diagnostic hits and misses according to Kranzler et al.'s version of DD/C.
Note. Children with general cognitive deficits are excluded from this classification.

Are we being unfair to Kranzler et al.'s conceptual analysis? We think not. Implicit in Kranzler et al.'s classification of diagnostic errors (see Figure 1) are the following false assumptions about DD/C:

False assumption #1. Among children of at least average intelligence, everyone with certain specific cognitive deficits should have corresponding specific academic deficits. If children with these specific cognitive deficits are observed to have no academic deficits, then DD/C has failed. Such children are counted as "false positives" (Kranzler et al., this issue, p. 132).

False assumption #2. Among children of at least average intelligence, everyone with certain specific academic deficits should have corresponding specific cognitive deficits. If children with these specific academic deficits are observed to have no specific cognitive deficits, then DD/C has failed. Such children are counted as "false negatives."

Both of these assumptions are extreme and unfair to DD/C.¹ To repeat, both claims are explicitly and emphatically denied by Flanagan and colleagues in their formulation of DD/C. While some evidence is available to support causality for certain cognitive abilities and certain academic outcomes, such as the seminal work of Wagner and Torgesen (1987), the bulk of the research is correlational, a fact that has been reported and explained by Flanagan and colleagues in all of their writings on SLD identification (e.g., Flanagan, Ortiz, Alfonso, & Mascolo, 2002, 2006; Flanagan et al., 2007, 2013). Whatever causality claims may have been made by Flanagan and colleagues are probabilistic in nature. That is, cognitive deficits raise the risk of academic deficits; they do not guarantee them.

¹ We assume implicitly that every argument was advanced by Kranzler et al. in good faith. When we say that an argument is unfair to DD/C, the unfairness resides in the implications of Kranzler et al.'s arguments, not in their intentions.

Why not? If a particular specific cognitive deficit causes a particular academic deficit, should it not always have the same result? No. The great conceptual leap that Pearson (1930, p. 2) credits Galton with is the idea that partial causation can be quantified: "Galton turning over two different problems in his mind reached the conception of correlation: A is not the sole cause of B, but it contributes to the production of B; there may be other, many or few, causes, some of which we do not know and may never know." From here, it is not hard to imagine that there are many modifiers of the effect of cognitive abilities on academic abilities. That is, a specific cognitive deficit may severely undercut academic achievement in some circumstances but not at all in others. For example, phonological processing deficits may not result in reading decoding deficits if the child has an exceptionally skilled first-grade reading teacher or if the child is especially motivated to learn to read. Just because an effect is not always present does not make it any less causal.

Is this a case of special pleading? Hardly. We know, for example, that exposure to certain viruses causes the common cold. Yet, not every exposure to the virus results in illness. So it is with cognitive deficits and academic problems. It is possible that from the vantage point of complete omniscience we would see that every step in the causal sequence from cognitive deficits to academic problems is fully deterministic. However, from the position of incomplete understanding, probabilistic models are our best and perhaps only option.

Disastrous equivocation: Kranzler et al.'s "false positives" in the prediction of academic deficits are not false positives in DD/C's diagnostic accuracy

Kranzler et al. call children who have specific cognitive deficits but no academic deficits "false positives," provided that they have at least average intelligence (p. 132). If, for the sake of argument, we conceded that DD/C does in fact imply that cognitive deficits are necessarily associated with academic deficits, there would be nothing wrong with the use of the term false positive in this context. However, in the context of diagnosis, the term false positive refers to when a diagnostic procedure incorrectly identifies a person with a condition that is not actually present. Later in the paper, the high false positive rate in the prediction of academic deficits is used to imply that DD/C has low positive predictive power in SLD identification: "[O]ur PPV results suggest that clinicians will spend a great deal of time conducting assessments that have a very low probability of accurately identifying true SLD" (p. 134).


Elsewhere, Kranzler et al. state that, "No PPV value for any CHC broad cognitive ability exceeded 40% [with a few minor exceptions]. These findings mean that a diagnosis of SLD using the XBA PSW method has at best a chance of about 50% that the individual truly has SLD when the assessment outcome is positive" (p. 131). Huh? No, this means that when a child plucked at random from the WJ III standardization sample has at least one cognitive deficit on one of the seven standard subtests, there is at best a 50% chance that the person also has an academic deficit on one of several WJ III academic tests. This fact has very little to do with whether a person really has SLD. The assessment outcome is not "positive" for SLD in such a case. Here Kranzler et al. have (no doubt unintentionally) committed the logical fallacy of equivocation and are led to an incorrect conclusion.

The logical fallacy of equivocation occurs when an argument hinges on an unacknowledged and improper shift in the meaning of a word or phrase. For example, "Limburger cheese smells funny. Funny things make people laugh. Therefore, the smell of Limburger cheese makes people laugh."

Laid bare, the logic of Kranzler et al.'s equivocation goes like this:

• DD/C says that people with specific cognitive deficits should always have academic deficits. There are a lot of false positives in DD/C's prediction of academic deficits from cognitive deficits. Thus, DD/C is wrong.
• False positives occur when a procedure falsely identifies people with a condition.
• Therefore, DD/C falsely identifies a lot of people with SLD who do not in fact have SLD.

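A small arithmetic sketch makes the shift in meaning concrete. The counts below are invented for illustration (only the roughly 40% figure echoes the PPV values Kranzler et al. report); they are not data from their study:

    # Hypothetical cohort: 100 children of at least average general
    # intelligence who are flagged with a specific cognitive deficit.
    with_cog_deficit = 100
    with_both = 40                 # of those, also show an academic deficit

    # PPV of *predicting* academic deficits from cognitive deficits:
    print(with_both / with_cog_deficit)      # 0.40

    # The other 60 children are failed predictions, not false-positive SLD
    # diagnoses: lacking an academic weakness, they could never be
    # identified as having SLD under DD/C in the first place.
    print(with_cog_deficit - with_both)      # 60
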
That is, they use the term false positive from one context (the prediction of hypothesized academic deficits) and apply it to false positives in another context (the accurate identification of SLD). The truth is that in DD/C one cannot be identified as having SLD if no academic weaknesses or deficits are present. According to Flanagan and colleagues, "If weaknesses or deficits in the child's academic achievement profile are not identified, then the issue of SLD may be moot because such weaknesses are a necessary component of the definition" (Flanagan et al., 2012). In fact, this is one of the distinctive features of DD/C. In some circumstances, other procedures (e.g., Fiorello et al., 2012) allow for people to be diagnosed with SLD even if the academic ability is well above average. In DD/C, although specific cognitive abilities may cause relative underperformance, such cases are not termed disabilities unless the abilities are actually below average, differ significantly from overall ability, and the difference between predicted and actual performance in the scores that represent the abilities is unusual in the general population. Kranzler et al. did not apply the DD/C criteria correctly.²

² In DD/C, the expected score in both the cognitive and academic areas of identified weakness is predicted from an overall ability score, called the "Intact Ability Estimate (IA-e)" in the PSW-A program (note that the program allows for a score such as the broad cognitive ability score from the WJ III to be used in place of the IA-e). The criterion of a difference between predicted and actual performance was not used in the Kranzler et al. study (or any other studies that claim to have evaluated the diagnostic accuracy of DD/C), but is nevertheless an important one in DD/C because it is used to identify "domain-specific cognitive weaknesses" and "academic underachievement." Instead, Kranzler et al. (this issue) used scores less than 90 to represent cognitive and academic weaknesses, many of which would likely not have been considered actual weaknesses in the PSW analysis based on DD/C criteria. Moreover, they incorrectly stated, "In the XBA PSW method, the difference between the broad cognitive ability and achievement scores is not tested statistically" (p. 129). It most certainly is.

It is illuminating to consider who might be among Kranzler et al.'s "false positives." Presumably, there are many students with cognitive processing deficits who never fall behind academically because dedicated parents, skillful teachers, and thoughtful school psychologists worked long and hard to prevent problems that would otherwise have occurred. To think that the existence of such students points to a failing of DD/C is simply not fair.

Another group of students that Kranzler et al.'s analyses classify as "false positives" would be students who do have clear cognitive deficits but whose deficits cause only mild academic underperformance. For example, a child with high overall ability but low working memory ability may struggle a bit with multistep math calculations but still manage to perform in the average range. The fact that DD/C does not give such a child the SLD label is a feature, not a bug. A child with mild academic problems might need a little extra help, but assigning the child SLD status and marshalling special education resources to provide this help is like sending an F-22 Raptor to do routine crop dusting. Implemented properly, DD/C conserves precious resources for students most in need of them.

Kranzler et al.'s "false negatives" are not false negatives in SLD identification

The same equivocation on the term false positive is also present with respect to the term false negative, which causes Kranzler et al. to misapply their negative predictive value results to the question of DD/C's accuracy. The fact that this reasoning error worked in DD/C's favor this time is irrelevant: These results have little to no bearing on the question of DD/C's accuracy.

DD/C allows for people who score low on academic achievement tests for reasons other than cognitive processing problems. Kranzler et al. appear to agree, stating, "It is important to note that a significant academic weakness is not ipso facto evidence of SLD, as deficits in achievement may result from a number of non-cognitive factors (e.g., low motivation or poor instruction)" (p. 130).

Therefore, it is perplexing that all cases of academic weaknesses without cognitive weaknesses are misleadingly called "false negatives," as if practitioners using DD/C should assign such children the SLD label but fail to. DD/C explicitly excludes such cases, and for good reason.

As acknowledged by Kranzler et al., DD/C procedures include an extensive list of exclusionary criteria for SLD. Children underperform academically for many reasons that have nothing to do with SLD. For example, a student of average intelligence who misses two years of school because of severe and chronic medical problems would likely fall behind academically. This child would be called a "false negative" by Kranzler et al.'s analyses, suggesting that the student truly has an SLD and the DD/C method missed that student. One need not be an adherent of DD/C to see why this is a bad idea. Indeed, we are quite confident that Kranzler et al. would not identify such a child as having an SLD. Thus, it is unfair to hold DD/C responsible for an "error" that is actually the right call to make. We could generate many, many more examples like this but do not wish to belabor the obvious: These are not diagnostic errors.

Kranzler et al. versus Stuebing et al.: An instructive contrast

Like Kranzler et al., Stuebing et al. (2012) also conclude that DD/C is likely to identify children with SLD who do not have it. However, their reasoning does not suffer from the same flaws as that of Kranzler et al. and thus constitutes a call to action. The study by Stuebing et al. is not the final word on the matter, but it should be credited for getting the conversation started about estimating the accuracy of SLD identification procedures. Although Stuebing et al. fail to implement DD/C properly in several important ways (e.g., specific criteria are not evaluated, the procedure is followed robotically, no follow-up testing is conducted, and it only considers the accuracy of DD/C under the conditions of low population base rates instead of the more realistic base rates in referred populations), they do frame their questions properly and the overall logic of their approach is sound:

1. Create simulated latent and observed scores according to patterns hypothesized to exist by the theory underlying the PSW method.
2. Identify cases that truly meet the PSW criteria for SLD.
3. Simulate the decision processes recommended by the PSW method.
4. Evaluate the accuracy of those decisions by comparing the results obtained from observed scores to the true status defined by latent scores.

This approach has at least the theoretical possibility of estimating DD/C's diagnostic accuracy. Future studies along these lines may prove very informative as to how DD/C should be amended to increase its diagnostic accuracy. In contrast, the methods of Kranzler et al. could be replicated a thousand times and we would be no nearer to reforming SLD identification. Why? Unlike the study by Stuebing et al., in which it was possible to know which simulated cases truly met criteria for SLD, the study by Kranzler et al. used observed scores from the WJ III standardization sample. There is no way to know which observed scores are accurate and which are not. This is where the whole enterprise should have stopped. With real cases, there is no error-free standard to which one can appeal to decide if a diagnostic error has occurred. At best, with real scores one can evaluate how closely two diagnostic methods agree. If one procedure is strongly suspected to be highly accurate, it can act as the "Gold Standard" method, substituting for the "true diagnosis."

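To make the four-step logic concrete, here is a minimal simulation sketch in the spirit of Stuebing et al. (Python). Every number in it is an assumption for illustration (the factor structure, the .85 reliabilities, the cut score of 90), and the simple deficit-plus-average-g rule is a deliberately simplified stand-in for the actual DD/C criteria:

    import numpy as np

    rng = np.random.default_rng(1)
    n, rel = 100_000, 0.85        # cases; assumed reliability of each score

    # Step 1: simulate latent abilities with an assumed structure, then add
    # measurement error to create observed scores.
    g = rng.normal(size=n)                                      # general ability
    cog = 0.7 * g + np.sqrt(1 - 0.7**2) * rng.normal(size=n)    # specific cognitive
    aca = 0.6 * cog + np.sqrt(1 - 0.6**2) * rng.normal(size=n)  # academic

    def observe(true_score):
        # Classical test theory: observed = sqrt(rel)*true + sqrt(1-rel)*error
        return np.sqrt(rel) * true_score + np.sqrt(1 - rel) * rng.normal(size=n)

    def flag(g_, cog_, aca_):
        # Simplified PSW-like rule: average general ability (standard score
        # >= 90) with deficits (< 90, i.e., z < -2/3) in a specific cognitive
        # ability and a related academic ability.
        ss = lambda z: 100 + 15 * z
        return (ss(g_) >= 90) & (ss(cog_) < 90) & (ss(aca_) < 90)

    truth = flag(g, cog, aca)                                # Step 2: true status
    decision = flag(observe(g), observe(cog), observe(aca))  # Step 3: robotic decision

    # Step 4: classification agreement of decisions against true status.
    ppv = (truth & decision).sum() / decision.sum()
    sensitivity = (truth & decision).sum() / truth.sum()
    print(f"PPV = {ppv:.2f}, sensitivity = {sensitivity:.2f}")

Because true status is defined on the latent side, diagnostic errors are knowable here in a way they cannot be with real standardization data.
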
For example, suppose that each child not meeting DD/C exclusionary criteria is given 10 tests of each relevant cognitive and academic ability and then the scores were used to come to a diagnostic decision about SLD. If such a procedure were feasible, it is likely that it would be quite accurate. Suppose that we select two measures of each ability at random for each child to simulate the accuracy of DD/C as it is more typically practiced. If the results of the 2-test version of DD/C are highly consistent with the results of the 10-test version of DD/C, this would suggest that the 2-test version is a fairly accurate method. If not, the 2-test procedure recommended by DD/C may need to be updated to a higher number of tests. It is even possible to refine one's recommendations such that highly correlated broad abilities (e.g., verbal comprehension and knowledge) need fewer measures, whereas less correlated broad abilities (e.g., long-term memory) require many more measurements before scores converge on a trustworthy estimate.

If we could rewind time and rewrite Kranzler et al.'s paper

There is an important point to be found in Kranzler et al.'s results that, if properly framed, deserves attention in the field. Too often, researchers and practitioners learn about small-to-moderate correlational effects and fail to think through what exactly they mean. It is absolutely true, for example, that phonological processing deficits are associated with reading decoding problems (Wagner & Torgesen, 1987). However, the effects are not deterministic and are not nearly as large as many people suppose.

For example, imagine that a 6-year-old child has a WJ IV Comprehension-Knowledge (Gc) cluster of 100, a Fluid Reasoning (Gf) cluster of 100, and an Auditory Processing (Ga) cluster of 64. Most of us would consider this to be the start of what is likely to be a very SLD-ish cognitive profile. A score of 64 on the Ga cluster is very low, and given how many studies have investigated Ga's relationship with reading decoding skills, we would not be surprised to learn that the child scored an 80 on the WJ IV Basic Reading Skills cluster (BRS). Indeed, if there was any surprise, it would likely be that the BRS score wasn't lower than 80, perhaps closer to 64.

Using the correlation matrices from the WJ IV Technical Manual and a little matrix algebra (available upon request), it is possible to create a multiple regression equation to estimate the typical BRS score among children with known scores on Gc, Gf, and Ga:

Predicted BRS = 19.91 + 0.180(Gc) + 0.346(Gf) + 0.275(Ga)

Amazingly enough, if we plug Gc = 100, Gf = 100, and Ga = 64 into the equation, the predicted BRS score would be not 64, not 80, but 90. How can that possibly be? A score of 90 is in the average range! More than half of children with this cognitive profile (Gc = 100, Gf = 100, Ga = 64) are expected to score in the average range or better on BRS? Yes. In fact, a child with this cognitive profile who scores 80 on BRS is about 10 points below expectations. Assuming that the scores are multivariate normal, only about 1 in 5 children with Gc = 100, Gf = 100, and Ga = 64 would score 80 or lower on BRS.

Does this mean that Flanagan and colleagues are wrong about specific cognitive deficits having negative effects on academic performance? Of course not! In this thought experiment, if the child's Ga score were also 100, the predicted BRS would be 100 as well, which is 10 points higher than if the Ga score were 64. Under the assumption of multivariate normality, only about 4% of children with scores of 100 on all three cognitive scores would score 80 or lower on BRS. Putting this information together, a child with a severe cognitive deficit in Ga (standard score = 64) but otherwise average cognitive abilities in other areas (i.e., Gc = 100, Gf = 100) is nearly 5 times as likely to score 80 or lower on BRS than a child without any cognitive deficits (i.e., Gc, Gf, and Ga = 100). This is an illustration of the general principle that even severe specific cognitive deficits do not guarantee the presence of academic deficits, but they do raise the risk that academic deficits will be observed, sometimes by a lot.

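The arithmetic of this thought experiment can be checked directly. In the sketch below (Python), the regression weights are the ones reported above; the residual standard deviation of about 11.7 is our own back-calculation from the percentages given in the text, not a value taken from the WJ IV Technical Manual:

    from scipy.stats import norm

    def predicted_brs(gc, gf, ga):
        # Regression equation reported above (from the WJ IV correlation matrices).
        return 19.91 + 0.180 * gc + 0.346 * gf + 0.275 * ga

    resid_sd = 11.7  # assumed; chosen to reproduce the ~1-in-5 and ~4% figures

    print(predicted_brs(100, 100, 64))    # ~90: the average-range expectation
    print(predicted_brs(100, 100, 100))   # ~100

    # Probability of scoring 80 or lower on BRS under multivariate normality.
    p_ga_deficit = norm.cdf(80, loc=predicted_brs(100, 100, 64), scale=resid_sd)
    p_no_deficit = norm.cdf(80, loc=predicted_brs(100, 100, 100), scale=resid_sd)
    print(p_ga_deficit, p_no_deficit, p_ga_deficit / p_no_deficit)  # ~0.19, ~0.04, ~4.5
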
The happy truth is that many, if not most, people with specific cognitive weaknesses and deficits find various ways around them. The children who do not, the ones most likely to be referred for SLD evaluations, may very well be the exceptions to this pattern. This, or something like it, in our opinion, should have been the main point of Kranzler et al.'s paper: Not that DD/C is an inaccurate method of diagnosing SLD, but that we should not overestimate the effects of cognitive deficits on academic performance.

Simply because we see cognitive and academic deficits together in the same child does not mean that our work as psychologists is done. Cognitive abilities are indeed extremely important causal determinants of academic abilities. However, there is a host of other factors that can in aggregate be much more important than cognitive abilities in influencing academic outcomes, though their effects may be small individually. Psychologists need to give cognitive abilities their proper consideration, but must also weave together all the evidential threads into a coherent narrative of the child's academic difficulties. Only then can psychologists be in the position to give truly helpful advice to parents and teachers trying to help children who have fallen behind.

About the authors

Dr. Dawn P. Flanagan is Professor of Psychology at St. John's University in Queens, NY. She is also Clinical Professor at the Yale Child Study Center, Yale University School of Medicine, in New Haven, CT. She is a widely published author in the areas of cognitive assessment, specific learning disabilities, and professional issues in school psychology. She also serves as an expert witness, learning disabilities consultant, and test/measurement consultant and trainer for organizations both nationally and internationally.

W. Joel Schneider completed a doctorate in clinical psychology at Texas A&M University and is a Professor of Psychology at Illinois State University, where he has the good fortune to be able to follow his scholarly interests wherever they lead. He develops new psychometric methods and clinician-friendly computer programs to help clinicians make better inferences about individuals. He writes essays, makes videos, and sometimes sings about psychology, psychometrics, and software at his blog, AssessingPsyche.

References

Fiorello, C. A., Hale, J. B., & Wycoff, K. L. (2012). Cognitive hypothesis testing: Linking test results to the real world. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 484-496). New York, NY: Guilford Press.

Flanagan, D. P., Alfonso, V. C., Mascolo, J. T., & Sotelo-Dynega, M. (2012). Use of ability tests in the identification of specific learning disabilities within the context of an operational definition. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (3rd ed., pp. 643-669). New York, NY: Guilford Press.

Flanagan, D. P., & McGrew, K. S. (1997). A cross-battery approach to assessing and interpreting cognitive abilities: Narrowing the gap between practice and cognitive science. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 314-325). New York, NY: Guilford Press.

Flanagan, D. P., & Ortiz, S. O. (2001). Essentials of cross-battery assessment. New York, NY: Wiley.

Flanagan, D. P., Ortiz, S. O., & Alfonso, V. C. (2007). Essentials of cross-battery assessment (2nd ed.). New York, NY: Wiley.

Flanagan, D. P., Ortiz, S. O., & Alfonso, V. C. (2013). Essentials of cross-battery assessment (3rd ed.). Hoboken, NJ: Wiley.

Flanagan, D. P., Ortiz, S. O., Alfonso, V. C., & Mascolo, J. T. (2002). The Achievement Test Desk Reference (ATDR): Comprehensive assessment and learning disabilities. Boston, MA: Allyn & Bacon.

Flanagan, D. P., Ortiz, S. O., Alfonso, V. C., & Mascolo, J. T. (2006). Achievement test desk reference: A guide to learning disability identification (2nd ed.). New York, NY: Wiley.

Kranzler, J. H., Floyd, R. G., Benson, N., Zaboski, B., & Thibodaux, L. (this issue). Classification agreement analysis of Cross-Battery Assessment in the identification of specific learning disorders in children and youth. International Journal of School & Educational Psychology, 4(3), 124-136.

McGrew, K. S. (2005). The Cattell-Horn-Carroll theory of cognitive abilities: Past, present, and future. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (2nd ed., pp. 136-182). New York, NY: Guilford Press.

McGrew, K. S. (2009). Editorial: CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37, 1-10.

Ortiz, S. O., Flanagan, D. P., & Alfonso, V. C. (2015). Cross-Battery Assessment Software System X-BASS (Version 1.2) [Software]. Hoboken, NJ: Wiley. Retrieved from www.wiley.com/go/XBASS

Pearson, K. (1930). The life, letters and labours of Francis Galton: Vol. 3. Researches of middle life. Cambridge, England: Cambridge University Press.

Schneider, W. J. (2013). What if we took our models seriously? Estimating latent scores for individuals. Journal of Psychoeducational Assessment, 31, 186-201.

Schneider, W. J., & McGrew, K. S. (2012). The Cattell-Horn-Carroll model of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (3rd ed., pp. 99-144). New York, NY: Guilford Press.

Stuebing, K. K., Fletcher, J. M., Branum-Martin, L., & Francis, D. J. (2012). Evaluation of the technical adequacy of three methods for identifying specific learning disabilities based on cognitive discrepancies. School Psychology Review, 41, 3-22.

Wagner, R. K., & Torgesen, J. K. (1987). The nature of phonological processing and its causal role in the acquisition of reading skills. Psychological Bulletin, 101(2), 192-212.

