
Educational and Psychological Measurement and Evaluation

TEST VALIDITY
CONTENT
Messick's (1989) unified view of validity
 Content Validity
 Construct Validity
 Criterion Validity
 Consequential Validity
Factors affecting validity
 Instructional Procedures
 Test Administration & Scoring Procedures
 Test Itself
 Student Characteristics
 Functioning Content & Teaching Procedure
Relationship between validity and reliability
INTRODUCTION
 Classroom assessment consists of determining the purpose and the learning targets related to standards, systematically obtaining information from students, interpreting the information collected, and using the information.
 There are several criteria that determine the quality of classroom assessment.
 Validity is one of the most important criteria for ensuring high-quality classroom assessment.
What is validity?
 Validity is an integrated evaluative judgment of the degree to
which empirical evidence and theoretical rationales support
the adequacy and appropriateness of interpretations and
actions based on test scores or other modes of assessment
(Messick, 1993).

 Validity is the most important type of quality control procedure for measurement and assessment (Thorndike, 1997).
 The validity of a test is the extent to which it measures what it is designed to measure (Messick, 1993).
 Validity is used to determine whether the test measures what it claims to measure.
 Validity is concerned with whether the information being gathered is relevant to the decision that needs to be made.
 If a test is not valid or does not measure what it claims to measure, then the test has no meaningful use (Steven, 2005).
Types of validity
 Content validity
 Construct validity
 Criterion validity
  • Concurrent validity
  • Predictive validity
 Consequential validity
Content Validity
 The extent to which the assessment is representative of the domain of interest.
 Content validity usually is based on whether the assessment appropriately measures or samples the domain of instructional material presented to the student.
 Example:
• Suppose you wanted to test for everything that sixth-grade students learn in a four-week unit about insects, but it would take a long time for students to complete such a test.
• The solution is to select a sample of what has been taught to assess, and then use students' achievement on this sample to make inferences about their knowledge of the entire domain of content, reasoning, and other objectives.
• For example, if a student correctly answers 85% of the items, you infer that the student knows about 85% of the content in the entire unit.
• If your sample is judged to be representative of the domain, then you have content-related evidence for validity.

(Steven, 2005)
How to determine an adequate sampling?
 Use a test blueprint or table of specifications to further delineate which objectives you intend to assess and what is important from the content domain.
 A table of specifications is a two-way grid that shows the content areas and the types of learning targets represented in your assessment (a small scripted sketch of this bookkeeping follows the grid below).

Table of specification (cognitive levels from Bloom's Taxonomy across the columns):

| Major Content Areas    | Remember | Understand | Application | Analysis | Evaluate | Create | Total |
| Topic 1                |          |            |             |          |          |        |       |
| Topic 2                |          |            |             |          |          |        |       |
| .......                |          |            |             |          |          |        |       |
| Total no. of items / % | No/(%)   | No/(%)     | No/(%)      | No/(%)   | No/(%)   | No/(%) | /100% |

(McMillan, 2007)
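The grid above is usually completed by hand, but the bookkeeping it represents can also be checked with a short script. Below is a minimal Python sketch, with invented topic names and item counts, that tallies how a planned set of items is distributed across content areas and cognitive levels.

```python
# Minimal sketch of a table of specifications as a two-way grid.
# Topic names and item counts are illustrative, not taken from the slides.
from collections import defaultdict

LEVELS = ["Remember", "Understand", "Application", "Analysis", "Evaluate", "Create"]

# (content area, cognitive level) -> number of items planned for that cell
blueprint = {
    ("Insect anatomy", "Remember"): 4,
    ("Insect anatomy", "Understand"): 3,
    ("Life cycles", "Understand"): 4,
    ("Life cycles", "Application"): 3,
    ("Habitats", "Analysis"): 2,
    ("Habitats", "Evaluate"): 2,
    ("Habitats", "Create"): 2,
}

total_items = sum(blueprint.values())
row_totals = defaultdict(int)   # items per content area
col_totals = defaultdict(int)   # items per cognitive level
for (topic, level), n in blueprint.items():
    row_totals[topic] += n
    col_totals[level] += n

# Print each content area with its share of the test, then the column totals.
for topic, n in row_totals.items():
    print(f"{topic:15s} {n:2d} items ({100 * n / total_items:.0f}%)")
print("-" * 34)
for level in LEVELS:
    n = col_totals.get(level, 0)
    print(f"{level:15s} {n:2d} items ({100 * n / total_items:.0f}%)")
```

Comparing these percentages with the emphasis given to each topic during instruction is one way to judge whether the sample of items is representative of the domain.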
Construct Validity
 The extent to which the assessment is a meaningful measure of
an unobservable trait or characteristic such as honesty, attitude
and intelligence.
 Example:
 Let's say you want to create an assessment that measures the construct of job stress. The scores on the job stress assessment should relate to various predictions derived from theories about job stress.
 For example, theories of job stress indicate a relationship between job stress and variables such as absenteeism, job turnover, or peer ratings of job stress.
 If the job stress test is able to predict these variables, then the test is considered to have construct validity.

(McMillan, 2007)
Criterion Validity
 The relationship between an assessment and another measure of the same trait.
 With this type of validity, an assessment is given to a group of students. Their assessment scores are then compared to their scores on an external criterion appropriately related to what the test claims to measure.
 Criterion-related validity may be measured by comparing students' scores on the new achievement test with their scores on an older, already established achievement test (a short sketch of this comparison follows).
 If the new test scores are closely related to those on the established test, then the new test is considered valid.

(McMillan, 2007)
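As a hedged illustration of that comparison, the sketch below correlates invented scores on a new test with scores on an established criterion measure; statistics.correlation requires Python 3.10 or later.

```python
# Minimal sketch of criterion-related evidence (invented scores, Python 3.10+):
# correlate students' scores on a new test with their scores on an already
# established test of the same trait.
import statistics

new_test         = [55, 62, 70, 74, 81, 88, 93]   # scores on the new test
established_test = [58, 60, 72, 70, 85, 90, 95]   # scores on the criterion measure

r = statistics.correlation(new_test, established_test)
print(f"criterion-related validity coefficient: r = {r:.2f}")
# A coefficient near 1 means the new test ranks students much like the
# established measure; a coefficient near 0 gives no criterion support.
```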
Types of criterion validity:
1) Concurrent Validity
 When the criterion is something that is happening or being assessed at the same time as the construct of interest, it is called concurrent validity.
 Example:
A researcher administers a self-esteem inventory to a group of 8th graders and compares their scores on it with their teachers' ratings of the students' self-esteem obtained at about the same time.
Types of criterion validity:
2) Predictive Validity
 Obtain predictive evidence of validity by measuring your participants at one point in time on your test and then, at a future time, measuring them on the criterion measure.
 This takes more time and effort than gathering concurrent evidence, but it can provide superior evidence that your test does what you want it to do.
 For example, if mathematics scores on a trial examination have a strong relationship with SPM mathematics grades, then the mathematics trial test is said to have high predictive validity: achievement on the trial test can be used to predict the results of the SPM (see the sketch below).
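A minimal sketch of that idea, with invented marks (real SPM results are letter grades, so raw marks are a simplification): fit a least-squares line from trial marks collected first to SPM mathematics marks collected later, and use it to predict.

```python
# Minimal sketch of predictive evidence: fit a least-squares line from trial
# exam marks (time 1) to SPM mathematics marks (time 2). All marks are invented.
import statistics

trial = [45, 52, 60, 66, 71, 80, 88]   # trial exam marks, collected first
spm   = [50, 55, 63, 70, 72, 85, 90]   # SPM mathematics marks, collected later

mx, my = statistics.mean(trial), statistics.mean(spm)
slope = sum((a - mx) * (b - my) for a, b in zip(trial, spm)) / \
        sum((a - mx) ** 2 for a in trial)
intercept = my - slope * mx

# If the fitted line predicts the later SPM marks well, the trial test has
# predictive validity for that criterion.
for t, actual in zip(trial, spm):
    predicted = intercept + slope * t
    print(f"trial {t:3d} -> predicted SPM {predicted:5.1f} (actual {actual})")
```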
Consequential Validity
 Messick (1989) originally introduced consequences to the validity argument. Later, Shepard (1997) broadened the definition by arguing that one must investigate both positive/negative and intended/unintended consequences of score-based inferences to properly evaluate the validity of the assessment system.
 The consequential aspect of validation assesses the values of
score interpretations as they relate to bias, fairness, and social
consequences.
 Consequential validity describes the intended and unintended
effects of assessment on instruction or teaching and student
learning.
 Discussions of the consequences of assessments usually
distinguish between intended positive consequences and
unintended negative consequences.

(Sax, 1997)
 Although both types of consequences are relevant to the
evaluation of the validity of an assessment program, what is
a positive effect for one observer may be a negative effect
for someone else.
 Example : Narrowing the curriculum
 Narrowing may be viewed as a positive outcome by those who want instruction to have a sharper focus on the material in the state content standards, but it may be viewed as a negative outcome by those who worry that important concepts and content areas (e.g., history or music) may be shortchanged.
 Because there is no universal agreement about whether particular outcomes are positive or negative, the discussion below is not separated into intended positive and unintended negative consequences.
Intended Consequences
Lane and Stone (2002) suggest that assessments are intended
to impact:
◦ Student, teacher, and administrator motivation and effort;
◦ Curriculum and instructional content and strategies;
◦ Content and format of classroom assessments;
◦ Improved learning for all students;
◦ Professional development support;
◦ Use and nature of test preparation activities; and
◦ Student, teacher, administrator, and public awareness and
beliefs about the assessment, criteria for judging
performance, and the use of assessment results.
Unintended Consequences
At times, however, as Lane and Stone (2002) note, unintended consequences are possible, such as:
◦ Narrowing of curriculum and instruction to focus only on the
specific learning outcomes assessed;
◦ Use of test preparation materials that are closely linked to the
assessment without making changes to the curriculum and
instruction;
◦ Use of unethical test preparation materials; and
◦ Inappropriate use of test scores by administrators.
FACTORS AFFECTING VALIDITY
 Instructional Procedures
 Test Administration & Scoring Procedures
 Test Itself
 Student Characteristics
 Functioning Content & Teaching Procedure
Instructional Procedures
With educational tests, in addition to the content of the test influencing validity, the way the material is presented can influence validity.
 For example, consider a test of critical thinking skills. If the students were coached and given solutions to the particular problems included on the test, validity would be compromised. This is a potential problem when teachers "teach the test."

(Reynolds, Livingstone & Willson, 2009)
Test Administration & Scoring Procedures
 Deviations from standard administration and scoring procedures can undermine validity: failure to provide the appropriate instructions or to follow strict time limits can lower validity.
 Whether it is a teacher-made test or a standardized test, adverse physical and psychological conditions during testing may affect validity.

(Reynolds, Livingstone & Willson, 2009)
The test administration and scoring procedures may also affect the validity of the interpretations made from the results. For instance, in teacher-made tests, factors like insufficient time to complete the test, unfair help to individual students, cheating during the examination, and unreliable scoring of essay answers might lower validity.

Similarly, in standardized tests, failure to follow standard directions and time limits, unauthorized help to students, and errors in scoring would tend to lower validity.

(Reynolds, Livingstone & Willson, 2009)
Test Itself
Length of the test
• A test usually represents a sample of many possible questions. If the test is too short to be a representative sample, validity will be affected accordingly. Homogeneous lengthening of a test increases both reliability and validity (a standard formula for the reliability side of this is sketched after this slide).

Unclear directions
• If directions regarding how to respond to the items, whether it is permissible to guess, and how to record the answers are not clear to the pupil, validity will tend to be reduced.

(Reynolds, Livingstone & Willson, 2009)
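The length claim can be quantified with the Spearman-Brown prophecy formula, a standard psychometric result that is not given on the slides; the reliability values below are illustrative.

```python
# Spearman-Brown prophecy formula: projected reliability of a homogeneous test
# lengthened by a factor of k (illustrative values, not from the slides).
def spearman_brown(reliability: float, k: float) -> float:
    return k * reliability / (1 + (k - 1) * reliability)

current = 0.60  # reliability of the original, shorter test
for k in (1, 2, 3):
    print(f"length x{k}: projected reliability = {spearman_brown(current, k):.2f}")
# Because a test's validity coefficient cannot exceed the square root of its
# reliability, raising reliability also raises the ceiling on validity.
```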
Reading vocabulary and sentence structures that are too difficult
• Vocabulary and sentence structures that are too complicated for the pupils taking the test may prevent the test from measuring the intended aspects of pupil performance, thus lowering validity.

Inappropriate level of difficulty of the test items
• When the test items have an inappropriate level of difficulty, the validity of the tool is affected. For example, in criterion-referenced tests, failure to match the difficulty specified by the learning outcome will lower validity.

(Reynolds, Livingstone & Willson, 2009)
Poorly constructed test items
• Test items that provide unintentional clues to the answer tend to measure pupils' alertness in detecting clues as well as the intended aspects of pupil performance, which ultimately affects validity.

Ambiguity
• Ambiguous statements in test items lead to misinterpretation, differing interpretations, and confusion. Sometimes ambiguity confuses the better students more than the poorer ones, causing items to discriminate in a negative direction. As a consequence, the validity of the test is lowered.

(Reynolds, Livingstone & Willson, 2009)
Student Characteristics
There are certain personal factors that influence pupils' responses to the test situation and invalidate the test interpretation. Students who are emotionally disturbed, lack motivation, or are afraid of the test situation may not respond normally, and this may ultimately affect validity.

 Response set also influences test results. It is a test-taking habit that affects the pupil's score. If a test is used again and again, its validity may be reduced.
 For example, if an examinee experiences high levels of test anxiety or is not motivated to put forth a reasonable effort, the result may be distorted.

(Reynolds, Livingstone & Willson, 2009)
Functioning Content & Teaching Procedure
In achievement testing, the functioning content of test items cannot be determined only by examining the form and content of the test; it also depends on whether the teacher has already taught the full solution to a particular problem before including it in the test.

Tests of complex learning outcomes are valid only if the test items function as intended. If students have previous experience with the solution of a problem included in the test, such a test is no longer a valid instrument for measuring the more complex mental processes, and validity is thus affected.

(Reynolds, Livingstone & Willson, 2009)
RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY
 Validity and reliability are closely related.
 A test cannot be considered valid unless the
measurements resulting from it are reliable.
 Likewise, results from a test can be reliable and
not necessarily valid.

(Messick, 1989)
A test cannot be valid unless it is reliable.
• If a test does not measure something consistently, it follows that it cannot always be measuring it accurately.
It is quite possible for a test to be reliable but invalid.
• A test can consistently give the same results even though it is not measuring what it is supposed to measure (a simulated illustration follows the table below).

(Alderson, 1995)
Reliability and Validity

| If                     | Then                               |
| Unreliable             | Test validity is undermined        |
| Reliable but not valid | Test is not useful                 |
| Unreliable and invalid | Test is definitely not useful      |
| Reliable and valid     | Test can be used with good results |
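To make the "reliable but not valid" row concrete, here is a minimal simulation with invented data (statistics.correlation requires Python 3.10+): a test whose items mainly tap reading ability is administered twice and is also compared with the math ability it is supposed to measure.

```python
# Minimal sketch with simulated data (illustrative only): a reading-loaded test
# given twice, compared against the math ability it claims to measure.
import random
import statistics

random.seed(1)
n = 50
reading = [random.gauss(50, 10) for _ in range(n)]   # what the items actually tap
math    = [random.gauss(50, 10) for _ in range(n)]   # the intended construct

form_a = [r + random.gauss(0, 2) for r in reading]   # first administration
form_b = [r + random.gauss(0, 2) for r in reading]   # second administration

# Scores agree closely across administrations (reliable) but are unrelated to
# the construct the test claims to measure (not valid).
print("reliability (form A vs form B):", round(statistics.correlation(form_a, form_b), 2))
print("validity    (form A vs math)  :", round(statistics.correlation(form_a, math), 2))
```

The two administrations agree closely, so the scores are reliable, yet they are essentially unrelated to the intended construct, so the interpretation is not valid.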


ACTIVITY
Statement (TRUE / FALSE):
 Validity is used to determine whether the test measures what it claims to measure.
 Criterion validity is the extent to which the assessment is representative of the domain of interest.
 Content validity is the extent to which the assessment is a meaningful measure of an unobservable trait or characteristic such as honesty, attitude and intelligence.
 When a test is being assessed at the same time as the construct of interest, it is called concurrent validity.
 "The complicated vocabulary and sentence structure meant for the pupils taking the test may fail in measuring the aspects of pupil performance, thus lowering the validity" is a factor of teaching procedures.
Statement (TRUE / FALSE):
 If a test is unreliable and invalid, then it is definitely not useful.
 A test cannot be considered valid unless the measurements resulting from it are reliable.
 If a test is unreliable, then it is not useful.
 The test administration and scoring procedures may also affect the validity of the interpretations made from the results.
Q&A
Question:
Can a test be valid if it is not reliable?

Answer:
No, a test cannot be valid if it is not reliable. If a test does not measure something consistently, it follows that it cannot always be measuring it accurately.
Question: Is it possible that an examination has high reliability but low validity? How can we avoid this kind of situation?

Answer:
Yes, it is possible. To avoid this kind of situation, you have to look back at the factors affecting validity.
Question:
What's the difference between validity and reliability?

Answer:
Reliability refers to how consistent the results are: if a method is reliable, the measure will produce the same results on different occasions. Validity refers to whether the instrument measures what it set out to measure: a test is valid when it measures what it intends to measure.
References

Airasian, P. W., & Russell, M. K. (2008). Classroom assessment: Concepts and applications (6th ed.). New York, NY: McGraw-Hill.

Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge: Cambridge University Press.

McMillan, J. H. (2007). Classroom assessment (4th ed.). Boston, MA: Pearson.

Messick, S. (1993). Foundations of validity: Meaning and consequences in psychological assessment. Princeton, NJ: Educational Testing Service.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York, NY: American Council on Education and Macmillan.

Reynolds, C. R., Livingstone, R. B., & Willson, V. (2009). Measurement and assessment in education (2nd ed.). Upper Saddle River, NJ: Pearson.

Sax, G. (1997). Principles of educational and psychological measurement and evaluation. Belmont, CA: Wadsworth.

Shepard, L. A. (1997). The centrality of test use and consequences for test validity.

Steven, R. B. (2005). Classroom assessment: Issues and practices. Boston, MA: Pearson.
