Evaluation
TEST VALIDITY
CONTENT
Messick (1989) unified view of validity
Content Validity
Construct Validity
Criterion Validity
Consequential Validity
Factors affecting validity
Instructional Procedures
Test Administration & Scoring Procedures
Test Itself
Student Characteristics
Functioning Content & Teaching Procedure
Relationship between Validity and reliability
INTRODUCTION
Classroom assessment consists of determining the purpose
and learning targets related to standards, systematically
obtaining information from students, interpreting the
information collected, and using the information.
There are several criteria that determine the quality of
classroom assessment.
Validity is one of the most important criteria for ensuring
high-quality classroom assessment.
What is validity?
Validity is an integrated evaluative judgment of the degree to
which empirical evidence and theoretical rationales support
the adequacy and appropriateness of interpretations and
actions based on test scores or other modes of assessment.
(Messick, 1993)
Validity is used to determine whether a test measures what
it claims to measure.
(Steven, 2005)
Types of validity
• Content Validity
• Construct Validity
• Criterion Validity (Concurrent Validity, Predictive Validity)
• Consequential Validity
Content Validity
The extent to which the assessment is representative of the domain of
interest.
Content validity is usually based on whether the assessment
appropriately measures or samples the domain of instructional
material presented to the student.
Example:
• Suppose you wanted to test for everything that sixth-grade
students learn in a four-week unit about insects, but it would take
students a long time to complete such a test.
• The solution is to select a sample of what has been taught to assess,
and then use students' achievement on this sample to make
inferences about their knowledge of the entire domain of content,
reasoning, and other objectives.
• For example, if a student correctly answers 85% of the items, it is
inferred that the student knows 85% of the content in the entire unit.
• If your sample is judged to be representative of the domain, then
you have content-related evidence for validity.
(Steven, 2005)
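The inference described above can be sketched in a few lines of code. This is a minimal illustration, not from the cited sources; the 40-item test and the score of 34 are invented numbers.

```python
# A minimal sketch of the sampling logic behind content validity: a score
# on a representative sample of items is generalized to the whole domain.

def infer_domain_mastery(items_correct, items_on_test):
    """Estimate the proportion of the whole content domain a student
    knows, assuming the sampled items are representative of it."""
    return items_correct / items_on_test

# A student answers 34 of 40 sampled items on the insect unit correctly.
estimate = infer_domain_mastery(34, 40)
print(f"Estimated mastery of the entire unit: {estimate:.0%}")  # 85%
```

The inference is only as good as the sample: if the 40 items over-represent one topic, the 85% estimate is biased.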
How to determine an adequate sampling?
A table of specifications crosses the major content areas with the levels
of Bloom's Taxonomy:

Major Content Areas  | Remember | Understand | Application | Analysis | Evaluate | Create | Total
Topic 1              | No/(%)   | No/(%)     | No/(%)      | No/(%)   | No/(%)   | No/(%) | No/(%)
Topic 2              | No/(%)   | No/(%)     | No/(%)      | No/(%)   | No/(%)   | No/(%) | No/(%)
.......              |          |            |             |          |          |        |
Total no. of items/% | No/(%)   | No/(%)     | No/(%)      | No/(%)   | No/(%)   | No/(%) | /100%
(McMillan, 2007)
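Such a table can also be generated programmatically. The sketch below is hypothetical: the topic names, topic weights, level weights, and 40-item total are all invented choices, not taken from McMillan.

```python
# Sketch: allocate test items across content areas and Bloom's levels
# according to chosen weights (a simple table of specifications).

levels = ["Remember", "Understand", "Application", "Analysis", "Evaluate", "Create"]
topic_weights = {"Topic 1": 0.5, "Topic 2": 0.3, "Topic 3": 0.2}  # share of the test per topic
level_weights = [0.30, 0.25, 0.20, 0.10, 0.10, 0.05]              # share per Bloom level
total_items = 40

blueprint = {}
for topic, tw in topic_weights.items():
    # Number of items for each (topic, level) cell, rounded to whole items.
    row = [round(total_items * tw * lw) for lw in level_weights]
    blueprint[topic] = row
    print(topic, dict(zip(levels, row)), "row total:", sum(row))
```

In practice the weights come from instructional emphasis: topics and cognitive levels that received more instruction get proportionally more items, which is exactly what makes the sample representative.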
Construct Validity
The extent to which the assessment is a meaningful measure of
an unobservable trait or characteristic such as honesty, attitude
and intelligence.
Example :
Let's say you want to create an assessment that measures the
construct of job stress. Scores on the job stress assessment
should relate to various predictions made by theories about
job stress.
For example, theories of job stress indicate a relationship
between job stress and variables such as absenteeism, job
turnover, or peer ratings of job stress.
If the job stress test is able to predict these variables, then the
test is considered to have construct validity.
(McMillan, 2007)
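One way this reasoning is checked in practice is by correlating test scores with theory-predicted variables. The sketch below uses simulated data (none of it from the sources) and a hypothetical job-stress scale: scores should correlate with absenteeism, as the theory predicts, and not with an irrelevant variable.

```python
# Hedged sketch of construct-related evidence with simulated data:
# the assessment's scores should correlate with variables the theory
# predicts (absenteeism) and not with theoretically irrelevant ones.
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

random.seed(0)
stress = [random.gauss(0, 1) for _ in range(200)]             # true job stress
score = [s + random.gauss(0, 0.5) for s in stress]            # the new assessment
absenteeism = [0.6 * s + random.gauss(0, 0.8) for s in stress]
unrelated = [random.gauss(0, 1) for _ in range(200)]          # irrelevant variable

print(f"score vs absenteeism: r = {pearson(score, absenteeism):.2f}")  # substantial
print(f"score vs unrelated:   r = {pearson(score, unrelated):.2f}")    # near zero
```

A pattern of correlations that matches the theory's predictions (substantial where predicted, near zero where not) is what counts as construct-related evidence.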
Criterion Validity
The relationship between an assessment and another
measure of the same trait.
With this type of validity, an assessment is given to a
group of students. Their assessment scores are then
compared to their scores on an external criterion
appropriately related to what the test claims to measure.
Criterion-related validity may be measured by comparing
students' scores on a new achievement test with their scores
on an older, already established achievement test.
If the new test scores are closely related to the established
test scores, then the new test is considered valid.
(McMillan, 2007)
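The comparison described above is typically quantified as a correlation, often called a validity coefficient. The sketch below uses ten invented score pairs, not data from the sources.

```python
# Illustrative sketch of criterion-related evidence: correlate scores on
# a new test with scores on an established test of the same trait.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

new_test = [55, 62, 70, 48, 85, 90, 66, 74, 58, 80]       # invented scores
established = [58, 60, 72, 50, 88, 86, 64, 78, 55, 84]    # invented scores

r = pearson(new_test, established)
print(f"criterion-related validity coefficient: r = {r:.2f}")
```

A coefficient close to 1 would be taken as strong criterion-related evidence; a low one would cast doubt on the new test.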
Type of criterion validity:
1) Concurrent Validity
Example :
A researcher administers a self-esteem inventory to a
group of 8th graders and compares their scores on it with
their teacher's ratings of the students' self-esteem obtained
at about the same time.
Type of criterion validity:
2) Predictive Validity
Example:
A researcher compares students' scores on an aptitude test given at
the beginning of the school year with their achievement at the end
of the year, to see how well the earlier scores predict later
performance.
(Sax, 1997)
Consequential Validity
Although both types of consequences (intended and unintended)
are relevant to the evaluation of the validity of an assessment
program, what is a positive effect for one observer may be a
negative effect for someone else.
Example: Narrowing the curriculum
Narrowing may be viewed as a positive outcome by
those who want instruction to have a sharper focus on
the material in the state content standards, but it may be
viewed as a negative outcome by those who worry that
important concepts and content areas (e.g., history or
music) may be shortchanged.
Because there is no universal agreement about whether
particular outcomes are positive or negative, the discussion
below is not separated into intended positive and
unintended negative consequences.
Intended Consequences
Lane and Stone (2002) suggest that assessments are intended
to impact:
◦ Student, teacher, and administrator motivation and effort;
◦ Curriculum and instructional content and strategies;
◦ Content and format of classroom assessments;
◦ Improved learning for all students;
◦ Professional development support;
◦ Use and nature of test preparation activities; and
◦ Student, teacher, administrator, and public awareness and
beliefs about the assessment, criteria for judging
performance, and the use of assessment results.
Unintended Consequences
At times, however, unintended consequences are also
possible; Lane and Stone (2002) point to examples such as:
◦ Narrowing of curriculum and instruction to focus only on the
specific learning outcomes assessed;
◦ Use of test preparation materials that are closely linked to the
assessment without making changes to the curriculum and
instruction;
◦ Use of unethical test preparation materials; and
◦ Inappropriate use of test scores by administrators.
FACTORS OF VALIDITY
Instructional Procedures
Test Administration & Scoring Procedures
Test Itself
Student Characteristics
Functioning Content & Teaching Procedure
Instructional
Procedures
With educational tests, in addition to the
content of the test influencing validity, the
way the material is presented can influence
validity.
Reynolds, C. R., Livingston, R. B., & Willson, V. (2009). Measurement and assessment in education
(2nd ed.).
Test Administration &
Scoring Procedures
In terms of administration and scoring, failure to follow the
standard procedures can undermine validity: deviation from
standard administration, failure to provide the appropriate
instructions, or failure to observe strict time limits can all
lower validity.
Whether it is a teacher-made test or a standardized test,
adverse physical and psychological conditions during testing
may also affect validity.
The test administration and scoring procedures may also
affect the validity of interpretations of the results. For
instance, in teacher-made tests, factors like insufficient time
to complete the test, unfair help to individual students,
cheating during the examination, and unreliable scoring of
essay answers might lower validity.
Test Itself
Length of the test
• A test usually represents a sample of many possible questions.
If the test is too short to be a representative sample, validity
is affected accordingly. Homogeneous lengthening of a test
increases both validity and reliability.
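The effect of lengthening on reliability is usually quantified with the Spearman-Brown prophecy formula. The formula is a standard psychometric result, not stated on the slide itself; the .60 starting reliability below is an invented example value.

```python
# Spearman-Brown prophecy formula: predicted reliability when a test is
# lengthened k-fold with homogeneous (comparable) items.

def spearman_brown(reliability, k):
    """Return the predicted reliability of a test lengthened k times,
    given its current reliability coefficient."""
    return k * reliability / (1 + (k - 1) * reliability)

# Doubling a test whose reliability is .60:
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```

Note the diminishing returns: each further doubling raises reliability by less, which is why lengthening alone cannot fix a badly sampled test.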
Unclear directions
• Directions that do not clearly indicate how pupils should
respond to the items or record their answers tend to lower
validity.
Reading vocabulary and sentence structure that are
too difficult
• Vocabulary and sentence structure that are too complicated
for the pupils taking the test may fail to measure the intended
aspects of pupil performance, thus lowering validity.
Poorly constructed test items
Ambiguity
• Ambiguous statements in test items lead to misinterpretation,
differing interpretations, and confusion. Sometimes ambiguity
confuses the better students more than the poorer ones,
causing items to discriminate in a negative direction. As a
consequence, the validity of the test is lowered.
Student Characteristics
There are certain personal factors that influence pupils'
responses to the test situation and invalidate test
interpretations. Emotionally disturbed students, students who
lack motivation, and students who are afraid of the test
situation may not respond normally, and this may ultimately
affect validity.
Functioning Content &
Teaching Procedure
In achievement testing, the functioning content of test
items cannot be determined only by examining the form
and content of the test; it also depends on the teaching
procedure. For example, an item intended to measure
reasoning will in practice measure only recall if the
teacher has already taught pupils exactly how to solve
that particular problem before it appears in the test.
RELATIONSHIP BETWEEN
RELIABILITY AND
VALIDITY
Validity and reliability are closely related.
A test cannot be considered valid unless the
measurements resulting from it are reliable.
Likewise, results from a test can be reliable and
not necessarily valid.
(Messick, 1989)
A test cannot be valid unless it is reliable.
• If a test does not measure something consistently, it follows that
it cannot always be measuring it accurately.
It is quite possible for a test to be reliable but invalid.
• A test can consistently give the same results even though it is
not measuring what it is supposed to.
(Alderson, 1995)
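The "reliable but not valid" case can be made concrete with a small simulation. Everything below is fabricated for illustration: a test that consistently taps the wrong trait shows a high test-retest correlation (reliability) but a near-zero correlation with the trait it claims to measure (validity).

```python
# Simulated sketch of a test that is reliable (consistent across two
# administrations) yet invalid (unrelated to the intended trait).
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

random.seed(1)
intended_trait = [random.gauss(0, 1) for _ in range(300)]  # what we claim to measure
wrong_trait = [random.gauss(0, 1) for _ in range(300)]     # what the test actually taps
form_a = [w + random.gauss(0, 0.2) for w in wrong_trait]   # first administration
form_b = [w + random.gauss(0, 0.2) for w in wrong_trait]   # retest

print(f"reliability (test-retest):    r = {pearson(form_a, form_b):.2f}")        # high
print(f"validity (vs intended trait): r = {pearson(form_a, intended_trait):.2f}")  # near zero
```

The reverse cannot happen: if scores fluctuate randomly between administrations, they cannot consistently track the intended trait, which is why reliability is a necessary (but not sufficient) condition for validity.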
Q&A
Question :
Can a test be valid if it is not reliable?
Answer :
No, a test cannot be valid if it is not reliable.
If a test does not measure something consistently, it follows that it
cannot always be measuring it accurately.
Question: Is it possible that an examination has high reliability
but low validity? How can we avoid this kind of situation?
Answer :
Yes, it is possible. To avoid this kind of situation, you have to
look back at the factors that affect validity.
Question :
What's the difference between validity and reliability?
Answer :
Reliability refers to how consistent a measure is. If a method
is reliable, its results will be consistent, so the measure will
produce the same results on different occasions.
Validity refers to whether a measure captures what it set out
to measure. A test is valid when it measures what it intends
to measure.
Reference
Airasian, P. W., & Russell, M. K. (2008). Classroom Assessment: Concepts and Applications (6th ed.). New York, NY :
McGraw-Hill.
Alderson, J. C., Clapham, C. and Wall, D. (1995). Language Test Construction and Evaluation. Cambridge :
Cambridge University Press.
McMillan, J.H. (2007). Classroom Assessment (4th ed.). Boston, MA : Pearson Publication.
Messick, S. (1993). Foundations of validity: Meaning and Consequences in Psychological Assessment. Princeton,
NJ : ETS Publication
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed.). New York, NY : American Council
on Education and Macmillan.
Reynolds, C. R., Livingston, R. B., & Willson, V. (2009). Measurement and Assessment in Education (2nd ed.).
Upper Saddle River, NJ : Pearson.
Sax, G. (1997). Principles of Educational and Psychological Measurement and Evaluation. Belmont, CA :
Wadsworth Publisher.
Shepard, L. A. (1997). The Centrality of Test Use and Consequences for Test Validity. Educational Measurement:
Issues and Practice, 16(2), 5-8.
Steven, R. B. (2005). Classroom Assessment: Issues and Practices. Boston, MA : Pearson Publication.