
UPHL Institutional Journal


Volume 1, Issue 1, March 2003, pp. 48-55

The Profile of Teacher-Made Test Construction of the
Professors of University of Perpetual Help Laguna

Carlo Magno
De La Salle University

Abstract
This research determined the profile of the professors' level of appropriateness in test
construction at the University of Perpetual Help Laguna. It has been noted that the practical
rules of test item construction are formulated on the basis of years of experience in
preparing items and empirical evaluation of responses. Thus, the relationship between the
level of appropriateness in test construction and actual years of teaching experience was
investigated. Thirty-three college professors were surveyed in this study. The survey
determined their tendency to follow general principles, guidelines, and procedures in test
construction. The results indicate that the majority of the professors (54.54%) had an
average level of appropriateness in test construction, with a mean value of 24.76. The
results also revealed that there is no significant relationship between the level of
appropriateness in test construction and actual years of teaching experience. This may be
due to the adequate training and exposure that new professors had in test construction.

The process of assessment and evaluation is always part and parcel of the teaching and
learning process in every educational institution, whether in basic education or in tertiary
education. In educational testing, this procedure is an extensive measurement process called
evaluation. It designates a summing-up process in which value judgment plays a large part,
as in grading and promoting students (Hopkins, 1990). In particular, professors construct
their own tests within the semester, called teacher-made tests. Teacher-made tests are more
specifically focused and usually reflect the content of a particular unit or course (Hopkins,
1990). The teacher-made test is tailored to measure the achievement of students and the
objectives intended for them after completing a series of learning tasks for the course.
Hopkins (1990) noted that evaluation is conducted in the educational setting for
instructional, administrative, and guidance functions. According to Anastasi (1988),
undoubtedly the largest number of tests covering the content of specific courses are
prepared by instructors for use in their own classrooms. Most schools focus on student
progress as the ultimate criterion; therefore, it is important to evaluate the status of pupils
expertly. Anastasi (1988) indicated that the preparation of local classroom tests can be
substantially improved by applying the techniques and accumulated experience of
professional teachers. It is necessary that teachers, in preparing their own classroom tests,
follow specific guidelines to ensure quality test output and to attain the goal of measuring
effectively what the students have learned. Anastasi (1988) indicated three major steps in
classroom test construction: (1) planning the test, (2) item writing, and (3) item analysis.
Moreover, Anastasi (1988) cautioned that test constructors who plunge directly into item
writing are likely to produce a lopsided test. This means that some areas will be
over-represented while others may remain untouched. A test constructed without a blueprint
is likely to be overloaded with impermanent and less important material.


To avoid lopsidedness and over-representation in tests, test constructors need to go back to
the statement of objectives made for the term. The objectives serve as the direction on what
specific tasks are to be measured. To guard against fortuitous imbalances and
disproportionate item distribution, test constructors draw up a table of specifications before
any items are prepared. Such specifications should begin with an outline of the instructional
objectives of the course, the subject matter to be covered, and the cognitive skills to be
measured – a three-way grid (Gronlund, 1990). In education today, teachers place greater
emphasis on developing higher order thinking skills. This implies that test items should
concentrate more on the higher order thinking skills of students, which include application,
analysis, synthesis, and evaluation. Before the items are written, the table of specifications
should at least be reviewed by experts. In the educational setting it is usually reviewed by
immediate superiors such as coordinators, department chairpersons, assistant principals,
principals, and others, depending on the organizational structure of the school.
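To make the idea of a specification grid concrete, the sketch below lays out a simplified
content-by-cognitive-skill slice of such a table as a small data structure. The topics, skill
levels, and item counts are purely illustrative and are not drawn from any actual course.

```python
# Hypothetical sketch of a table of specifications: content areas crossed with
# cognitive skill levels, each cell giving the number of planned items.
# Topics and counts are illustrative only, not taken from an actual course.
table_of_specifications = {
    "Measurement concepts": {"knowledge": 4, "application": 3, "analysis": 2,
                             "synthesis": 1, "evaluation": 1},
    "Item writing":         {"knowledge": 3, "application": 4, "analysis": 2,
                             "synthesis": 1, "evaluation": 1},
    "Item analysis":        {"knowledge": 2, "application": 3, "analysis": 2,
                             "synthesis": 1, "evaluation": 0},
}

# Check that the planned items add up to the intended test length (here, 30),
# so that no content area or skill level is inadvertently over- or under-weighted.
total_items = sum(sum(row.values()) for row in table_of_specifications.values())
assert total_items == 30, total_items
```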
Once the table of specifications is approved, the items are constructed following specific
guidelines. The aim in writing items is clarity and precision. The constructed items are then
content validated by experts to improve item construction and to check whether the items
are representative of the course content. In the process of content validation, the test items
and instructions are revised according to grammar, structure, form, and format. The items
are further refined upon the review of experts. The test constructor then revises the items
for clarity and precision.
In some cases the test items are pilot tested to ensure appropriate calibration under the
intended testing conditions and then revised again for improvement.
After administering the test, item analysis is conducted. In item analysis, item difficulty and
the index of discrimination are determined. Item difficulty indicates how difficult or easy an
item is, and the discrimination index indicates how well an item separates examinees who
perform well on the test from those who perform poorly. In usual cases, the poor items are
removed from the pool of items and the test is again administered. The results of the
evaluation may affect the objectives and the instruction.
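As an illustration of the two indices just described, the sketch below computes the
conventional proportion-correct difficulty and an upper-lower discrimination index from a
matrix of dichotomously scored responses. It is a generic textbook computation, not the
specific procedure of the professors surveyed; the function name and the 27% grouping
fraction are assumptions made for the example.

```python
# Generic sketch of classical item analysis (not the surveyed professors' own
# procedure): proportion-correct difficulty and an upper-lower discrimination
# index. The 27% grouping fraction is a common convention, assumed here.
import numpy as np

def item_analysis(responses, item, group_frac=0.27):
    """responses: 2-D array of 0/1 item scores (rows = examinees, cols = items)."""
    responses = np.asarray(responses)

    # Item difficulty: proportion of examinees who answered the item correctly.
    difficulty = responses[:, item].mean()

    # Rank examinees by total test score and take the upper and lower groups.
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    n_group = max(1, int(round(group_frac * len(totals))))
    lower = responses[order[:n_group], item]
    upper = responses[order[-n_group:], item]

    # Discrimination index: proportion correct in the upper group minus
    # proportion correct in the lower group.
    discrimination = upper.mean() - lower.mean()
    return difficulty, discrimination
```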
It was indicated in the studies of Hopkins and Stanley (1981) that it is commonly easier
to prepare items that require recall of simple facts than integration of different facts and
application of principles to new situations. Follow-up studies have shown that the factual
details learned in a course are most likely to be forgotten, while the understanding of principles
and their application to new situations show either no retention loss or an actual gain with time
after completing the course.
One study by Mayo (1967) found that graduating seniors in 86 teacher-training institutions
did not demonstrate a very high level of measurement competence.
A study on the social consequences of testing and the development of talent by Goslin
(1967) found that about 60% of all teachers had only minimal exposure to training in test and
measurement techniques. The unsatisfactory quality of the majority of teacher-made tests no
doubt reflects this inadequacy in training. Moreover, it was found that teachers who have had
little preparation in tests and measurement tend to make little use of the information obtained
from standardized tests.
A survey was undertaken by Lazo, Vasquez-de Jesus, and Edralin-Tiglao (1976) with
respondents from three sectors, namely clinical, educational, and industrial testing. The
study described psychometric practices in terms of a number of selected variables that
include the uses of tests, the usefulness of ratings of such tests, the purposes for which the
measurement was undertaken, and the tests most commonly employed. The results indicate
that most tests are used in the educational setting. They are usually used for admission,
academic programs, vocational guidance, and grading and promotion to the next grade level.
In a study by Lazo (1977) about psychological testing in schools, it was found that tests are
used as a basis for decision making in about 65% of the educational institutions surveyed.
In terms of the sheer volume of testing, schools outnumbered clinics and industrial firms in
usage.
Given this background in educational testing and the studies cited, this paper aims to
determine the profile of the professors' level of appropriateness in their test construction
practices, stratified into the different departments of the university.
It was indicated by Anastasi (1988) that the practical rules of test item construction are
formulated on the basis of years of experience in preparing items and empirical evaluation
of responses. In response to this assumption, this paper also aims to investigate the
relationship between the professors' years of teaching experience and their level of
appropriateness in test construction.
The level of appropriateness in test construction is operationally defined as the way in
which professors, teachers, and item writers follow the necessary guidelines, principles, and
procedures in test construction.

Method
Respondents

There were 33 professors of the University of Perpetual Help Laguna who participated in
the survey. The 33 professors belong to different colleges and departments. The colleges
represented were Arts and Sciences, Allied and Health Sciences, Education, and
Engineering. Table 1 shows the number of respondents stratified into the different colleges
and departments.

Table 1. Number of Respondents by College and Department

College of Arts and Sciences
    English                                       7
    Social Sciences                               2
    Political Science                             1
    Science                                       2
    Math                                          2
    Psychology                                    3
College of Allied and Health Sciences
    Nursing                                       4
    Medical Technology                            4
    Midwifery                                     3
College of Education                              2
College of Engineering
    Computer Engineering                          1
    Electronics and Communications Engineering    2
TOTAL                                            33
Instrument

A survey instrument was constructed to identify the professors' level of appropriateness in
test construction. The survey also required the respondents to indicate their college,
department, and years of teaching. The items on the level of appropriateness in test
construction cover the general principles, guidelines, and procedures in test construction.
There are a total of 11 items comprising 6 positive items and 5 negative items scored in
reverse order. The survey has a three-point numeric scale enabling the respondents to
answer "Yes," "Sometimes," and "No" for each item. For the positive items, the "Yes"
response is equivalent to 3 points, "Sometimes" to 2 points, and "No" to 1 point. The
scoring is reversed for the negative items. The summation of the scores represents the index
of the level of appropriateness in test construction.
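A minimal sketch of this scoring rule is given below. The survey text does not identify
which five items are reverse-scored, so the set of negative items used here is only a
placeholder.

```python
# Minimal sketch of the survey scoring rule: Yes = 3, Sometimes = 2, No = 1 for
# positive items, reversed for negative items, then summed across the 11 items.
SCALE = {"Yes": 3, "Sometimes": 2, "No": 1}
# Placeholder set: the paper states there are 5 reverse-scored items but does
# not identify them individually.
NEGATIVE_ITEMS = {3, 7, 9, 10, 11}

def score_survey(answers):
    """answers: dict mapping item number (1-11) to 'Yes', 'Sometimes', or 'No'."""
    total = 0
    for item, response in answers.items():
        points = SCALE[response]
        if item in NEGATIVE_ITEMS:
            points = 4 - points  # reverse scoring: Yes = 1, Sometimes = 2, No = 3
        total += points
    return total  # possible totals range from 11 to 33
```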
In the process of content validation, the items were reviewed by the Deans of Education of
two different colleges and by two professors who specialize in test construction in one
university.

Design and Procedure

The design of this study is cross-sectional: respondents with different years of teaching
experience were studied and measured at the same point in time. Before the start of a
seminar session on assessment and evaluation of learning, the speaker administered the
survey to the participants. There were 33 professors who attended the session. The
procedure for answering the survey was first explained to the respondents. After 13 to 14
minutes, the procedure for scoring the survey was explained and the participants scored
their own responses. The participants were then debriefed about the purpose of the study.
The data were organized by college and department. The scores for the level of
appropriateness in test construction were tabulated together with the respondents'
corresponding years of teaching experience. The two sets of data were then correlated using
the Pearson product-moment correlation coefficient.
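The correlational analysis just described can be reproduced in a few lines of code, sketched
below with placeholder values, since the study's raw scores are not reported in the paper.

```python
# Sketch of the correlational analysis described above, using placeholder data
# rather than the study's actual scores.
from scipy import stats

appropriateness_scores = [24, 27, 21, 30, 25, 23, 26, 22, 28, 24]  # placeholder
years_of_teaching      = [2, 8, 1, 15, 5, 3, 10, 4, 20, 6]         # placeholder

# Pearson product-moment correlation between the two sets of data.
r, p_value = stats.pearsonr(appropriateness_scores, years_of_teaching)
print(f"Pearson's r = {r:.2f}, p = {p_value:.3f}")
```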
To determine the profile of the professors' level of appropriateness in test construction, a
frequency distribution table was constructed. The responses for each item were also tallied
and reported as percentages.

Results

Profile of the Professors' Years of Teaching

Table 2 shows the frequency distribution of the professors' years of teaching experience.

Table 2. Frequency Distribution of Professors' Years of Teaching Experience

Years of Teaching    Frequency    Percentage
31 – 33                  1            3.03
28 – 30                  2            6.06
25 – 27                  1            3.03
22 – 24                  1            3.03
19 – 21                  1            3.03
16 – 18                  1            3.03
13 – 15                  2            6.06
10 – 12                  2            6.06
7 – 9                    5           15.15
4 – 6                    8           24.24
1 – 3                    9           27.27

The table shows that the largest proportion of the professors (27.27%) of the University of
Perpetual Help has 1 to 3 years of teaching experience. Fewer professors have taught for
more than 6 years. This may be because teachers who had stayed long have already retired
and quite a large number of new professors came in to teach at the university. The majority
of the professors are within 1 to 6 years of teaching since they are still starting their
teaching careers.
The mean years of teaching of the professors is 9.92 years, with a standard deviation of
9.05. The years of teaching are widely dispersed.

Profile of the Professors' Level of Appropriateness in Test Construction

Table 3 shows the profile of the professors' level of appropriateness in test construction.

Table 3. Profile of the Level of Appropriateness in Test Construction

Scores     Frequency    Percentage    Remark
31 – 35        1            3.03      Very high level of appropriateness in test construction
26 – 30       11           33.33      High level of appropriateness in test construction
21 – 25       18           54.54      Average level of appropriateness in test construction
16 – 20        1            3.03      Low level of appropriateness in test construction
11 – 15        2            6.06      Very low level of appropriateness in test construction
TOTAL         33
The table shows that most (54.54%) of the professors of the University of Perpetual Help
are within the average level of appropriateness in test construction. This means that 54.54%
of the professors tend to follow the general principles, guidelines, and procedures in test
construction, but there are times when they neglect the basic principles. There is also a
substantial percentage at the high level of appropriateness in test construction, 33.33% of
the sample. This means that 33.33% of the sample tend to follow adequately the necessary
principles, guidelines, and procedures in test construction.
The results indicate that the mean level of appropriateness in test construction of the 33
professors is 24.76, with a standard deviation of 3.81.

Table 4 shows the percentage of responses for each item for the level of
appropriateness in test construction.

Table 4. Percentage of Responses for Each Item

                                                                        Yes      Sometimes     No
1. I match the objectives from the syllabus with the test items.      84.85%     12.12%      3.03%
2. I construct a table of specifications.                              72.73%     21.21%      6.06%
3. I just keep my table of specifications for myself.                  12.12%     27.27%     60.61%
4. I make sure that students will be tested on terms and concepts.     84.85%      6.06%      6.06%
5. I show my test items to the department chair or dean.               75.76%     21.21%      3.03%
6. I put items more on the higher cognitive skills.                    60.61%     21.21%     15.15%
7. I usually give more essay questions.                                15.15%     57.58%     24.24%
8. I inform my students how they are graded in an essay.               72.73%     18.18%      9.09%
9. I compute for the item difficulty and index discrimination of       39.39%     39.39%     21.21%
   the test scores.
10. I test more on the content.                                        45.45%     36.36%     18.18%
11. I make use of only one test type.                                   0%        12.12%     87.88%

The table shows that most of the respondents (84.85%) match their test items with their
syllabus. The majority of them (72.73%) also construct a table of specifications in
preparation for test construction, although a fair number (21.21%) do so only sometimes.
Most of them (60.61%) do not just keep the table of specifications to themselves. Most of
them (84.85%) concentrate more on testing terms and concepts. There is also a good
number (75.76%) who practice showing their test items to their department chair or dean. It
is also a good indicator that most of them (60.61%) make items measuring the higher order
thinking skills of students. Most of them (57.58%) give more essay questions only at times.
It is also a good indicator that most of them (72.73%) inform their students how they are
graded in an essay. There is a wide spread of responses when it comes to conducting an
item analysis of their tests, since it is not really required to come up with one. Most of them
(45.45%) make tests measuring more of the content rather than skills. The majority of them
(87.88%) make their tests with a variety of test types.

Relationship Between Level of Appropriateness in Test Construction and Years of Teaching

The professors' level of appropriateness in test construction was correlated with their total
years of teaching experience. The computed Pearson's r of 0.27 indicates a positive
correlation, suggesting that as years of teaching experience increase, the level of
appropriateness in test construction also tends to improve. This may be explained by the
experience that professors accumulate in test construction, which contributes to their
following more appropriate test construction procedures and to the improvement of their
work across years of practice. However, the coefficient of 0.27 indicates only a negligible
relationship between the level of appropriateness in test construction and years of teaching
experience.
Since the computed r (0.27) is less than the critical r (0.333), there is no significant
relationship between the level of appropriateness in test construction and years of teaching
experience. Some teachers, even with only a few years of teaching experience, may have
gained the necessary and adequate skills and knowledge in appropriate test construction
guidelines, procedures, and principles.
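The decision rule used above can be sketched as follows. The critical value is derived here
from the t distribution for n − 2 degrees of freedom at α = .05 (two-tailed), which gives a
figure close to, though not necessarily identical with, the tabled value of 0.333 cited above.

```python
# Sketch of the significance check: compare the computed r with a critical r
# for n - 2 degrees of freedom at alpha = .05 (two-tailed). The critical value
# here is derived from the t distribution and may differ slightly from a
# published table.
from scipy import stats

n, alpha = 33, 0.05
r_computed = 0.27

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
r_critical = t_crit / (t_crit**2 + (n - 2)) ** 0.5

print(f"critical r = {r_critical:.3f}")
print("significant" if abs(r_computed) > r_critical else "not significant")
```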

Discussion

One of the problems noted by Goslin (1967) in the literature review is the weak foundation
of teachers in the area of constructing tests for classroom use. However, it can also be
viewed that, given proper training, teachers with short teaching experience can perform
well in the area of test construction. This may be because, these days, teachers attend
seminars, workshops, and symposia on test construction. Courses in test construction and
similar subjects are offered at the undergraduate and graduate levels in education and other
social sciences. References on appropriate test construction have also increased in number
and are easily obtained. Department chairs and deans may have set the necessary
procedures within the institution for teachers to actualize the principles of test construction.
Some teachers with long years of teaching experience may also have gained the appropriate
skills and knowledge in test construction, but there might be some who were not exposed to
the necessary principles.
It can also be explained, based on Lazo's (1977) results, that since most schools today rely
heavily on the use of testing, professors might have had the necessary training and exposure
due to the increased practice and use of tests in the educational setting. With the growing
number of uses of tests, teachers may have felt the need to acquire and practice the
necessary procedures.

References

Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan Publishing
Company.

Goslin, R. (1967). Teacher made test training. Psychological Bulletin, 77, 409-420.

Gronlund, N. (1990). Measurement and evaluation in teaching (6th ed.). New York:
Macmillan Publishing Company.

Hopkins, K. D. (1990). Educational and psychological measurement (7th ed.). Englewood
Cliffs, NJ: Prentice Hall.

Hopkins, K. D., & Stanley, J. C. (1981). Educational and psychological measurement and
evaluation (6th ed.). Englewood Cliffs, NJ: Prentice Hall.

Lazo, L. S. (1977). Psychological testing in schools: An assessment. Philippine Journal of
Psychology, 10, 23-27.

Lazo, L. S., Vasquez-de Jesus, L., & Edralin-Tiglao, R. (1976). A survey of psychological
measurement in the Philippines. Philippine Journal of Psychology, 12, 1-14.

Mayo, R. (1967). Measurement competency of high school seniors. Psychological Reports,
12, 23-29.
