Authors:
Book:
1st Edition: March, 2015
Composer:
Printers:
Quantity: 1000
Price: 150/-
TABLE OF CONTENTS

UNIT-1: INTRODUCTION
UNIT-2: JUDGING THE QUALITY OF THE TEST
UNIT-3: APPRAISING CLASSROOM TESTS (ITEM ANALYSIS)
  3.1 THE VALUE OF ITEM
  3.2 THE PROCEDURE/PURPOSE OF ITEM ANALYSIS
  3.3 MAKING THE MOST OF EXAMS: PROCEDURES FOR ITEM ANALYSIS
  3.4 ITEM DIFFICULTY
  3.5 THE INDEX OF DISCRIMINATION
UNIT-4: INTERPRETING THE TEST SCORES
UNIT-5: EVALUATING PRODUCT, PROCEDURES & PERFORMANCE
UNIT-6: PORTFOLIOS
  6.1 PURPOSE OF PORTFOLIOS
  6.3 GUIDELINES AND STUDENTS' ROLE IN SELECTION OF PORTFOLIO ENTRIES AND SELF-EVALUATION
  6.4 USING PORTFOLIOS IN INSTRUCTION AND COMMUNICATION
  6.5 POTENTIAL STRENGTHS AND WEAKNESSES OF PORTFOLIOS
  6.6 EVALUATION OF PORTFOLIO
UNIT-7: BASIC CONCEPTS OF INFERENTIAL STATISTICS
UNIT-8: SELECTED TESTS OF SIGNIFICANCE
  8.1 T-TEST
  8.2 CHI-SQUARE (χ2)
  8.3 REGRESSION
FOREWORD
Knowledge is the main distinctive characteristic of human beings; it is by the grace of Allah Almighty that man was selected as His vice-regent. Man is superior to other living beings because he has the capability and potentiality to understand, as well as to reason about consequences. Knowledge is obtained through the continuous process of education, a process that is usually lifelong.
It is also a fact that education is a bilateral and participatory activity. It cannot be accomplished without its two partners, the teacher and the student. The activity requires a transmitter and a receiver; if either of them is missing, the exercise remains incomplete.
Comparing the two, however, the teacher appears superior to his pupils, as he is the organiser and director of the teaching-learning process. That is why, since time immemorial, the search for significant teachers has been in progress, and it is still going on. No doubt good teachers exist, but they are countable, not countless. There is a need to produce a countless number of genuine educators and prospective educators to contribute in this regard.
With this objective in view, the people at the helm of affairs are trying their best to bring desirable changes to the education system, the teacher education curriculum and teacher training programmes. The best teacher, once a dream of the old days, is about to become a reality; if the course outlines and syllabi are properly dispensed, it is hoped that the required number of teachers will become available. Future educators and teachers need to be well equipped in all skills, not confining themselves to academic learning while ignoring ICT, current affairs and contemporary issues.
With these objectives in view, improvements in the system are being carried out to achieve the desired goals. The new curricula, on which this book of mine is based, are the result of long deliberations and brainstorming.
ACKNOWLEDGEMENT
All praises and glory be to Allah Almighty, Who bestowed His blessing upon me to be able to produce this book of mine, named Teacher Education in Pakistan. My humble gratitude and thanks are due to Him, with submission and heartiest admiration, Who guided me to the right path. All this became possible only due to Allah's beneficence and benevolence. The rays of the light of the Omnipresent Allah always took me out of the deep darkness of ignorance to the lighted path of knowledge, spreading its reflection to the needy.
My thanks and gratitude are due to my old student and now my colleague Dr. ______, who provided certain reference books and substantive material that was very beneficial for the compilation of this book. In addition, I am extremely thankful to my composer, Mr. Muhammad Nawaz Khan Abbasi, Peshawar, who provided step-by-step expert views regarding the printing and book production process.
My thanks are also due to Dr. Muhammad Rauf, the ______ of the ______ of IER, University of Peshawar, who always took pains in searching out certain reference books for me. He always showed great enthusiasm and pleasure in complying with any of my requests regarding the bibliographies, making them available at the earliest.
Last but not least, I am thankful to my family members, who cooperated with me and made all sorts of requirements available to me during the preparation of the primary substance of this book. They maintained a very calm and conducive environment for me during all this period of compilation; otherwise this work could not have come to light.
Author
UNIT-1:
INTRODUCTION
1.1
1.1.1
Evaluation:
Approaches to Evaluation
Evaluation in our schools is essentially concerned with two
major approaches to making judgments:
1. Product Evaluation:
2. Process Evaluation:
Curriculum Evaluation
Programme Evaluation
Personnel Evaluation
Institutional Evaluation:
[Figure: the Basic Teaching Model, showing the Instructional Objectives and other components linked by a Feedback Loop]
Feedback Loop
The model shows a fifth component, the Feedback Loop that
can be used by the teacher as both a management and a diagnostic
procedure. If the results of evaluation indicate that sufficient learning has
occurred, the loop takes the teacher back to the Instructional Objectives
component, and each successive component, so that plans for beginning
the next instructional unit can be developed. (New objectives are needed,
entering behavior is different, and methods will need to be reconsidered,)
But when evaluation results are not so positive, the Feedback Loop is a
mechanism for identifying possible explanations. (Note the arrows that
return to each component.) Were the objectives too vaguely specified?
Did students lack essential prerequisite skills or knowledge? Was the film
or text relatively ineffective? Was there insufficient practice opportunity?
Such questions need to be asked and frequently are. However, questions
need to be asked about the effectiveness of the performance assessment
procedures also, perhaps more frequently than they are. Were the test
questions appropriate? Were enough observations made? Were directions
clear to students? The Feedback Loop returns to the Performance
Assessment component to indicate that we must review and assess the
quality of our evaluation procedures after the fact, to determine the
appropriateness of the procedures and the accuracy of the information.
Unless the tools of evaluation are developed with care, inadequate
learning may go undetected or complete learning may be misinterpreted
as deficient.
In sum, good teaching requires planning for and using good evaluation tools. Furthermore, evaluation does not take place in a vacuum. The BTM shows that other components of the teaching process provide cues about what to evaluate, when to evaluate, and how to evaluate.
1. Placement assessment
2. Formative assessment: to monitor learning progress during instruction.
3. Diagnostic assessment: to diagnose learning difficulties during instruction.
4. Summative assessment: to assess achievement at the end of instruction.
These adjustments help to ensure that students achieve targeted standards-based learning goals within a set time frame. Although formative assessment strategies appear in a variety of formats, there are some distinct ways to distinguish them from summative assessments.
Formative assessment is used to monitor learning progress during instruction; its purpose is to provide continuous feedback to both student and teacher concerning learning successes and failures. Feedback to students provides reinforcement of successful learning and identifies the specific learning errors and misconceptions that need correction. Feedback to the teacher provides information for modifying instruction and for prescribing group and individual work. Formative assessment depends heavily on specially prepared tests and assessments for each segment of instruction (e.g., unit, chapter). Tests and other types of assessment tasks used for formative assessment are most frequently teacher-made, but customized tests from publishers of textbooks and other instructional materials can also serve this function. Observational techniques are, of course, also useful in monitoring student progress and identifying learning errors. Because formative assessment is directed toward improving learning and instruction, the results typically are not used for assigning course grades.
Diagnostic Assessment
According to Gronlund (1990):
Diagnostic assessment is concerned with those educational problems which remain unsolved even after the corrective prescriptions of formative assessment.
Diagnostic assessment is a highly specialized procedure. It is
concerned with the persistent or recurring learning difficulties that are
left unresolved by the standard corrective prescriptions of formative
assessment. If a student continues to experience failure in reading,
mathematics, or other subjects, despite the use of prescribed alternative
methods of instruction, then a more detailed diagnosis is indicated. To
use a medical analogy, formative assessment provides first-aid treatment
for simple learning problems and diagnostic assessment searches for the
underlying causes of problems that do not respond to first-aid treatment.
Thus, diagnostic assessment is much more comprehensive and detailed.
It involves the use of specially prepared diagnostic tests as well as
various, observational techniques. Serious learning disabilities also are
likely to require the services of educational, psychological, and medical
specialists, and given the appropriate diagnosis, the development of an
individualized education plan (IEP) for the student. The aim of diagnostic
assessment is to determine the causes of persistent learning problems and
to formulate a plan for remedial action.
Summative Assessment
The assessment that is carried out at the end of a piece of work is called
summative assessment.
Summative assessment typically comes at the end of a course
(or unit) of instruction. It is designed to determine the extent to which the
instructional goals have been achieved and is used primarily for
assigning course grades or for certifying student mastery of the intended
learning outcomes. The techniques used in summative assessment are
determined by the instructional goals, but they typically include teacher
made achievement tests, ratings on various types of performance (e.g.,
laboratory, oral report), and assessments of products (e.g., themes,
drawings, research reports). These various sources of information about
student achievement may be systematically collected into a portfolio of
work that may be used to summarize or showcase the student's
accomplishments and progress. Although the main purpose of summative
assessment is grading, or the certification of student achievement, it also
provides information for judging the appropriateness of the course
objectives and the effectiveness of the instruction.
1.1.3
Measurement
The term 'measure' means 'to find out the size or amount of something'. "Measurement" in the International Dictionary of Education (by G. Terry Page & J.B. Thomas) means "the act of finding the dimension of any object and the quantity found by such an act."
The Oxford Advanced Learner's Dictionary defines
`measurement' as the 'standard or system used in stating the size, quantity
or degree of something.' It is the way of assessing something
quantitatively. It answers the question "How much?" In other words we
can say that measurement is the quantitative aspect of evaluation. With
the help of measurement we can easily describe students' achievement by
telling their scores. These definitions show that 'measurement' is the
quantitative assessment of something. Now let's see how the term is
defined specifically in education. L. R. Gay, (1985) defines measurement
as "a process of quantifying the degree to which someone or something
possesses a given trait, i.e. quality, characteristics or features."
Educational Measurement
(The concept of measurement in education)
In Education, the term 'measurement' is used in its specific
meanings. It is the quantitative assessment of the performance of a
student, teacher, curriculum or an educational program. We can say that
the quantitative score used for educational evaluation is called
measurement. The term is used for the data collected about student or
teacher performance by using a measuring instrument in a given learning
situation. It shows the exact quantity or degree of the performance, traits
or character of the person or thing to be measured. For example instead
of saying that Hamid is underweight for his age and height, we can say
that Hamid is 18 years old, 5' 8" tall, and weight only 85 pounds.
Similarly, instead of saying that Hamid is a more intelligent than Zahid,
we can say that Hamid has a measured. IQ of 125 and Zahid has a
measured IQ of 88. In each of the above cases, the numerical statement is
more precise, more objective and less open to interpretation than the
corresponding verbal statement.
Steps of measurement
There are two steps in the process of measurement. The first step is to devise a set of operations to isolate the attribute and make it apparent to us. Just as a standard is used for judging the durability of a thing, educators and psychologists use various methods for testing the behaviour or performance of a student. For this purpose they often use the Stanford-Binet tests or other tests that include operations for eliciting behaviour that we take to be indicative of intelligence.
The second step in measurement is to express the results of the operations established in the first step in numerical or quantitative terms. This involves an answer to the question: how many, or how much? Just as the millimetre is used as a unit for indicating the thickness of a thing, educators and psychologists use numerical units for gauging intelligence, emotional maturity and other attributes. Thus each step in measurement rests on human-fashioned definitions. In the first step, we define the attribute that interests us. In the second step, we define the set of operations that will allow us to identify the attribute and express the result of our operations.
Difference between Evaluation and Measurement
Some people use 'evaluation' and 'measurement' in the same
meaning. Both the terms are used for the process of assessing the
performance of the student and collecting information about an
educational objective. Both tell how effective the school programme has
been and refer to the collection of information, appraisal of students, and
assessment of programme. Some recognize that measurement is one of
the essential components of evaluation. But there is difference between
the two terms. Roughly speaking, 'measurement' is the quantitative assessment, whereas 'evaluation' is the quantitative as well as qualitative assessment of the performance of a student or an educational objective. Measurement is a limited process used for the assessment of limited and specific educational objectives, serving such functions as prediction, selection and classification. Evaluation, on the other hand, is a much more comprehensive term used for all kinds of educational objectives; it is the continuous inspection of all aspects of the educational process.
Test
Measurement and evaluation are the two processes that are used
to collect information about the strengths and weaknesses of an
educational programme or the performance of a student, teacher or other
personnel. But these processes need some instruments for their
operations. Such instruments are called tests. So, the instruments that are
used to measure the sample of students' behaviour under specific
conditions are called tests. In other words, a test may be defined as:
A set of problems, questions, etc., for evaluating abilities or performance.
A test consists of a number of questions to be answered, a series
of problems to be solved, or a set of tasks to be performed by the
examinees. The questions might ask the examinees to define a word, to
do arithmetic computations, or to give some information. The questions,
problems and tasks are called test items.
Difference between Test, Measurement and Evaluation:
William Wiersma and Stephen G. Jurs (1990), in their book Educational Measurement and Testing, remark that the terms testing, measurement, assessment and evaluation are used with similar meanings, but they are not synonymous, though they are related to each other. They define these terms as follows:
Test: "(It) has a narrower meaning than either measurement or assessment. Test commonly refers to a set of items or questions under specific conditions. When a test is given, measurement takes place; however, all measurement is not necessarily testing."
Measurement: "For all practical purposes assessment and measurement can be considered synonymous. When assessment is taking place, measurement maintains the real-world relationships among persons with regard to what is being measured."
Evaluation: "Evaluation involves judging the value or worth of a pupil
of an instructional method or of an educational program. Such
judgements may or may not be based on information obtained from
tests".
Robert L. Ebel and David A. Frisbie (1986), in their book Essentials of Educational Measurement, rightly observe:
"All tests are a subset of the quantitative tools or techniques that
are classified as measurements. And all measurement techniques are a
subset of the quantitative and qualitative techniques used in evaluation."
Table showing the relationship between Test, Measurement and Evaluation:

Test:
- An instrument or systematic procedure for measuring a sample of behavior.

Measurement:
- A systematic process of collecting information in order to make decisions.
- Answers the question "How much?"

Evaluation:
- A means of collecting information, involving quantitative and qualitative description as well as decision-making.
- Its objective is to make decisions about educational programmes.
- Depends upon measurement.

Types of Tests
TESTS
Ability Tests
Achievement Test
Aptitude Test
Essay Tests
Objective Tests
Attitude Tests
Character Tests
Personality Tests
Intelligence Tests
I.Q. = (M.A. / C.A.) × 100

Where:
I.Q. = Intelligence Quotient
M.A. = Mental Age
C.A. = Chronological Age (Physical Age)
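The ratio-IQ formula above can be computed directly; a minimal sketch (the function name and the ages used are illustrative):

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: I.Q. = (M.A. / C.A.) x 100."""
    return mental_age / chronological_age * 100

# A 10-year-old performing at the level of a typical 12-year-old:
print(ratio_iq(12, 10))  # 120.0
```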
(B)
Personality Tests
Personality tests are used for job selection and for the selection of army recruits, as in ISSB
examinations. Personality tests include attitude tests, interest tests,
adjustment and temperament tests, character tests, and tests of other
motivational and interpersonal characteristics.
Uses of Tests
Tests play an important role in the teaching-learning process. Without tests we can evaluate neither a student's nor a teacher's performance, nor can we collect information about the effectiveness of an educational programme. That is why tests are very important in education. They motivate students to learn, and they serve a number of purposes in a variety of educational activities. The following are the different uses of tests:
1.
With the help of the results obtained from tests, a teacher can easily collect information about the aptitude, intelligence, interests, attitudes and overall performance of the students. He comes to know the strengths and weaknesses of his teaching method. It becomes easy for the teacher to grade students in a subject. Tests' results also enable him to predict the future success of a student in a subject.
2.
Tests help the teacher to understand children and to make a plan for their proper guidance. The result of the tests in itself serves as a guideline for the students.
Introduction:
The purpose of a test is usually explained when the test is announced, or at the beginning of the semester when the evaluation procedures are described as part of the general orientation to the course. Should there be any doubt whether the purpose of the test is clear to all pupils, however, it could be explained again at the time of testing. This is usually done orally. The only time a statement of the purpose of the test needs to be included in the written directions is when the test is to be administered to several sections taught by different teachers; a written statement of purpose then ensures greater uniformity. There are various types of tests applied in educational institutions, because no single test can measure a child's ability, interests and personality. One test measures only a specific ability; that is why school administrators use many different types of tests. Even in one single area, such as intelligence, more than one test is needed over a period of years to obtain a reliable estimate of ability. Each test serves its own purpose; however, testing and evaluation serve the following purposes.
Types of Testing:
There are four types of testing.
Placement Testing:
Formative Testing:
Diagnostic Testing:
Summative Testing:
To Report to Administrators:
To Furnish Instruction:
To Conduct Research:
One purpose of tests and evaluation is to find out the weak points in the curriculum so that it can be changed in accordance with the needs of the society.
Formative Evaluation
There are two kinds of evaluation:
i. Formative, and
ii. Summative

Formative Evaluation:
Formative evaluation takes place while learners are moving through a process that will yield the expected outputs. The classroom teacher is the best formative evaluator.
                 Formative                          Summative
Purpose:         To monitor learning progress       To check final achievement
Content focus:                                      General, broad
Methods:         Daily assignments, projects,       Projects
                 observations
Frequency:       Daily                              Weekly, quarterly
1.5
Test items that are answered correctly by most of the pupils are not included in these tests because of their inadequate contribution to response variance, even though they may be the very items that deal with the important concepts of the course content.
Only some areas readily lend themselves to the listing of specifics from which tests can be built, and this may be a constraining element for the teacher.
1.6
EDUCATIONAL:
Multiple choice
Short answer
Essay test
Written projects
Observational technique
Essay tests and other written projects are needed to assess the ability to organize and express ideas. Projects that require students to formulate problems and accumulate information through library research, or to collect data (e.g., through experimental observations or interviews), are needed to measure certain skills in formulating and solving problems. Observational techniques are needed to assess performance skills and various aspects of student behavior, and self-report techniques are useful for assessing interests and attitudes. A complete picture of student achievement and development requires the use of many different assessment procedures.
Proper use of Assessment Procedure Requires an Awareness of their
Limitations
No single test can assess everything the teacher wants to assess. Every procedure has its strengths and weaknesses, or may be unsuitable for what is to be assessed. One must therefore know about these limitations and take care, so that correct assessment results can be obtained.
Some of the major problems are:
1. Sampling error
2. Chance factors
3. Incorrect interpretations
Sampling Error:
An achievement test may not adequately sample a particular domain of instructional content. An observational instrument designed to assess a student's social adjustment may not sample enough behavior to give a dependable index of this trait. Sampling error can be controlled through careful application of established measurement procedures.
Chance Factor:
A second source of error is caused by chance factors influencing assessment results, such as guessing on objective tests, subjective scoring on essay tests, errors in judgment on observation devices, and inconsistent responding on self-report instruments.
UNIT-2:
JUDGING THE QUALITY OF THE TEST
Definition: Test percentile scores are just one type of test score you will find on your child's testing reports. Many test reports include several types of scores. Percentile scores are almost always reported on the major achievement tests taken by your child's entire class, and will also be found on individual diagnostic test reports. Understanding test percentile scores is important for making decisions about your child's special education program.
Test percentile scores are commonly reported on most standardized assessments a child takes in school. Percentile literally means per hundred. Percentage scores on teacher-made tests and homework assignments are developed by dividing the student's raw score on her work by the total number of points possible. Converting decimal scores to percentages is easy: the number is converted by moving the decimal point two places to the right and adding a percent sign. A score of .98 would equal 98%.
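The raw-score-to-percentage conversion described above can be sketched as follows (the function name is illustrative):

```python
def percent_score(raw_score: float, total_points: float) -> float:
    """Divide the raw score by the total points possible and
    move the decimal point two places to the right."""
    return raw_score / total_points * 100

# 49 points earned out of 50 possible:
print(percent_score(49, 50))  # 98.0
```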
Test percentiles on a commercially produced, norm-referenced or standardized test are calculated in much the same way, although the calculations are typically included in test manuals or performed by scoring software.
If a student scores at the 75th percentile on a norm-referenced test, it can be said that she has scored at least as well as, or better than, 75 percent of students her age from the normative sample of the test. Several other types of standard scores may also appear on test reports.
Percentile rank
The percentile rank of a score is the percentage of scores in its frequency
distribution that are the same or lower than it. For example, a test score
that is greater than or equal to 75% of the scores of people taking the test
is said to be at the 75th percentile rank.
Percentile ranks are commonly used to clarify the interpretation of scores on standardized tests. In test theory, the percentile rank of a raw score is interpreted as the percentage of examinees in the norm group who scored at or below the score of interest.
Percentile ranks (PRs) are often normally distributed (bell-shaped), while normal curve equivalents (NCEs) are uniform and rectangular in shape. Percentile ranks are not on an equal-interval scale; that is, the distance between two scores whose percentile ranks differ by a given amount is not the same as the distance between two other scores whose percentile ranks differ by that same amount. For example, the interval from the 25th to the 50th percentile (50 - 25 = 25) does not span the same distance on the score scale as the interval from the 35th to the 60th percentile (60 - 35 = 25), because of the bell-curve shape of the distribution. Some percentile ranks are closer to one another than others: percentile rank 30 is closer on the bell curve to 40 than it is to 20.
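The unequal spacing can be illustrated with Python's statistics.NormalDist, assuming (as the passage does) that the underlying scores follow a bell curve; the standard-normal scale stands in for the raw-score scale:

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal distribution

# z-scores (standard scores) corresponding to selected percentile ranks
z = {p: nd.inv_cdf(p / 100) for p in (25, 35, 50, 60)}

# Both pairs differ by 25 percentile points, yet the distances
# on the underlying score scale are not equal:
print(round(z[50] - z[25], 3))  # 0.674
print(round(z[60] - z[35], 3))  # 0.639
```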
The mathematical formula is
ce+0.5 fi
X 100%
N
where c is the count of all scores less than the score of interest, f is the frequency of the score of interest, and N is the number of examinees in the sample. If the distribution is normally distributed, the percentile rank can be inferred from the standard score.
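The formula can be implemented directly from a list of scores; a minimal sketch (the function name and sample scores are illustrative):

```python
def percentile_rank(scores, score_of_interest):
    """PR = (c + 0.5 * f) / N * 100, where c is the count of scores
    below the score of interest, f is its frequency, and N is the
    number of examinees in the sample."""
    c = sum(1 for s in scores if s < score_of_interest)
    f = scores.count(score_of_interest)
    return (c + 0.5 * f) / len(scores) * 100

scores = [40, 55, 55, 60, 70, 75, 80, 85, 90, 95]
print(percentile_rank(scores, 55))  # 20.0  (c=1, f=2, N=10)
```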
2.1
Introduction:
Tests play a central role in the evaluation of pupil learning. They
provide relevant measures of many important learning outcomes. Tests
and other evaluation instruments serve a variety of uses in the school, for
example test of achievement might be used for selection, placement,
diagnosis or certification of mastery.
Several factors tend to reduce the validity of a test:
1. Unclear directions:
Directions that do not clearly indicate to the pupil how to respond to the items will reduce validity.
2. Reading vocabulary and sentence structure too difficult:
Vocabulary and sentence structure that are too complicated for the pupils will distort the meaning of the test results.
3. Inappropriate level of difficulty:
Items that are too easy or too difficult also lower validity.
4. Poorly constructed items:
Test items that provide clues to the answers will measure the pupils' alertness in detecting clues as well as those aspects of pupil performance that the test is intended to measure.
5. Ambiguity:
Ambiguous statements confuse the pupil and may cause the item to discriminate in a negative direction.
6. Inadequate time limits:
Time limits that do not provide pupils with enough time to consider the items reduce validity.
7. Test too short:
If the test is too short to provide a representative sample of the performance we are interested in, its validity will suffer accordingly.
8. Improper arrangement:

Types of validity:
Content Validity:
Content validity is evaluated by showing how well the content
of the test samples the class of situations. It is especially
important in the case of achievement and proficiency measures.
It is also known as face validity.
Concurrent Validity:
It is evaluated by showing how well test scores correspond to
already accept measures of performance or status made at the
same time. For example, we may give a social studies class a
test on knowledge of basic concepts in social studies and at the
same time obtain from its teacher report on these abilities as far
as pupils in the class are concerned. If the relationship between
the test scores and the teachers report of abilities is high. The
test will have high concurrent validity.
Predictive Validity:
It is evaluated by showing how well predictions made from the test are confirmed by evidence gathered at some subsequent time, as when the tester wants to estimate how well a student may do in college courses on the basis of how well he did on tests taken in secondary school.
Construct Validity:
It is evaluated by investigating what psychological qualities a test measures. It is ordinarily used when the tester has no definitive criterion measure of what he is concerned with and hence must use indirect measures.
Content Validity:
Content validity is the degree to which a test measures an intended content area. In other words, the content validity of a test refers to the extent to which the test content represents a specified universe of content. For example, if a teacher taught a course in biology and would like to give a test at the end of the course, the test would have content validity to the extent that its items sample all the topics taught in the course.
ii.
Construct Validity:
Construct validity is the degree to which a test measures an intended hypothetical construct.
2.3
Correlation Coefficient:
Test scores are correlated with that of criterion scores. The
obtained coefficient of correlation is the extent of validity index
of the test.
Expectancy Table:
Test scores are evaluated or correlated with the rating of the
supervisors. It provides empirical probabilities of the validity
index.
Cross Validation:
It means taking another look at the correlation coefficient with another criterion, or at expectancy tables with another criterion. It is of two types.
Empirical validation
Logical or rationale validation
RELIABILITY AND METHODS OF DETERMINING
RELIABILITY:
Reliability refers to the consistency with which a test measures.
The more reliable a test is, the more confidence we can have
that the scores obtained from the administration of the test are essentially
the same scores that would be obtained if the test were re-administered.
An unreliable test is essentially useless. For example, if an intelligence
test was unreliable, then a student scoring an IQ of 120 today might score
an IQ of 140 tomorrow and a 95 the day after tomorrow. On the other
hand, if the test is reliable, then the IQ of a student will remain nearly the
same each time the test is administered. The reliability of a test also depends
upon the number of questions it contains: a test will be more reliable
if it possesses more questions. In this respect, objective-type tests are
more reliable because their sampling is more extensive.
We can take another expert opinion to understand the meaning
of reliability: if a clinical thermometer on three successive
determinations yielded readings of 97°, 103°, and 99.6° for the
same patient, it would not be considered very reliable.
Reliability, of course, is a necessary but not a sufficient condition
for using a test. A highly reliable test may be totally invalid, or may not
measure anything that is psychologically or educationally significant.
The reliability of a single test score is expressed
quantitatively in terms of the instrument's standard error of measurement.
If the standard error of measurement, for example, is 2.5, we can say that
there are approximately two chances in three (more precisely, 68 in 100)
that the true score falls between 72.5 and 77.5 when the obtained score is
75. By definition, an unreliable test cannot possibly be valid. The
necessary degree of reliability, however, depends on the use that is made
of the test scores.
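The 68% band in the example above (obtained score plus or minus one standard error of measurement) can be checked directly. A short sketch in Python; the obtained score and SEM match the example in the text, while the standard deviation and reliability used in the second part are hypothetical values chosen to reproduce an SEM of 2.5:

```python
obtained_score = 75
sem = 2.5  # standard error of measurement, as in the text's example

# About two chances in three (68 in 100) that the true score
# lies within one SEM of the obtained score.
lower, upper = obtained_score - sem, obtained_score + sem
print(lower, upper)  # 72.5 77.5

# The SEM can also be derived from a test's standard deviation
# and reliability coefficient: SEM = SD * sqrt(1 - reliability).
sd, reliability = 10, 0.9375  # hypothetical values
sem_from_reliability = sd * (1 - reliability) ** 0.5
print(sem_from_reliability)  # 2.5
```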
Methods of Determining Reliability:
For determining reliability, the test should measure consistently
what it is designed to measure, and it should be administered to an
appropriate person or group of persons for whom the test has been
developed. Reliability is a statistical measure, and the following methods
are used to determine it.
Test-retest method:
When the stability of results over two measurements is required,
the test-retest method is used. In this method, the test is
administered to the same group of students at different periods
of time. The scores obtained from the first and second administrations
are correlated in order to check the stability and consistency of the test.
In test-retest reliability the time factor counts a lot: with very close
retesting the results are approximately the same, yielding a high
correlation. But when the retest is administered after a year or
two, as a result of changes in the characteristics of the students,
there are expected to be large variations in the outcome, and
therefore stability will be low.
Limitation:
This process is time consuming, and it is not free from
carry-over effects.
Spearman-Brown formula (reliability of the whole test from the
reliability of a half test):

Reliability of whole test = (2 x reliability of 1/2 test) / (1 + reliability of 1/2 test)
Kuder-Richardson formula (KR-21), using the number of items K, the
mean M, and the variance S²:

Reliability = (K / (K - 1)) x (1 - M(K - M) / (K x S²))
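Both reliability formulas are simple to apply. A short Python sketch; the half-test correlation, item count, mean, and variance below are hypothetical numbers used only to show the arithmetic:

```python
# Split-half reliability stepped up with the Spearman-Brown formula,
# and KR-21 reliability from the item count, mean, and variance.

def spearman_brown(r_half):
    """Reliability of the full test from the half-test correlation."""
    return (2 * r_half) / (1 + r_half)

def kr21(k, mean, variance):
    """Kuder-Richardson formula 21: k items, test mean, test variance."""
    return (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * variance))

print(round(spearman_brown(0.6), 3))              # 0.75
print(round(kr21(k=40, mean=30, variance=25), 3))  # 0.718
```

Note how the Spearman-Brown correction raises the half-test correlation of .60 to .75 for the full-length test, consistent with the earlier point that longer tests are more reliable.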
2.4
Reliability:
Reliability is the degree or extent of similarity among the results
obtained on several occasions; in other words, it can be defined as the
degree to which an assessment instrument elicits stable and consistent
results.
Reliability means consistency of measurement. In the words of
Ebel & Frisbie, "the ability of a test to measure the same quantity when it
is administered to an individual on two different occasions by two
different testers is called reliability."
Reliability is also called dependability or trustworthiness.
Reliability is the degree to which a test consistently measures whatever it
measures.
Factors which affect reliability:
The factors which adversely affect reliability are as under:
The examinee:
Fatigue, lack of motivation, carelessness.
Trait of Test:
The more objective the scoring of a test, the more reliable the
test is.
Other things being equal, the same test given late in the school
year (i.e., after covering the unit in class) is more reliable than
when given early in the year (i.e., without teaching the unit).
References
Katozai, Murad Ali. (2013). Measurement and Evaluation (1st ed.).
Nooman, Mohammad, & Obaid Ullah. (2013). A Manual of Educational &
Social Science and Research Methodologies (1st ed.).
2.5
PRACTICALITY:
Meaning:
The word practicality means feasibility or usability.
A test will be practicable if it is easy to administer, easy to
interpret, and economical in operation. A good test has instructions
simple enough that it can be administered even by a person with little
specialized training. Tests having difficult instructions, requiring
high-level training for administering them, or too expensive for
wide use in schools are said to have low usability or practicability.
Practicality refers to the economy of time, effort, and money in testing.
In other words, a test should be:
Easy to design
Easy to administer
Easy to interpret
Characteristics of Practicality:
The main characteristics of practicality include ease of design, ease of
administration, ease of scoring and interpretation, and economy of time,
effort, and money.
UNIT-3:
APPRAISING CLASSROOM TESTS (ITEMS
ANALYSIS)
3.1
THE VALUE OF ITEM
3.1.1
Item Analysis
Item analysis provides the basis for preparing the final draft of a
test. In the final draft, items are arranged in order of difficulty:
the easiest items are given at the beginning and the most difficult
items at the end.
Item analysis also serves to select the best items for a test with
regard to its purpose, after a proper tryout on a group of subjects
selected from the target population.
It further serves to modify or reject the poor items of a test. Poor
items may not serve the purpose of the test; distractors that do
not function as intended are identified and changed.
3.2
MAKING THE MOST OF EXAMS: PROCEDURES FOR ITEM ANALYSIS:
We all know what a classroom exam comprises, but then what
constitutes a good exam item? Our students seem to know, or at least
believe they know. But are they correct when they claim that an item
was too difficult, too tricky, or too unfair?
Lewis Aiken (1997), the author of a leading textbook on the
subject of psychological and educational assessment, contends that a
"postmortem" evaluation is just as necessary in classroom testing as it is
in medicine. Indeed, just such a postmortem procedure for exams exists:
item analysis, a group of procedures for assessing the quality of exam
items. The purpose of an item analysis is to improve the quality of an
exam by identifying items that are candidates for retention, revision, or
removal. More specifically, not only can the item analysis identify both
good and deficient items, it can also clarify what concepts the examinees
have and have not mastered.
So, what procedures are involved in an item analysis? The
specific procedures involved vary, but generally, they fall into one of two
broad categories: qualitative and quantitative.
Qualitative Item Analysis
Qualitative item analysis procedures include careful
proofreading of the exam prior to its administration for typographical
errors, for grammatical cues that might inadvertently tip off examinees to
the correct answer, and for the appropriateness of the reading level of the
material. Such procedures can also include small group discussions of the
quality of the exam and its items with examinees who have already taken
the test, or with departmental student assistants, or even experts in the
field. Some faculty use a "think-aloud test administration" (cf. Cohen,
Swerdlik, & Smith, 1992) in which examinees are asked to express
verbally what they are thinking as they respond to each of the items on an
exam. This procedure can assist the instructor in determining whether
certain students (such as those who performed well or those who
performed poorly on a previous exam) misinterpreted particular items,
and it can help in determining why students may have misinterpreted a
particular item.
Quantitative item analysis begins by dividing the group of test
takers into two groups, high scoring and low scoring. Ordinarily,
this is done by dividing the examinees into those scoring above
and those scoring below the median. (Alternatively, one could
create groups made up of the top and bottom quintiles or
quartiles or even deciles.)
Item Difficulty
For items with one correct alternative worth a single point, the
item difficulty is simply the percentage of students who answer an item
correctly. In this case, it is also equal to the item mean. The item
difficulty index ranges from 0 to 100; the higher the value, the easier the
question. When an alternative is worth other than a single point, or when
there is more than one correct alternative per question, the item difficulty
is the average score on that item divided by the highest number of points
for any one alternative. Item difficulty is relevant for determining
whether students have learned the concept being tested. It also plays an
important role in the ability of an item to discriminate between students
who know the tested material and those who do not. The item will have
low discrimination if it is so difficult that almost everyone gets it wrong
or guesses, or so easy that almost everyone gets it right.
To maximize item discrimination, desirable difficulty levels are
slightly higher than midway between chance and perfect scores for the
item. (The chance score for five-option questions, for example, is 20
because one-fifth of the students responding to the question could be
expected to choose the correct option by guessing.) Ideal difficulty levels
for multiple-choice items in terms of discrimination potential are:
Format                              Ideal Difficulty
Five-response multiple-choice       70
Four-response multiple-choice       74
Three-response multiple-choice      77
True-false (two-response)           85
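The chance score and the midpoint between chance and perfect scores described above can be computed directly; the ideal difficulty values in the table sit slightly above that midpoint. A short Python sketch covering the formats in the table (values are percentages):

```python
# For each multiple-choice format, compute the chance score
# (expected % correct by blind guessing) and the midpoint
# between chance and a perfect score of 100.
results = {}
for options in (5, 4, 3, 2):
    chance = 100 / options
    midway = (chance + 100) / 2
    results[options] = (chance, midway)
    print(options, chance, midway)
```

For five options this gives a chance score of 20 and a midpoint of 60, consistent with the table's ideal difficulty of 70 being "slightly higher than midway between chance and perfect."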
Item Discrimination
Alternate Weight
Means
The mean total test score (minus that item) is shown for students
who selected each of the possible response alternatives. This information
should be looked at in conjunction with the discrimination index; higher
total test scores should be obtained by students choosing the correct, or
most highly weighted, alternative. Incorrect alternatives with relatively
high means should be examined to determine why "better" students chose
that particular alternative.
The length of the test: a test with more items will have higher
reliability.
Reliability coefficients are commonly interpreted by range (.80-.90,
.70-.80, .60-.70, .50-.60, and .50 or below), with higher values
indicating a more consistent test.
The strength of a correlation is shown by the absolute value of the
coefficient (that is, how large the number is, whether it is positive or
negative). The sign indicates the direction of the relationship (whether
positive or negative).
QUESTION:
A few years ago in your Shiken column, you showed how to do
item analysis for weighted items using a calculator (Brown, 2000, pp.
19-21), and a couple of columns back (Brown, 2002, pp. 20-23) you
showed how to do distractor efficiency analysis in a spreadsheet
program. But I don't think you have ever shown how to do regular item
analysis statistics in a spreadsheet. Could you please do that? I think
some of your readers would find it very useful.
ANSWER:
Yes, I see what you mean. In answering questions from readers,
I explained more advanced concepts of item analysis without laying the
groundwork that other readers might need. To remedy that, in this
column, I will directly address your question, but only with regard to
norm-referenced item analysis. In my next Statistics Corner column, I
will address another reader's question, and in the process show how
criterion-referenced item analysis can be done in a spreadsheet.
The Overall Purpose of Item Analysis
Let's begin by answering the most basic question in item
analysis: Why do we do item analysis? We do it as the penultimate step
in the test development process. Such projects are usually accomplished
in the following steps:
For example, the item facility result in this case is .94, a very easy
item, because 94% of the students are answering correctly.
All the other NRT and CRT item analysis techniques that I will
discuss here and in the next column are based on this notion of item
facility. For instance, item discrimination can be calculated by first
figuring out who the upper and lower students are on the test (using their
total scores to sort them from the highest score to the lowest). The upper
and lower groups should probably be made up of equal numbers of
students who represent approximately one third of the total group each.
In Screen 1, I have sorted the students from high to low based on their
total test scores from 77 for Hide down to 61 for Hachiko. Then I
separated the three groups such that there are five in the top group, five
in the bottom group, and six in the middle group. Notice that Issaku and
Naoyo both had scores of 68 but ended up in different groups (as did
Eriko and Kimi with their scores of 70). The decision as to which group
they were assigned to was made with a coin flip.
3.3
ITEM DIFFICULTY:
Definition:
Item difficulty is a measure of the proportion of individuals who
responded correctly to each test item. The item difficulty of a test
for a particular group is evaluated by the percentage of participants
who respond correctly.
Explanation:
Item difficulty is simply the percentage of students taking the
test who answered the item correctly. The larger the percentage getting
an item right, the easier the item. The higher the difficulty index, the
easier the item.
Item #1: 24 of 30 students chose the correct answer.
P = 24/30 = .80
A rough rule of thumb is that if the item difficulty is more
than .75, it is an easy item; if the difficulty is below .25, it is a difficult
item. Given these parameters, item #1 could be regarded as moderately
easy.

Item #2: 12 of 30 students chose the correct answer.
P = 12/30 = .40
In fact, on question #2, more students selected an incorrect
answer (B) than selected the correct answer (A). This item should be
carefully analyzed to ensure that B is an appropriate distracter.
Item difficulty, therefore, might better have been named item
easiness; it expresses the proportion or percentage of students who
answered the item correctly.
3.4
THE INDEX OF DISCRIMINATION
Introduction
The table below shows the percentage of examinees choosing each
alternative on three items:

Option   Item 1 (%)   Item 2 (%)   Item 3 (%)
OMIT     0            0            0
A        0            79           4
B        18           0            7
C        82           0            89
D        0            21           0
E        0            0            0
KEY      C            A            C
Item difficulty: Very easy or very difficult items are not good
discriminators. If an item is so easy (e.g., difficulty = 98) that
nearly everyone gets it correct, or so difficult (e.g., difficulty =
30) that nearly everyone gets it wrong, then it becomes very
difficult to discriminate those who actually know the content
from those who do not.
That does not mean that very easy and very difficult items
should be eliminated. In fact, they are fine as long as they are used
with the instructor's recognition that they will not discriminate
well, and if putting them on the test matches the intention of the
instructor to either really challenge students or to make certain
that everyone knows a certain bit of content.
A poorly written item will have little ability to discriminate.
Example 2
Another measure, the Discrimination Index, refers to how well
an assessment differentiates between high and low scorers. In other
words, you should be able to expect that the high-performing students
would select the correct answer for each question more often than the
low-performing students. If this is true, then the assessment is said to
have a positive discrimination index (between 0 and 1) -- indicating that
students who received a high total score chose the correct answer for a
specific item more often than the students who had a lower overall score.
If, however, you find that more of the low-performing students
got a specific item correct, then the item has a negative discrimination
index (between -1 and 0). Let's look at an example.
Table 1 displays the results of ten questions on a quiz. Note that
the students are arranged with the top overall scorers at the top of the
table.
Table-1: The Index of Discrimination
Student    Total Score
Asif       90
Sam        90
Jill       80
Charlie    80
Sonya      70
Ruben      60
Clay       60
Kelley     50
Justin     50
Tonya      40
1. After the students are arranged with the highest overall scores at
the top, count the number of students in the upper and lower
group who got each item correct. For Question #1, there were 4
students in the top half who got it correct and 4 students in the
bottom half.
2. Determine the Difficulty Index by dividing the number who got
it correct by the total number of students. For Question #1, this
would be 8/10 or p = .80.
3. Determine the Discrimination Index by subtracting the number
of students in the lower group who got the item correct from the
number of students in the upper group who got the item correct.
Then, divide by the number of students in each group (in this
case, there are five in each group). For Question #1, that means
you would subtract 4 from 4, and divide by 5, which results in a
Discrimination Index of 0.
The answers for Questions 1-3 are provided in Table-2.
Table-2
Item         # Correct        # Correct        Difficulty   Discrimination
             (Upper group)    (Lower group)    (p)          (D)
Question 1   4                4                .80          0.0
Question 2   0                3                .30          -0.6
Question 3   5                1                .60          0.8
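The steps above can be sketched in a few lines of Python. The upper/lower counts for Question 1 come from the text; the counts for Questions 2 and 3 are inferred from the p and D values reported in Table-2 (they are the only counts consistent with those values for groups of five out of ten students):

```python
# Difficulty (p) and Discrimination (D) for items scored by
# upper-group and lower-group correct counts.
def item_stats(upper_correct, lower_correct, group_size, total_students):
    p = (upper_correct + lower_correct) / total_students   # Difficulty Index
    d = (upper_correct - lower_correct) / group_size       # Discrimination Index
    return p, d

print(item_stats(4, 4, 5, 10))  # Question 1: (0.8, 0.0)
print(item_stats(0, 3, 5, 10))  # Question 2: (0.3, -0.6)
print(item_stats(5, 1, 5, 10))  # Question 3: (0.6, 0.8)
```

Question 2's negative discrimination means more low scorers than high scorers got the item right, flagging it for revision exactly as the text advises.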
In all cases, action must be taken! Items with negative
discrimination indices must be addressed. Items with discrimination
indices less than .20 (or slightly over, but still relatively low) must be
revised or eliminated. Be certain that there is only one possible answer,
that the question is written clearly, and that your answer key is correct.
UNIT-4:
INTERPRETING THE TEST SCORES
4.1
Two well-known tests in the United States that have scaled scores are the
ACT and the SAT. The ACT's scale ranges from 0 to 36 and the SAT's
from 200 to 800 (per section). Ostensibly, these two scales were selected
to represent a mean and standard deviation of 18 and 6 (ACT), and 500
and 100 (SAT). The upper and lower bounds were selected because an
interval of plus or minus three standard deviations contains more than
99% of a population. Scores outside that range are difficult to measure,
and return little practical value.
Note that scaling does not affect the psychometric properties of a test, it
is something that occurs after the assessment process (and equating, if
present) is completed. Therefore, it is not an issue of psychometrics, per
se, but an issue of interpretability.
Interpreting the Score by Criterion Referencing
The raw score is the number of points received on a test when the test
has been scored according to the instructions. A raw score is not very
meaningful without further information. Criterion-referenced test
interpretation permits us to describe an individual's test performance
without referring to the performance of other individuals. Thus we might
describe a student's performance in terms of the speed or precision with
which a certain task is performed. Criterion-referenced interpretation of
test scores is most meaningful when the test is designed to measure a set
of clearly stated learning tasks, and enough items are used for each
interpretation to make dependable judgments.
Interpreting the Score by Percentages
In mathematics, a ratio expressed with respect to 100 is called a
percentage (denoted by %). It is often useful to express scores as
percentages for comparison. Consider the following example.
Grade   Class A           Class A   Class B           Class B
        No. of Students   %         No. of Students   %
A       10                12.50     8                 40
B       25                31.25     6                 30
C       30                37.50     4                 20
D       15                18.75     2                 10
Total   80                100       20                100
Ten students from class A and eight students from class B got
grade A. It looks apparently as if class A is better in getting grade A,
but 12.5% of the students from class A and 40% of the students from
class B got grade A. It is clear from the percentages that class B is far
better in getting grade A than class A.
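The comparison above is a one-line computation once the counts are known. A small Python sketch using the figures from the table:

```python
# Comparing classes by raw counts vs. percentages.
class_a_total, class_b_total = 80, 20
grade_a_in_class_a, grade_a_in_class_b = 10, 8

pct_a = 100 * grade_a_in_class_a / class_a_total
pct_b = 100 * grade_a_in_class_b / class_b_total
print(pct_a, pct_b)  # 12.5 40.0
```

Class A has more grade-A students in absolute terms (10 vs. 8), but class B's percentage is more than three times higher, which is why percentages, not raw counts, are needed for a fair comparison.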
Interpreting the Score by Norm Referencing
Interpretation of scores by norm referencing involves ranking the scores
and expressing a given score in relation to the other scores. Norm-referenced
test interpretation tells us how an individual compares with
other persons who have taken the same test. The simplest type of
comparison is to rank the scores from highest to lowest and to note where
an individual's score falls. The rest of the scores serve as the norm group.
The given score is compared with the other scores by norm referencing.
If a student's score is second from the top in a group of 20 students, it is a
high score, meaning that the scores of 90% of the students are less than
his.
Ordering and Ranking
A first step in organizing scores is the listing of scores in order
of magnitude from the largest to the smallest score. The data so arranged
are called an ordered array. By scanning an ordered array, we can
determine quickly the largest score, the smallest score, and other facts
about the data.
Ranked data consist of scores in a form that shows their
relative position on some characteristic but does not yield a numerical
value for this characteristic. The order of finish of cars in a race is an
example of ranking. If we list the cars as first, second, third, etc. up to the
last car, we can say that they were ranked on the characteristic of overall
speed. We know each car's position relative to any other car's position,
but we have no precise knowledge of the speed of any car. If a high
school teacher ranked Hamid 30th in a class of 100, it means that Hamid
did better than 70 of his classmates but poorer than 29. But nothing has
been said about Hamid's general level of achievement.
Measurement Scales
Measurement scales are of great significance in analyzing and
interpreting results. The important types of measurement scales are:
The Nominal Scale
The lowest measurement scale is the nominal scale. In this
scale, each individual is put into one of the distinct categories or classes.
Each class has a name. The names are just labels. There is no order in
these classes. We cannot say that one class is larger than the other class.
You cannot do arithmetic operations (addition, subtraction,
multiplication, division) on this scale.
Examples of the nominal scale are: categorization of the blood
groups of the students of a college into A, B, AB and O groups (we
cannot say that group A is better than group B); classification of books
in a college library according to subject; and distribution of the
population of Pakistan according to sex, religion, occupation, marital
status, literacy, etc.
The Ordinal Scale
When measurements are not only different from category to
category but can also be ranked according to some criterion, they are
said to be measured on an ordinal scale. The members of any one
category are considered equal, but members of one category are
considered lower than those in another category. The ordinal scale is
one step higher than the nominal scale because we not only distribute
the individuals into classes but also order these classes.
Frequency Distribution
Data that have been originally collected are called raw data or
primary data; they have not yet undergone any statistical treatment. To
understand raw data easily, we arrange them into groups or classes. The
data so arranged are called grouped data or a frequency distribution.
General rules for the construction of a frequency distribution are
illustrated in the following example.
Example:
The marks obtained by 120 students of a first year class in the
subject of Education are given below. Construct a frequency distribution.
(Raw marks of the 120 students, ranging from 33 to 99, are listed here;
they are grouped in the frequency table below.)
The length of each class interval is found by:

I = Range / Number of class intervals
Grade    Tallies
90-99    ||||| |||
80-89
70-79
60-69
50-59    ||||| |||||
40-49    |||
30-39    ||
Frequency
The number of scores lying in a class interval is called the
frequency of that class interval. For example, two scores lie in the class
interval 30-39; therefore 2 is the frequency of the class interval 30-39.
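Grouping raw scores into class intervals is easy to mechanize. A short Python sketch using intervals of length 10 as in the example; the score list is a small hypothetical sample rather than the full set of 120 marks:

```python
# Group raw scores into class intervals of length 10 and count
# the frequency of each interval.
scores = [57, 86, 69, 62, 75, 73, 80, 78, 87, 35, 98, 52, 44, 91, 33]

freq = {}
for s in scores:
    lower = (s // 10) * 10            # e.g., 57 falls in the 50-59 interval
    freq[lower] = freq.get(lower, 0) + 1

for lower in sorted(freq, reverse=True):
    print(f"{lower}-{lower + 9}: {freq[lower]}")
```

Each printed line corresponds to one row of a tally table like the one above: the class interval followed by its frequency.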
Mid-Point or Class Mark
The middle of a class interval is called the mid-point or class mark
and is usually denoted by X. For the class interval 30-39 it is
calculated as

Midpoint = X = (30 + 39) / 2 = 34.5
The Mean:
For ungrouped data, the arithmetic mean is

X̄ = ΣX / N

Where Σ, read as sigma, is the Greek symbol meaning "the sum of";
ΣX means the sum of the values of variable X; and N is the number of
scores or measurements. In order to calculate the arithmetic mean for
grouped data, the formula is

X̄ = Σfx / Σf

Where Σfx means the sum of the products of f and x, f means the
frequency of the scores, x means the score, and Σf means the sum of
all the frequencies of the distribution.
The Median:
The median of a set of scores is the middle score, or the
arithmetic mean of the two middle scores, in an array; 50% of the scores
are less than the median and 50% of the scores are greater than the
median.
Formula for calculating the median for ungrouped data:

Median = ((N + 1) / 2)th score

For grouped data:

Median = L + (i / f) x (N/2 - C)

Where
L = lower class boundary of the median class interval,
i = length of the median class interval,
f = the frequency of the median class interval,
N = Σf,
C = the cumulative frequency of the class interval below the median class
interval.
The Mode
The mode is the score that occurs the greatest number of times in a
data set. The mode does not always exist: if each score occurs the same
number of times, there is no mode. There may also be more than one
mode: if two or more scores occur the greatest number of times, then
there is more than one mode.
The mode can be calculated for grouped data with the help of the
following formula:

Mode = L + ((fm - f1) / (2fm - f1 - f2)) x i

Where
L = lower class boundary of the modal class interval,
fm = the maximum frequency,
f1 = the frequency preceding the modal class,
f2 = the frequency succeeding the modal class,
i = the length of the modal class interval.
Note: The mode lies in the class interval having the maximum frequency.
This class interval is called the modal class.
Empirical Relationship between Mean, Median and Mode:
For moderately skewed distributions, we have the following
empirical relation:

Mode = 3 Median - 2 Mean

For example, with Median = 74.61 and Mean = 73.42:
Mode = 3(74.61) - 2(73.42) = 76.99
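The arithmetic of the empirical relation can be checked in one line. A Python sketch using the mean and median quoted in the text:

```python
# Empirical relation for moderately skewed distributions:
# Mode = 3 * Median - 2 * Mean
mean, median = 73.42, 74.61
mode = 3 * median - 2 * mean
print(round(mode, 2))  # 76.99
```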
Comparison of Measures of Central Tendency:
The numerical value of every score in a data set contributes to
the mean; this is not true of the mode or median, because only the mean
is based on the sum of all the scores. In a single-peaked symmetrical
distribution, mean = median = mode. In practice, no distribution is exactly
symmetrical, so the mode, median and mean usually have different
values. If a population is not symmetrical, the mean, median and mode
will not be equal. The mean is affected by the presence of a few extreme
scores, which the median and mode are not. The mean is preferred if
extreme values are not present in the data. The median is preferred if
interest is centered on the typical rather than the total score, if the
distribution is skewed, or if some scores are missing so that the mean
cannot be calculated.
The Quartiles:
The values that divide an ordered set of scores into four equal parts
are called quartiles:

Q1 = ((N + 1) / 4)th score

Q2 = (2(N + 1) / 4)th score = ((N + 1) / 2)th score

Q3 = (3(N + 1) / 4)th score
4.2
The Percentiles:
The values that divide a set of scores into one hundred equal parts
are called percentiles and are denoted by P1, P2, P3, ..., P99.
P25 is the first quartile, P75 is the third quartile, and P50 is the median.
The Percentile Ranks (PR):
The procedure for calculating percentile ranks is the reverse of
the procedure for calculating percentiles. Here we have an individual's
score and find the percentage of scores that lies below it. In the example,
we calculate P78 = 83.37, meaning that 83.37 is the score below which
78% of the scores fall. If a student has a score of 83.37, we can say that
his percentile rank (PR) is 78 on a scale of 100.
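A percentile rank under the simple definition used here (the percentage of scores falling below a given score) can be sketched in Python; the score list is hypothetical:

```python
# Percentile rank: the percentage of scores in the group that
# fall below a given score (simple definition, ties not split).
def percentile_rank(scores, value):
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)

scores = [55, 60, 62, 67, 70, 73, 75, 80, 85, 90]
print(percentile_rank(scores, 75))  # 60.0: 6 of 10 scores fall below 75
```

Note that some textbooks count half of the scores tied with the given value as "below"; the sketch above uses the simpler strictly-below definition.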
Correlation may be positive, negative, or zero (no correlation).
The most common methods of computing the coefficient of
correlation are:
1. Rank-difference method:

Rs = 1 - (6 ΣD²) / (N(N² - 1))

2. Product-moment method:

rxy = (ΣXY - (ΣX)(ΣY)/N) / sqrt[ (ΣX² - (ΣX)²/N) (ΣY² - (ΣY)²/N) ]
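The rank-difference method is straightforward to apply once both variables have been ranked. A Python sketch with two small hypothetical sets of ranks (no ties):

```python
# Spearman rank-difference correlation:
# Rs = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), where D is the
# difference between an individual's two ranks.
def spearman_rs(ranks_x, ranks_y):
    n = len(ranks_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

ranks_x = [1, 2, 3, 4, 5]
ranks_y = [2, 1, 4, 3, 5]
print(spearman_rs(ranks_x, ranks_y))
```

Here D² sums to 4, so Rs = 1 − 24/120 = 0.8, a strong positive relationship between the two rankings.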
Measures of Variability:
Measures of central tendency measure the centre of a set of
scores. However, two data sets can have the same mean, median and
mode and yet be quite different in other respects. For example, consider
the heights (in inches) of the players of two basketball teams.
Team-1: 72 73 76 76 78
Team-2: 67 72 78 76 84
The two teams have nearly the same mean height, but it is
clear that the heights of the players of team 2 vary much more than those
of team 1. If we have information about the centre of the scores and the
manner in which they are spread out, we know much more about a set of
scores. The degree to which scores tend to spread about an average value
is called dispersion.
The Range
It is the simplest measure of dispersion. The range of a set of
scores is the difference between the maximum score and the minimum
score. In symbols:

Range = Xm - Xo

Where Xm is the maximum score and Xo is the minimum score.
Quartile Deviation:
The quartile deviation is defined as half of the difference
between the third and the first quartiles. In symbols:

Q.D. = (Q3 - Q1) / 2

Where Q1 is the first quartile and Q3 is the third quartile.
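The range and quartile deviation can be computed together. A Python sketch on a small hypothetical ordered score set, using the (N + 1)/4 quartile positions from the formulas above (N = 7 is chosen so the positions are whole numbers):

```python
# Range and quartile deviation for a small ordered score set.
scores = sorted([40, 45, 50, 55, 60, 65, 70])  # N = 7

range_ = scores[-1] - scores[0]                # Range = Xm - Xo

# Quartile positions via the (N + 1)/4 rule; with N = 7 these
# fall exactly on the 2nd and 6th scores.
n = len(scores)
q1 = scores[(n + 1) // 4 - 1]                  # 2nd score = 45
q3 = scores[3 * (n + 1) // 4 - 1]              # 6th score = 65
quartile_deviation = (q3 - q1) / 2

print(range_, quartile_deviation)  # 30 10.0
```

With N values for which (N + 1)/4 is not a whole number, interpolation between adjacent scores would be needed; the sketch sidesteps that case for clarity.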
The Mean Deviation or Average Deviation:
The average deviation is defined as the arithmetic mean of the
deviations of the scores from the mean or median, the deviations being
taken as positive. In symbols:

M.D. = Σ|X - X̄| / N

For grouped data:

M.D. = Σf|X - X̄| / Σf

The Standard Deviation:
The standard deviation is the positive square root of the mean of the
squared deviations from the mean:

S = sqrt[ Σ(X - X̄)² / N ]   or, equivalently,   S = sqrt[ ΣX²/N - (ΣX/N)² ]

The coefficient of variation expresses the standard deviation as a
percentage of the mean:

C.V. = (S / X̄) x 100
Standard Scores:
A frequently used quantity in statistical analysis is the standard
score or Z-score. The standard score for a data value is the number of
standard deviations that the data value lies away from the mean of the
data set:

Z = (X - X̄) / S
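The Z-score formula is a single expression. A Python sketch; the test score, mean, and standard deviation are hypothetical values chosen for illustration:

```python
# Z-score: how many standard deviations a value lies from the mean.
def z_score(x, mean, sd):
    return (x - mean) / sd

# A score of 85 on a test with mean 75 and standard deviation 5
# lies two standard deviations above the mean.
print(z_score(85, 75, 5))  # 2.0
```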
In a normal distribution:
-0.6745σ to +0.6745σ covers 50% of the area.
-1σ to +1σ covers 68.27% of the area.
-2σ to +2σ covers 95.45% of the area.
-3σ to +3σ covers 99.73% of the area.
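These normal-curve areas can be verified numerically with the error function from Python's standard library, since the area within ±z standard deviations of the mean equals erf(z / √2):

```python
import math

def area_within(z):
    """Proportion of a normal distribution within ±z standard deviations."""
    return math.erf(z / math.sqrt(2))

for z in (0.6745, 1, 2, 3):
    print(z, round(100 * area_within(z), 2))
```

The loop reproduces the four percentages quoted above: roughly 50, 68.27, 95.45, and 99.73.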
4.3
STANDARD SCORES:
Standard scores describe a range of average performance, from low
average to high average, with most students earning standard scores on
educational and psychological tests that fall in the range of 85-115. This
is the range in which 68% of the general population performs and,
therefore, is considered the normal limits of functioning.
Classifying Standard Scores
However, the normal limits of functioning encompass three
classification categories: low average (standard scores of 80-89), average
(standard scores of 90-109), and high average (110-119). These
classifications are typically used by school psychologists and other
assessment specialists to describe a student's ability compared to
same-age peers from the general population.
Subtest Scores
Many psychological tests are composed of multiple subtests that
have a mean of 10, 50, or 100. Subtests are relatively short tests that
measure specific abilities, such as vocabulary, general knowledge, or
short-term auditory memory. Two or more subtest scores that reflect
different aspects of the same broad ability (such as broad Verbal Ability)
are usually combined into a composite or index score that has a mean of
100. For example, a Vocabulary subtest score, a Comprehension subtest
score, and a General Information subtest score (the three subtest scores
that reflect different aspects of Verbal Ability) may be combined to form
a broad Verbal Comprehension Index score. Composite scores, such as
IQ scores, Index scores, and Cluster scores, are more reliable and valid
than individual subtest scores. Therefore, when a student's performance
demonstrates relatively uniform ability across subtests that measure
different aspects of the same broad ability (the Vocabulary,
Comprehension, and General Information subtest scores are all
average), then the most reliable and valid score is the composite score
(the Verbal Comprehension Index in this example). However, when a
student's performance demonstrates uneven ability across subtests that
measure different aspects of the same broad ability, the composite score
may mask meaningful differences among the subtest scores.
PROFILE:
The score bands used with the differential aptitude tests can be
plotted by hand or by computer. The computer-produced profile shown in
Figure 14.3 is based on same-sex percentiles. These are recorded
down the left side of the profile and were obtained from the percentile
norms in Table 14.3. The opposite-sex percentiles are also listed down the
right side of the report, to show how the scores compare with the female
norms. The percentile bands for each test are plotted directly on the
profile. The use of such bands minimizes the tendency of
test profiles to present a misleading picture. Without the bands we are apt
to attribute significance to differences in test performance that can be
accounted for by chance alone.
When profiles are used to compare test performance, it is
essential that the norms for all tests be comparable. Many test publishers
provide for this by standardizing a battery of achievement tests and a
scholastic aptitude test on the same population.
Profile Narrative Reports
Some test publishers now make available a profile of
each pupil's scores, accompanied by a narrative report that describes how
well the pupil is achieving. The graphic profile provides a quick view of
the pupil's strengths and weaknesses, and the narrative report aids in
interpreting the scores and in identifying areas in which instructional
emphasis is needed. A typical report of this type, for a widely used test
battery, is shown in figure 14.4.
Narrative reports should be especially useful in communicating
test results to parents. They are, of course, also helpful to those teachers
who have had little or no training in the interpretation and use of scores
from published tests.
UNIT-5:
EVALUATING PRODUCT, PROCEDURES &
PERFORMANCE
5.1
Learning Outcome:
solving?
What is the attitude of pupils to learning the curriculum?
Do our pupils enjoy learning? Are they motivated to learn?
In numeracy, what do you understand by each of the following
skills?
Oral language
Reading
Writing
Digital literacy
Learning Experience
The sub-themes of learning experience are:
Learning environment
Engaged in learning
Learning to learn
Engaged in Learning
What teaching approaches are used?
Do we encourage pupil questioning as well as teacher input?
Do pupils enjoy learning in the classroom? Are they eager to find out
more?
Are all students in the classroom afforded the opportunity to participate
in lessons and engage with learning?
Learning Environment
Involve the students in developing rules which recognize
the rights and responsibilities of the community.
Provide supervision of pupils both within the class and at break
times within the school setting.
Are all the resources well organized, labeled and clear to all
learners?
Celebrate pupils' learning and achievements through a range of displays:
concrete and visual materials, centers of interest and displays of pupil
work.
Learning to Learn
Learning to learn is the third sub-theme of learning
experiences.
Engage the pupils in monitoring their own progress in learning,
and use assessment-for-learning techniques properly in the classroom
to develop the skills of learners through proper planning of lessons.
Allow the learners to communicate and share work with others in the class.
How do we enable the learners to develop their personal
organization and to plan their own work? What study and revision skills
do we teach?
Teach the pupils how to organize and present their work.
Make the pupils creative and give them the opportunity for collaborative
work.
3. Teachers' Practice
Assessment
1. Learning Outcomes:
Do we provide clear, relevant and differentiated learning
outcomes to pupils?
How are pupils made aware of what they are going to learn?
Are pupils familiar with the expected success criteria in learning
activities?
Written Plans
Are the long- and short-term plans prepared in accordance with
the curriculum?
How have we identified these opportunities, and are whole-school plans
reflected in individual planning?
Resources
How satisfied are we with the resources, materials and
equipment we have within our classroom and available within the
school? Are the necessary and relevant materials readily available?
Assessment
Teaching Approach
Learning Outcome
Are learning outcomes linked to the curriculum?
What provision is made to ensure expected learning outcomes
are achieved during lessons?
Focus of Learning
Effects:
appropriate prominence.
The contents of individual pieces of literature are largely
Presentation of Results
The tables and figures are legible and easy to grasp at first
glance. It is evident that the author has taken pains to find the best visual
means of presentation.
The author largely points out the most striking results; at the same
time, however, he/she concentrates too much on discussing aspects
that are not entirely relevant to the research question at hand.
Descriptive title
Author
Abstract (NOTE: MAXIMUM 200 WORDS)
Introduction
Hypotheses and predictions (these can be incorporated into the
introduction or presented below a separate sub-heading)
Study Area and Organisms
Methods and experimental design
Significance of work
Literature Cited (use a journal format of your choice, e.g.
Journal of Ecology or Oecologia)
7. Literature Cited: Are the papers cited in the body of the proposal
listed here? Are the references listed here cited in the body of the text? Is
consistent formatting used? (5 points)
8. Presentation and clarity: Are the different sections of the proposal
well linked? Are the ideas presented clearly, and can they be followed
from one section of the proposal to the next? Is the writing style clear
(topic sentences introduce themes presented in each paragraph; concise
language used; spelling and grammar acceptable)? Is the use of tense and
active/passive voice consistent? Note: Use the past tense to describe
results found in previous studies; use the future tense "we will..." to describe
work that you propose to do. (5 points)
5.2
Specific Questions:
Evaluators can ask more specific questions about:
Group work compared to individual work.
Group work process versus the group work product.
The effectiveness of group work, in class and/or out of class, in
enhancing learning.
The appropriateness of group work.
Organizational, planning, management and monitoring issues.
Strengths and weaknesses of group work, and ideas for
improvement.
Diversity issues (did some students find it easier or harder, or
benefit more than others?).
Timing of Evaluation:
Evaluation can occur at any time during the unit of study or
program, but it usually occurs at the end of the semester or at the end of
the task that is being undertaken and evaluated.
Ideally, students should be given time to reflect upon their
experiences prior to completing any form of evaluation, especially if the
evaluator desires some specific information about their experiences of
group work or has a specific reflection component within the work
being evaluated.
It's a good idea to explain all of this at the start of the unit of
study and to provide opportunities for students to reflect along the way.
Evaluation can also be built into the requirements of the group
work tasks by asking students to complete an evaluation of their own or
the whole group's experience of group work. This could also be a requirement
of their assessment. It is up to the evaluator whether or not to allocate marks.
Method for Collecting Data for Evaluation:
There is no single method for designing or conducting an
evaluation. Methods can be quantitative or qualitative, formal or informal,
formative or summative, self-administered or externally administered, or
any combination of these. There are advantages and disadvantages to
each method, and the choice will largely depend upon the purpose of the
evaluation and the content, materials, practices, tasks or activities being
evaluated.
1. Questionnaire:
A questionnaire can ask students to rate statements on a scale
such as: Agree / Neutral / Disagree.
2. Checklist:
A checklist is another method that can provide basic data, for
example items such as:
Autonomy.
Opportunity to get to know my classmates.
Uses of Evaluation:
It is important to consider who will use the evaluations and how
it will be used. This is a key part of the planning process which relates to
the purpose of the evaluation.
It is also important to reflect upon and consider the methods that
have been used to gather information about the effectiveness of group
work.
5.3
EVALUATING DEMONSTRATION:
Demonstration
Asking learners questions about what you are doing or should do
will give them an opportunity to prove they know the procedure,
although they have not yet flown it.
The demonstration of teaching skills is rated on a four-step scale
(Passable, Satisfactory, Good, Very good) with respect to its content and
target group. The criteria and sample descriptors are:

Content
Correspondence between the topic and content of the demonstration;
academic nature of the content; consistency and clarity of presentation of
the content; critical approach; many-sided argumentation; connection
between theory and practice; aptness, diversity and topicality of the
research data used; use of own research results; consideration given to
the target group in the choice of content.
Sample descriptors: The content is academic. The applicant presents the
content clearly and consistently. Where appropriate, the applicant uses
his/her own research results during the demonstration. The applicant
takes the target group into consideration when choosing the content.

Methods
Organization of teaching; motivation of the target group; suitable use of
teaching methods; suitable use of teaching aids and materials; wrap-up;
evaluation of the teaching situation in terms of the objectives set;
consideration given to the target group in solutions related to evaluation.
Sample descriptors: At the lower end of the scale, the teaching situation
is organized appropriately. At the higher end, the teaching situation is
organized appropriately, taking into consideration its objectives, contents,
target group and context; the applicant inspires the target group to
engage, stimulates the listeners' interest and motivates them to
participate; and the applicant uses different teaching methods
appropriately in terms of the situation and objectives, taking the target
group into consideration.

Interaction skills
Use of voice; clarity and intelligibility of speech; coherence of oral and
written communication; quality of interaction; other matters improving
communication.

Alignment of the preliminary assignment and the demonstration of
teaching skills
Sample descriptors: At the lower end of the scale, the preliminary
assignment and the demonstration of teaching skills are well aligned. At
the higher end, the preliminary assignment lays the foundation for and
supports the demonstration, and the two form a consistent whole.
5.4
Motor Skills
A motor skill is a function, which involves the precise
movement of muscles with the intent to perform a specific act.
Motor skills are skills that are associated with the activity of the
body muscles, like the skills performed in sport. Fine motor skills are the
type that is associated with small movements of the wrists, hands, feet,
fingers and toes.
Motor skills are the ability to make particular bodily movements
to achieve certain tasks. They are a way of controlling muscles to make
fluid and accurate movements. These skills must be learned, practiced
and mastered, and over time can be performed without thought, for
example, walking or swimming. Children are clumsy in comparison to
adults, because they have yet to learn many motor skills that allow them
to effectively accomplish tasks.
Motor skills are also learned and refined in adulthood. If a
woman takes up belly dancing, her first movements will not closely
resemble those of the teacher. Over time, however, she will learn how to
control her muscles to make the signature movements that a belly dancer
makes.
Genetic factors also affect the development of motor skills, for
example, the children of a professional dancer are far more likely to be
good at dancing, with good coordination and muscular control, than the
children of a biochemist. Gross motor skills are usually learned during
childhood and require a large group of muscles to perform actions, such
as balancing or crawling. Fine motor skills involve smaller groups of
muscles and are used for fine tasks, such as threading a needle or playing
a computer game. These skills can be forgotten if disused over time.
motor skills, because it uses both the arms and legs working together to
accomplish the task. Hopscotch, basketball and walking on a balance
beam are also good ways to evaluate gross motor skills. Look for the
fluidity of movement, problems with balance and hand-eye coordination.
3. Evaluate fine motor skills of arms and legs. Ask the individual to put
a clothespin on the edge of a box. Stringing beads on a shoelace is
another way to assess fine motor skills. Using a stapler and placing a
paperclip on a sheet of paper are also ways to assess fine motor skills.
Place an item on the floor and ask the individual to pick it up using his
toes only. Watch the individual perform each task, looking at how smooth
the movements are and how easily the task is completed, and note any
difficulties.
4. Test fine motor skills using common household items. Give the
individual a jar and ask her to unscrew the lid and screw the lid back on.
Ask the individual to place items, such as coins or blocks, into containers
such as a bowl, bucket or cup. Draw a straight line on a piece of paper
and have the individual use a pair of scissors to cut the line on the paper.
Using pencils or pens of different sizes, ask the person to pick up and
grasp each pencil/pen. Then ask the individual to trace items drawn on
the paper. Watch for the completion of each task, looking for any
problems during each movement.
5. Assess fine motor skills while getting dressed. Ask the individual to
put on and button up a shirt. Next, have the individual put on a pair of
pants that have a snap closure and a zipper. Give the individual a pair of
shoes, which have shoelaces and not Velcro closures, and ask him to tie
the shoes. Watch the individual perform each task, looking for
difficulties, abnormal movements and the ability to perform each task
completely without help.
Some Motor Skills and Their Evaluation for Preschoolers
Dancing, either freestyle or through songs with movements, such as "I'm
a Little Teapot". Dance and movement classes, like pre-ballet, can be fun
but aren't necessary for motor-skills development.
skills will not get anyone very far if he or she cannot listen to other
people and respond appropriately. Encourage your students to listen as
they speak and have appropriate responses to others in the conversation.
Fluency
Fluency may be the easiest quality to judge in your students'
speaking. How comfortable are they when they speak? How easily do the
words come out? Are there great pauses and gaps in the student's
speaking? If there are then your student is struggling with fluency.
Fluency does not improve at the same rate as other language skills. You
can have excellent grammar and still fail to be fluent. You want your
students to be at ease when they speak to you or other English speakers.
Fluency is a judgment of this ease of communication and is an important
criterion when evaluating speaking.
Suggestions for Improvement
Offer suggestions (rather than criticisms) for improved delivery
style. Many students are aware of their difficulties in delivering oral
communication and want feedback and support, and they do want
suggestions. Not so useful: "Don't wave your hands when you
talk." Better: "Let's figure out what you're going to do with your hands so
that you don't distract the audience from what you are saying. What feels
more natural to you?"
Present oral communication skills as a set of professional skills
that all professionals learn and practice steadily throughout their lives.
UNIT-6:
PORTFOLIOS
6.1
PURPOSE OF PORTFOLIOS:
Literal Definition:
parent reflection; and provide for continuity in education from one year
to the next. Instructors can use them for a variety of specific purposes,
including:
goals
To document achievement for alternative credit
To accumulate "best work" for admission to other educational
institutions or programs
Demonstrating progress toward identified outcomes.
Creating an intersection for instruction and assessment.
Providing a way for students to value themselves as learners.
Offering opportunities for peer-supported growth.
Benefits of Portfolio:
1.
First, you must decide the purpose of your portfolio. For example,
the portfolios might be used to show student growth, to identify
2.
Decide How You Will Grade It. Next, you will need to
establish how you are going to grade the portfolio. There are
several ways you can grade students' work: you can use a rubric,
a letter grade, or, the most efficient way, a rating
scale. Is the work completed correctly and completely? Can you
comprehend it? You can use a grading scale of 4-1: 4 = Meets
All Expectations, 3 = Meets Most Expectations, 2 = Meets Some
Expectations, 1 = Meets No Expectations. Determine what skills
you will be evaluating, then use the rating scale to establish a
grade.
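The rating-scale grading described above can be sketched in code. The skill names and ratings below are invented for illustration; only the 4-1 scale and its labels come from the text:

```python
# The 4-1 rating scale from the text.
RATING_LABELS = {
    4: "Meets All Expectations",
    3: "Meets Most Expectations",
    2: "Meets Some Expectations",
    1: "Meets No Expectations",
}

def portfolio_grade(ratings):
    """Average the 4-1 ratings given to each evaluated skill and round
    to the nearest whole rating."""
    average = sum(ratings.values()) / len(ratings)
    return round(average)

# Hypothetical skills and ratings for one student's portfolio.
ratings = {"completeness": 4, "correctness": 3, "clarity": 3}
overall = portfolio_grade(ratings)
print(overall, RATING_LABELS[overall])  # -> 3 Meets Most Expectations
```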
Growth Portfolio
Showcase Portfolios
4)
Process Portfolios
In a process portfolio, the student may discuss why a particular strategy
was used, what was useful or ineffective for the individual in the writing
process, and how the student went about making progress in the face of
difficulty in meeting requirements. A process reflection typically focuses
on many aspects of the learning process, including the following: which
approaches work best, which are ineffective, information about oneself
as a learner, and strategies or approaches to remember in future
assignments.
5)
Evaluation Portfolios.
6)
Online or e-portfolios
Portfolio:
An organized presentation of an individual's education, work
samples, and skills.
Terminologically, a portfolio is a purposeful collection of student
work that exhibits the student's efforts, progress, and achievements in
one or more areas of the curriculum.
Guidelines:
Identify purpose
Select objectives.
Set the criteria for judging the work (rating scales, rubrics,
checklists) and make sure students understand the criteria.
Selecting:
Portfolio:
Literally the word Portfolio is used in the following meanings:
Share the criteria that will be used to assess the work in the
portfolio, as well as the ways in which the results are to be used. Teachers
should give feedback to students and parents about the use of the
portfolio.
6.5
If goals and criteria are not clear, the portfolio can be just a
miscellaneous collection of artifacts that don't show patterns of
growth or achievement.
6.6
EVALUATION OF PORTFOLIO:
UNIT-7:
BASIC CONCEPTS OF INFERENTIAL
STATISTICS
7.1
Introduction:
The role and importance of statistics in education cannot be
denied. In education we deal with measurement, evaluation and
research. Similarly, we have to make educational policies and budgets. In
all these fields we need to make proper measurements and present the data
quantitatively. Thus, without statistics we cannot make proper
measurements. As quoted in different statistics books, "Planning is the
order of the day, and planning without statistics is inconceivable". Good
statistics and sound statistical analysis assist in providing the basis for the
design of educational policies, monitoring policy implications and evaluating
policy impact. To generate reliable and relevant information, the data
should be collected using appropriate statistical methods. The materials
one uses for data collection should be well designed. The data analysis
should also be done using appropriate statistical methods. All these show
that statistics plays a vital role in education management and educational
planning.
Concept of Inferential Statistics
Definition:
The branch of statistics concerned with using sample data to
make an inference about a larger group of data is called inferential
statistics.
Example:
For instance, a college teacher decides to use the average grade
achieved by one statistics class to estimate the average grade achieved by
all the statistics classes at the college.
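The class-average example can be put in code; the grades below are invented for illustration. The sample mean is the statistic used to infer the unknown population mean:

```python
# Grades of one statistics class (the sample); invented numbers.
class_grades = [72, 85, 90, 64, 78, 88, 70, 81]

# The sample mean serves as an estimate of the average grade
# across all of the college's statistics classes (the population).
sample_mean = sum(class_grades) / len(class_grades)
print(sample_mean)  # -> 78.5
```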
SAMPLING ERROR:
A sample is drawn to represent a population, but it rarely mirrors
that population exactly. Since the sample does not include all members of the
population, statistics computed on the sample, such as means and quantiles,
generally differ from the parameters of the entire population.
For example:
If one measures the height of a thousand individuals from a
country of one million, the average height of the thousand is typically not
the same as the average height of all one million people in the country.
Since sampling is typically done to determine the characteristics of a
whole population, the difference between the sample and population
values is considered a sampling error.
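The height example can be simulated. This sketch scales the "country" down to 100,000 people for speed and assumes a hypothetical height distribution (mean 170 cm, sd 8 cm); the sampling error is the sample statistic minus the population parameter:

```python
import random

random.seed(42)

# A scaled-down "country" of 100,000 people with hypothetical heights.
population = [random.gauss(170, 8) for _ in range(100_000)]
population_mean = sum(population) / len(population)

# Measure only one thousand individuals.
sample = random.sample(population, 1000)
sample_mean = sum(sample) / len(sample)

# Sampling error = statistic - parameter.
sampling_error = sample_mean - population_mean
print(round(sample_mean, 2), round(population_mean, 2), round(sampling_error, 3))
```

The printed sampling error is small but almost never exactly zero, which is the point of the definition above.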
Population and Samples:
A population is the entire group to which we want to generalize
our results. A sample is a subset of the population. The population might
be all adult humans, but our sample might be a group of 30 friends and
relatives.
Types of sampling errors:
1. Random sampling
2. Bias problems
3. Non-sampling error
Random Sampling:
In statistics, sampling error is the error caused by observing a
sample instead of the whole population. The sampling error
can be found by subtracting the value of a parameter from the
value of a statistic. In nursing research, a sampling error is the
difference between a sample statistic used to estimate a
population parameter and the actual but unknown value of the
parameter.
(Burns and Grove, 2009)
Parameters and statistics:
A numerical summary of a population is called a parameter,
while the same numerical summary of a sample is called a
statistic.
Bias Problems:
Sampling bias is a possible source of sampling errors. It leads to
sampling errors which have a tendency to be consistently positive
or consistently negative.
7.3
Hypotheses are statements about population parameters, for example:
μ > 22
μ ≠ 25
μ1 = μ2 (mean 1 equals mean 2), i.e. μ1 − μ2 = 0
Null Hypothesis:
The hypothesis to be tested in a test of hypothesis is called the null
hypothesis. It is a hypothesis which is tested for possible rejection or
nullification under the assumption that it is true. It is denoted by H0 and
usually contains an equals sign.
For example, if we want to test that the population mean is 80,
then we write:
H0: μ = 80
Another definition of Null-Hypothesis:
Null hypothesis is a type of hypothesis used in statistics that
proposes that no statistical significance exists in a set of given
observations.
The null hypothesis attempts to show that no variation exists
between variables, or that a single variable is no different from zero. It is
presumed to be true until statistical evidence nullifies it in favour of an
alternative hypothesis.
Examples:
Hypothesis:
The loss of my socks is due to alien burglary. (Alien burglary
means unfamiliar theft).
Null Hypothesis:
The loss of my socks is nothing to do with alien burglary.
Alternative Hypothesis:
The loss of my socks is due to alien burglary.
TESTS OF SIGNIFICANCE:
LEVELS OF SIGNIFICANCE:
Statistical Errors
Even in the best research project, there is always a possibility
that the researcher will make a mistake regarding the relationship
between the two variables. This mistake is called statistical error.
In statistical test theory the notion of statistical error is an
integral part of hypothesis testing. The test requires an unambiguous
statement of a null hypothesis, which usually corresponds to a default
"state of nature", for example "this person is healthy", "this accused is
not guilty" or "this product is not broken". An alternative hypothesis is
the negation of null hypothesis, for example, "this person is not healthy",
"this accused is guilty" or "this product is broken". The result of the test
may be negative, relative to null hypothesis (not healthy, guilty, broken)
or positive (healthy, not guilty, not broken). If the result of the test
corresponds with reality, then a correct decision has been made.
However, if the result of the test does not correspond with reality, then an
error has occurred. Due to the statistical nature of a test, the result is
never, except in very rare cases, free of error. Two types of error are
distinguished: type I error and type II error.
In statistics, a type I error (or error of the first kind) is the
incorrect rejection of a true null hypothesis. A type II error (or error of
the second kind) is the failure to reject a false null hypothesis. A type I
error is a false positive. Usually a type I error leads one to conclude that a
thing or relationship exists when really it doesn't, for example, that a
patient has a disease being tested for when really the patient does not
have the disease, or that a medical treatment cures a disease when really
it doesn't. A type II error is a false negative. Examples of type II errors
would be a blood test failing to detect the disease it was designed to
detect, in a patient who really has the disease; or a clinical trial of a
medical treatment failing to show that the treatment works when really it
does. When comparing two means, concluding the means were different
when in reality they were not different would be a Type I error;
concluding the means were not different when in reality they were
different would be a Type II error.
All statistical hypothesis tests have a probability of making type
I and type II errors. For example, all blood tests for a disease will falsely
detect the disease in some proportion of people who don't have it, and
will fail to detect the disease in some proportion of people who do have
it. A test's probability of making a type I error is denoted by α. A test's
probability of making a type II error is denoted by β.
The detail is given below:
Type-I Error:
The first is called a Type I error. This occurs when the
researcher assumes that a relationship exists when in fact the evidence is
that it does not. In a Type I error, the researcher should accept the null
hypothesis and reject the research hypothesis, but the opposite occurs.
The probability of committing a Type I error is called alpha (a).
A type I error, also known as an error of the first kind, occurs
when the null hypothesis (H0) is true, but is rejected. It is asserting
something that is absent, a false hit. A type I error may be compared with
a so-called false positive (a result that indicates that a given condition is
present when it actually is not present) in tests where a single condition is
tested for. Type I errors are philosophically a focus of skepticism and
Occam's razor. A Type I error occurs when we believe a falsehood. In
terms of folk tales, an investigator may be "crying wolf" without a wolf in
sight (raising a false alarm) (Ho: no wolf).
The rate of the type I error is called the size of the test and is
denoted by the Greek letter α (alpha). It usually equals the significance
level of a test. In the case of a simple null hypothesis, α is the probability
of a type I error. If the null hypothesis is composite, α is the maximum
(supremum) of the possible probabilities of a type I error.
Explanation:
A Type I Error is also known as a False Positive or Alpha Error.
This happens when you reject the Null Hypothesis even if it is true. The
Null Hypothesis is simply a statement that is the opposite of your
hypothesis. For example, you think that boys are better in arithmetic than
girls. Your null hypothesis would be: "Boys are not better than girls in
arithmetic."
You will make a Type I Error if you conclude that boys are
better than girls in arithmetic when in reality, there is no difference in
how boys and girls perform. In this case, you should accept the null
hypothesis since there is no real difference between the two groups when
it comes to arithmetic ability. If you reject the null hypothesis and say
that one group is better, then you are making a Type I Error.
Type-II Error
The second is called a Type II error. This occurs when the
researcher assumes that a relationship does not exist when in fact the
evidence is that it does. In a Type II error, the researcher should reject the
null hypothesis and accept the research hypothesis, but the opposite
occurs. The probability of committing a Type II error is called beta.
Generally, reducing the possibility of committing a Type I error
increases the possibility of committing a Type II error and vice versa,
reducing the possibility of committing a Type II error increases the
possibility of committing a Type I error.
Researchers generally try to minimize Type I errors, because
when a researcher assumes a relationship exists when one really does not,
things may be worse off than before. In Type II errors, the researcher can
only fail to reject (fail to prove false) a null hypothesis, but can never prove
it true (i.e., failing to reject a null hypothesis does not prove it true).
Explanation:
A Type II Error is also known as a False Negative or Beta Error.
This happens when you accept the Null Hypothesis when you should in
fact reject it. The Null Hypothesis is simply a statement that is the
opposite of your hypothesis. For example, you think that dog owners are
friendlier than cat owners. Your null hypothesis would be: "Dog owners
are as friendly as cat owners."
You will make a Type II Error if dog owners are actually
friendlier than cat owners, and yet you conclude that both kinds of pet
owners have the same level of friendliness. In this case, you should reject
the null hypothesis since there is a real difference in friendliness between
the two groups. If you accept the null hypothesis and say that both types
of pet owners are equally friendly, then you are making a Type II Error.
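Both error rates can be estimated by simulation. This sketch assumes a simple two-sample z-test at the 5% significance level (critical value 1.96) and invented group parameters; it illustrates the idea rather than reproducing the text's examples:

```python
import random
import statistics

random.seed(1)

def rejects_null(a, b):
    """Two-sample z-test: reject H0 (equal means) when |z| > 1.96 (5% level)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return abs(z) > 1.96

trials, n = 2000, 50

# Type I: both groups truly share mean 100, yet H0 is sometimes rejected.
type_1 = sum(
    rejects_null([random.gauss(100, 15) for _ in range(n)],
                 [random.gauss(100, 15) for _ in range(n)])
    for _ in range(trials)) / trials

# Type II: the groups truly differ (100 vs 108), yet H0 is sometimes kept.
type_2 = sum(
    not rejects_null([random.gauss(100, 15) for _ in range(n)],
                     [random.gauss(108, 15) for _ in range(n)])
    for _ in range(trials)) / trials

print(round(type_1, 3), round(type_2, 3))
```

The estimated Type I rate comes out near the chosen 0.05 level, while the Type II rate depends on the true difference, the spread, and the sample size.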
7.7
DEGREES OF FREEDOM:
A Few Examples
For a moment suppose that we know the mean of the data is 25 and
that the values are 20, 10, 50, and one unknown value. To find the mean
of a list of data, we add all of the data and divide by the total number of
values. This gives us the formula (20 + 10 + 50 + x)/4 = 25, where x
denotes the unknown. Although we call this value unknown, we can use
some algebra to determine that x = 20.
This final value is fully determined by the parameter (in this case the
mean) and the other scores. Each parameter that is fixed during our
computations constitutes the loss of a degree of freedom.
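The worked example can be verified in a line or two of code:

```python
# Mean of 25 over four values: 20, 10, 50 and one unknown x.
known = [20, 10, 50]
mean, n = 25, 4

# (20 + 10 + 50 + x) / 4 = 25  =>  x = 4 * 25 - (20 + 10 + 50)
x = n * mean - sum(known)
print(x)  # -> 20
```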
If we imagine starting with a small number of data points and
then fixing a relatively large number of parameters as we compute some
statistic, we see that as more degrees of freedom are lost, fewer and
fewer different situations are accounted for by our model since fewer and
fewer pieces of information could in principle be different from what is
actually observed.
So, to put it very informally, the interest in our data is
determined by the degrees of freedom: if there is nothing that can vary
once our parameter is fixed (because we have so very few data points,
maybe just one), then there is nothing to investigate. Degrees of freedom
can be seen as linking sample size to explanatory power.
The Standard Deviation is a measure of how spread out numbers
are.
Its symbol is σ (the Greek letter sigma).
The formula is easy: it is the square root of the Variance.
To calculate the variance, follow these steps:
Work out the Mean (the simple average of the numbers).
Then for each number: subtract the Mean and square the result
(the squared difference).
Then work out the average of those squared differences.
Suppose we have five values, i.e. 600, 470, 170, 430 and 300.
Mean = 1970/5 = 394
Variance: σ² = [(206)² + (76)² + (−224)² + (36)² + (−94)²] / 5
= (42,436 + 5,776 + 50,176 + 1,296 + 8,836) / 5
= 108,520 / 5
= 21,704
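The computation above can be reproduced step by step:

```python
values = [600, 470, 170, 430, 300]

mean = sum(values) / len(values)                   # 1970 / 5 = 394
squared_diffs = [(v - mean) ** 2 for v in values]  # 42436, 5776, 50176, 1296, 8836
variance = sum(squared_diffs) / len(values)        # 108520 / 5 = 21704
std_dev = variance ** 0.5                          # square root of the variance

print(mean, variance, round(std_dev, 2))  # -> 394.0 21704.0 147.32
```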
UNIT-8:
SELECTED TESTS OF SIGNIFICANCE
8.1
T-TEST:
Definition:
The t-test's statistical significance and the t-test's effect size are
the two primary outputs of the t-test. Statistical significance indicates
whether the difference between sample averages is likely to represent an
actual difference between populations, and the effect size indicates whether
that difference is large enough to be practically meaningful.
The one-sample t-test is similar to the independent-samples
t-test, except that it is used to compare one group's average value to a single
fixed number. For practical purposes you can look at the confidence interval
around the average value to gain this same information.
t = (X̄_T − X̄_C) / √(var_T/n_T + var_C/n_C)
where X̄_T and X̄_C are the two sample means, var_T and var_C their
variances, and n_T and n_C the two sample sizes.
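A minimal sketch of this formula, using Python's statistics module; the treatment and control scores are invented for illustration:

```python
import statistics

def two_sample_t(treatment, control):
    """t = (mean_T - mean_C) / sqrt(var_T/n_T + var_C/n_C)."""
    numerator = statistics.mean(treatment) - statistics.mean(control)
    denominator = (statistics.variance(treatment) / len(treatment)
                   + statistics.variance(control) / len(control)) ** 0.5
    return numerator / denominator

treatment = [82, 90, 75, 88, 84, 91, 79, 86]  # hypothetical scores
control = [70, 78, 72, 80, 75, 68, 74, 77]
print(round(two_sample_t(treatment, control), 2))  # -> 4.16
```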
8.2
CHI-SQUARE (χ²):
Uses of the χ² Distribution:
χ² = Σ (from i = 1 to k) (o_i − e_i)² / e_i = Σ (from i = 1 to k) (o_i − np_i)² / (np_i)
where o_i is the observed frequency of the i-th class and e_i = np_i is its
expected frequency.
For a 2x2 contingency table with row categories A1, A2 and column
categories B1, B2:

         B1      B2      Total
A1       O11     O12     (A1)
A2       O21     O22     (A2)
Total    (B1)    (B2)    n

The expected frequency for each cell is
e_ij = (A_i)(B_j) / n
so that
χ² = Σ_i Σ_j (o_ij − e_ij)² / e_ij
To test the hypothesis of independence, we set up:
H0: The two variables of classification are independent, i.e. there is no
relationship/association between the two variables.
H1: The two variables of classification are not independent.
Compute the test statistic
χ² = Σ_i Σ_j (o_ij − e_ij)² / e_ij, where e_ij = (A_i)(B_j) / n.
Decide as below:
(i) Reject H0 if χ² > χ²_α((r−1)(c−1));
(ii) otherwise accept H0.
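The whole procedure can be sketched for a hypothetical 2x2 table of observed counts (the numbers are invented for illustration):

```python
# Invented observed frequencies for a 2x2 classification.
observed = [[30, 20],   # row A1: O11, O12
            [10, 40]]   # row A2: O21, O22

row_totals = [sum(row) for row in observed]        # (A_i)
col_totals = [sum(col) for col in zip(*observed)]  # (B_j)
n = sum(row_totals)

# Expected frequencies: e_ij = (A_i)(B_j) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (o_ij - e_ij)^2 / e_ij
chi_square = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                 for i in range(2) for j in range(2))

# With (r-1)(c-1) = 1 degree of freedom, the 5% critical value is 3.84;
# since the statistic exceeds 3.84 here, H0 (independence) is rejected.
print(round(chi_square, 2))  # -> 16.67
```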
8.3
REGRESSION:
(i) Linear regression
(ii) Multiple regression
The multiple regression model is
Y = B0 + B1X1 + B2X2 + B3X3 + … + BtXt + u
Where:
Y is the dependent variable,
B0 is the intercept,
B1, …, Bt are the slopes (the coefficients of the explanatory variables
X1, …, Xt), and
u is the error term.
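For the simple linear case Y = B0 + B1X + u, the least-squares estimates of the intercept and slope can be sketched as follows; the data points are invented for illustration:

```python
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: B1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
      / sum((x - mean_x) ** 2 for x in xs))
# Intercept: B0 = y_bar - B1 * x_bar
b0 = mean_y - b1 * mean_x

print(round(b0, 2), round(b1, 2))  # -> 0.09 1.99
```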