
Z value: A common statistical way of standardizing data onto one scale, so that comparisons can take place, is the z-score.

The z-score is like a common yardstick for all types of data. Each z-score corresponds to a point in a normal distribution and, as such, is sometimes called a normal deviate, since a z-score describes how much a point deviates from the mean or from a specification point.
Deviation IQ (DIQ): An age-based index of general mental ability. It is based on the difference between a person's score and the average score for persons of the same chronological age.

Deviation Score (x): The score for an individual minus the mean score for the group; i.e., the amount by which a person deviates from the mean.

Percentile: A point on the norms distribution below which a certain percentage of the scores fall. For example, if 70% of the scores fall below a raw score of 56, then the score of 56 is at the 70th percentile.

Percentile Rank: The percentage of scores falling below a certain point on a score distribution. (Percentile and percentile rank are sometimes used interchangeably.)

Raw Score: A person's observed score on a test, i.e., the number correct. While raw scores do have some usefulness, they should not be used to make comparisons between performance on different tests unless other information about the characteristics of the tests is known. For example, if a student answered 24 items correctly on a reading test and 40 items correctly on a mathematics test, we should not assume that he or she did better on the mathematics test than on the reading measure. Perhaps the reading test consisted of 35 items and the arithmetic test consisted of 80 items. Given this additional information we might conclude that the student did better on the reading test (24/35 as compared with 40/80). How well did the student do in relation to other students who took the reading test? We cannot address this question until we know how well the class as a whole did on the reading test. Twenty-four items answered correctly is impressive, but if the average (mean) score attained by the class was 33, the student's score of 24 takes on a different meaning.

Standard Score: A general term referring to scores that have been "transformed" for reasons of convenience, comparability, ease of interpretation, etc. The basic type of standard score, known as a z-score, is an expression of the deviation of a score from the mean score of the group in relation to the standard deviation of the scores of the group. Most other standard scores are linear transformations of z-scores, with different means and standard deviations.

T-Score: A standard score with a mean of 50 and a standard deviation of 10. Thus a T-score of 60 represents a score one standard deviation above the mean. T-scores are obtained by the following formula: T = (Z x 10) + 50.

Standard Deviation (S.D.): A measure of the variability, or dispersion, of a distribution of scores. The more the scores cluster around the mean, the smaller the standard deviation. In a normal distribution of scores, 68.3% of the scores are within the range of one S.D. below the mean to one S.D. above the mean. Computation of the S.D. is based upon the square of the deviation of each score from the mean. One way of writing the formula is as follows: S.D. = sqrt( sum of (X − M)² / N ), where X is each score, M is the mean, and N is the number of scores.
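As an illustrative sketch (not part of the original glossary), the standard deviation formula above can be computed directly in Python. The list of scores used here is hypothetical, chosen only to show the calculation:

    # Illustrative sketch: population standard deviation as defined above.
    # The scores list is hypothetical example data.
    scores = [24, 28, 31, 33, 33, 35, 38, 42]

    mean = sum(scores) / len(scores)                         # M: arithmetic average
    squared_deviations = [(x - mean) ** 2 for x in scores]   # (X - M)^2 for each score
    sd = (sum(squared_deviations) / len(scores)) ** 0.5      # S.D. = sqrt(sum((X - M)^2) / N)

    print(f"Mean = {mean:.2f}, S.D. = {sd:.2f}")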

z-Score: A type of standard score with a mean of zero and a standard deviation of one.

Norms: The distribution of test scores of some specified group, called the norm group. For example, this may be a national sample of all fourth graders, a national sample of all fourth-grade males, or perhaps all fourth graders in some local district.

Norms vs. Standards: Norms are not standards. Norms are indicators of what students of similar characteristics did when confronted with the same test items as those taken by students in the norm group. Standards, on the other hand, are arbitrary judgments of what students should be able to do, given a set of test items.

Normal Distribution: A distribution of scores or other measures that in graphic form has a distinctive bell-shaped appearance. In a normal distribution, the measures are distributed symmetrically about the mean. Cases are concentrated near the mean and decrease in frequency, according to a precise mathematical equation, the farther one departs from the mean. The assumption that many mental and psychological characteristics are distributed normally has been very useful in test development work. Figure 1 below is a normal distribution. It shows the percentage of cases between different scores as expressed in standard deviation units. For example, about 34% of the scores fall between the mean and one standard deviation above the mean.

Figure 1. A Normal Distribution.

Mean (M): The arithmetic average of a set of scores. It is found by adding all the scores in the distribution and dividing by the total number of scores.

Frequency: The number of times a given score (or a set of scores in an interval grouping) occurs in a distribution.

Frequency Distribution: A tabulation of scores from low to high or high to low showing the number of individuals who obtain each score or fall within each score interval.

Introduction
Of all the problems of building our judgements about people, the last point is arguably the easiest to control. By using objective assessment techniques (after evaluating the psychometric qualities of the tools) we can make judgements confident that the results will give a measure of the consistency of an individual's behaviour (based on the reliability of the tool and the validity of its predictions) and its uniqueness (based on the comparison of the result to appropriate groups). This does not deny the usefulness of less structured information; rather, it suggests that as psychologists we too can fall into the trap of weighting anecdotal or colourful information over more valid base-rate data (Ginosar & Trope, 1980; Hamil et al., 1980; Taylor & Thompson, 1982).

The normal distribution provides us with the tool to compare a standardised sample of behaviour with a large sample of that behaviour to give us a measure of relative propensity. The validity coefficient, as an index of the relationship between our sample of behaviour and the behaviour we are trying to predict, gives us the justification for this comparison. Just as we compare a person's aptitudes and attributes in relation to other people, scores on tests and questionnaires also need to be compared with relevant comparison groups. We do this by seeing how a person's score sits in relation to others' scores on a normal distribution, or bell curve. Norms are sets of data derived from groups of individuals who have already completed a test or questionnaire. These norm groups enable us to establish where an individual's score lies on a standard scale, by comparing that score with those of other people.

Within the field of occupational testing, a wide variety of individuals are assessed for a broad range of different jobs. Clearly, people vary markedly in their abilities and qualities, and therefore the norm group against which an individual is compared is of crucial importance. It is very likely that the conclusions reached will vary considerably when an individual is compared against two different groups; for instance, school leavers and managers in industry. For this reason, it is important to ensure that the norm groups used are relevant to the given group or situation that the data are being used for.

Norming Systems
There are a number of different norming systems available for use, which have strengths and weaknesses in different situations. These can be grouped into two main categories: rank order and ordinal.

Rank Order Systems


When a group of people are given a test or questionnaire, we expect to observe a range of different scores, as people differ in their abilities or personal qualities. This spread of results allows us to arrange people on a rank order scale according to their performance. When an individual's score is subsequently compared with this scale, we can give a percentile score, which represents the percentage of the comparison group that the individual has scored above. For instance, a score corresponding to the 75th percentile means that an individual's score or response is greater in magnitude than that of 75% of the norm group in question. Someone who has scored at the 30th percentile has performed or responded at a level higher than 30% of the norm group. The 50th percentile is equivalent to the average of the scale. Percentiles have the disadvantage that they are not equal units of measurement. For instance, a difference of 5 percentile points between two individuals' scores will have a different meaning depending on its position on the percentile scale, as the scale tends to exaggerate differences near the mean and collapse differences at the extremes. Accordingly, percentiles must not be averaged nor treated mathematically in any other way. However, they have the advantage that they are easily understood and can be very useful when giving feedback on results to candidates or discussing results with managers.
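A minimal sketch of how a percentile score could be computed against a stored norm group. The norm-group scores and the candidate's raw score here are hypothetical, invented purely for illustration:

    # Hypothetical norm-group scores (raw scores from people who previously took the test).
    norm_group = [18, 21, 23, 24, 24, 26, 27, 29, 30, 31, 33, 34, 36, 38, 41]

    def percentile_rank(raw_score, norm_scores):
        """Percentage of the norm group scoring below the given raw score."""
        below = sum(1 for s in norm_scores if s < raw_score)
        return 100.0 * below / len(norm_scores)

    print(percentile_rank(30, norm_group))  # a raw score of 30 beats roughly 53% of this group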

Ordinal Systems
To overcome the problems of interpretation implicit in rank order systems, various types of standard scores have been developed. The basis of standard scores is the Z-score, which is based on the mean and standard deviation. It indicates how many standard deviations above or below the mean a score is. A Z-score is merely a raw score which has been changed into standard deviation units.

The Z-score is calculated by the formula:

Z = (X − M) / SD

where: Z = standard score, X = individual raw score, M = mean score, and SD = standard deviation.

Usually, when standard scores are used they are interpreted in relation to the normal distribution curve. It can be seen from Figure 1 that Z-scores in standard deviation units are marked out on either side of the mean. Those above the mean are positive, and those below the mean are negative in sign. By calculating the Z-score it can be seen where an individual's score lies in relation to the rest of the distribution. The standard score is very important when comparing scores from different scales within a questionnaire. Before these scores can be properly compared they must be converted to a common scale, such as a standard score scale. These can then be used to express an individual's score on different scales in terms of norms. One important advantage of using the normal distribution as a basis for norms is that the standard deviation has a precise relationship with the area under the curve: one standard deviation above and below the mean includes approximately 68% of the sample. From Figure 1 it can be seen that Z-scores can be rather difficult and cumbersome to handle, because most of them are decimals and half of them can be expected to be negative. To remedy these drawbacks, various transformed standard score systems have been derived. These simply entail multiplying the obtained Z-score by a new standard deviation and adding the result to a new mean. Both of these steps are devised to eradicate decimals and negative numbers.
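A brief sketch of this conversion in Python. The raw scores and the norm group used to define the mean and standard deviation are hypothetical values chosen only to illustrate the formula Z = (X − M) / SD:

    from statistics import mean, pstdev

    # Hypothetical norm-group raw scores used to define the scale.
    norm_scores = [24, 28, 31, 33, 33, 35, 38, 42]

    m = mean(norm_scores)      # M: mean of the norm group
    sd = pstdev(norm_scores)   # SD: standard deviation of the norm group

    def z_score(raw_score):
        """Convert a raw score to standard deviation units: Z = (X - M) / SD."""
        return (raw_score - m) / sd

    print(round(z_score(38), 2))  # a score above the mean gives a positive Z
    print(round(z_score(28), 2))  # a score below the mean gives a negative Z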

T Scores (Transformed Scores)


The T-score is a linear transformation of the Z-score, based on a mean of 50 and a standard deviation of 10. T-scores have the advantage over Z-scores that they do not contain decimal points or positive and negative signs. For this reason they are used more frequently than Z-scores as a norm system, particularly for aptitude tests. A T-score can be calculated from a Z-score using the formula: T = (Z x 10) + 50
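A minimal sketch of this transformation; the Z-score values passed in are hypothetical examples:

    def t_score(z):
        """Linear transformation of a Z-score to a scale with mean 50 and SD 10."""
        return round(z * 10 + 50)

    print(t_score(0.0))   # 50: exactly at the norm-group mean
    print(t_score(1.0))   # 60: one standard deviation above the mean
    print(t_score(-0.9))  # 41: just under one SD below the mean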

Stens (Standard Tens)


The Sten (standard ten) is a standard score system commonly used with personality questionnaires. It is based on a transformation of the Z-score and has a mean of 5.5 and a standard deviation of 2. Sten scores can be calculated from Z-scores using the formula: Sten = (Z x 2) + 5.5. As the name suggests, stens divide the score scale into ten units. Each unit has a band width of half a standard deviation, except the highest unit (Sten 10), which extends upwards from 2 standard deviations above the mean, and the lowest unit (Sten 1), which extends downwards from 2 standard deviations below the mean. Stens have the advantage that they are based on the principles of standard scores and that they encourage us to think in terms of bands of scores rather than absolute points. These bands are sufficiently narrow to mark significant differences between people, or for one person across different personality scales, while at the same time guiding the user not to over-interpret small differences between scores. The relationship between Stens, T-scores and Percentiles is shown in the chart of the normal distribution curve (Figure 1).
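A small sketch of the sten conversion, including the convention described above that the scale is capped at the outer bands Sten 1 and Sten 10; the rounding rule and the example Z-scores are hypothetical choices for illustration:

    def sten(z):
        """Convert a Z-score to a sten: Sten = (Z x 2) + 5.5, rounded and capped at 1-10."""
        raw_sten = z * 2 + 5.5
        return max(1, min(10, round(raw_sten)))

    for z in (-2.6, -1.0, 0.3, 1.0, 2.6):  # hypothetical Z-scores
        print(z, "->", sten(z))
    # -2.6 falls in the lowest band (Sten 1); +2.6 falls in the highest band (Sten 10).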

Choosing Norm Groups


The importance of New Zealand-based norms lies in two areas: face validity and construct validity. When providing feedback to a respondent, giving the information in the context of a group they feel they should be compared with is very important for the person's acceptance of the results. The results will be less meaningful when compared with an inappropriate group. Results from our research to date suggest some very real differences between New Zealand, Australia, and the United Kingdom in the areas of personality and aptitudes. Norm group size should be determined by the standard deviation of the sample and the reliability of the instrument. A rough rule of thumb is that for instruments which meet the gold standard of reliability (.75), norm groups should be made up of 100 people or more and should be directly relevant to the purpose of the test. As discussed previously, the purpose of norms is to provide base-rate information about the likelihood of a skill or behaviour being displayed by the individual. For base-rate information to be meaningful, the comparison group needs to be as similar as possible to the individual tested and to the circumstances in which they were measured. For example, comparing the score of a degree-qualified manager with those of 16-year-old school leavers is unlikely to give useful base-rate information about the ability to solve problems. In-house norms provide the most directly relevant base-rate information, as the recipient of the information will be very familiar with the prevalence of the behaviour being measured. The disadvantage of norms produced in-house is that comparability with the whole population is lost. For example, are the people in your own institution more or less able than the whole population? Are your applicants better or worse than those of other companies?

Standard scores indicate where your score lies in comparison to a norm group. For example, if the average or mean score for the norm group is 25, then your own score can be compared to this to see if you are above or below this average.

Percentile Scores

A percentile score is another type of converted score. Your raw score is converted to a number indicating the percentage of the norm group who scored below you. For example, a score at the 60th percentile means that the individual's score is the same as or higher than the scores of 60% of those who took the test. The 50th percentile is known as the median and represents the middle score of the distribution.

Percentiles have the disadvantage that they are not equal units of measurement. For instance, a difference of 5 percentile points between two individuals' scores will have a different meaning depending on its position on the percentile scale, as the scale tends to exaggerate differences near the mean and collapse differences at the extremes. Percentiles cannot be averaged nor treated mathematically in any other way. However, they do have the advantage of being easily understood and can be very useful when giving feedback to candidates or reporting results to managers. If you know your percentile score then you know how it compares with others in the norm group. For example, if you scored at the 70th percentile, then this means that you scored the same as or better than 70% of the individuals in the norm group. Because percentiles are so easily understood, they are widely used when reporting results to managers and are the score most often used by organizations when comparing your score with those of other candidates.

The characteristic way that test scores tend to bunch up around the average, together with the use of percentiles in the interpretation of test results, has important implications for you as a job candidate. This is because most aptitude tests have relatively few questions and most of the scores are clustered around the mean. The effect of this is that a very small improvement in your actual score will make a very substantial difference to your percentile score. To illustrate this point, consider a typical aptitude test consisting of 50 questions. Most of the candidates, who are a fairly similar group in terms of their educational background and achievements, will score around 40. Some will score a few less and some a few more. It is very unlikely that any of them will score less than 35 or more than 45.

Looking at these results in terms of percentiles is a very poor way of analyzing them, and no experienced statistician would ever use percentiles on this type of data. However, nine times out of ten this is exactly what happens to these test results, and a difference of three or four extra marks can take you from the 30th to the 70th percentile. This is why preparing for these tests is so worthwhile, as even small improvements in your results can make you appear a far superior candidate.
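A rough sketch of why this happens, using the normal curve. The mean of 40 and standard deviation of 2.5 below are hypothetical values consistent with the 35-45 spread described above:

    from statistics import NormalDist

    # Hypothetical distribution of scores on a 50-question aptitude test.
    test_scores = NormalDist(mu=40, sigma=2.5)

    for raw in (38, 39, 40, 41, 42):
        pct = test_scores.cdf(raw) * 100  # percentage of candidates scoring at or below this mark
        print(f"raw score {raw}: percentile rank ~{pct:.0f}")
    # A swing of only a few marks around the mean moves the percentile rank dramatically,
    # while the same swing out in the tails would barely change it.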

Different Norming Systems

There are several different norming systems available for use, which have strengths and weaknesses in different situations. These can be grouped into two main categories: rank order and ordinal.

Z-scores

To overcome the problems of interpretation implicit in percentiles and other rank order systems, various types of standard scores have been developed. One of these is the Z-score, which is based on the mean and standard deviation. It indicates how many standard deviations above or below the mean your score is. The Z-score is calculated by the formula:

Z = (X − M) / SD

where: Z = standard score, X = individual raw score, M = mean score, and SD = standard deviation.

The illustration shows how Z-scores in standard deviation units are marked out on either side of the mean. It shows where your score sits in relation to the rest of the norm group. If it is above the mean then it is positive, and if it is below the mean then it is negative. As you can see from the illustration, Z-scores can be rather cumbersome to handle because most of them are decimals and half of them can be expected to be negative.

T Scores (Transformed Scores)

T-scores are used to solve this problem of decimals and negative numbers. The T-score is simply a transformation of the Z-score, based on a mean of 50 and a standard deviation of 10. A T-score can be calculated from a Z-score using the formula: T = (Z x 10) + 50

Since T-scores do not contain decimal points or negative signs, they are used more frequently than Z-scores as a norm system, particularly for aptitude tests.

Stens (Standard Tens)

The Sten (standard ten) is a standard score system commonly used with personality questionnaires. Stens divide the score scale into ten units. Each unit has a band width of half a standard deviation, except the highest unit (Sten 10), which extends upwards from 2 standard deviations above the mean, and the lowest unit (Sten 1), which extends downwards from 2 standard deviations below the mean.

Sten scores can be calculated from Z-scores using the formula: Sten = (Z x 2) + 5.5. Stens have the advantage that they enable results to be thought of in terms of bands of scores, rather than absolute scores. These bands are narrow enough to distinguish statistically significant differences between candidates, but wide enough not to over-emphasize minor differences between candidates.

Whenever you take a psychometric test, either as part of the selection process or as a practice exercise, you will usually see your results presented in terms of numerical scores. These may be raw scores, standard scores, percentile scores, Z-scores, T-scores or Stens.

Raw Scores

These refer to your unadjusted score: for example, the number of items answered correctly in an aptitude or ability test. Some types of assessment tools, such as personality questionnaires, have no right or wrong answers, and in this case the raw score may represent the number of positive responses for a particular personality trait. Obviously, raw scores by themselves are not very useful. If you are told that you scored 40 out of 50 in a verbal aptitude test, this is largely meaningless unless you know where your particular score lies within the context of the scores of other people. Raw scores need to be converted into standard scores or percentiles to provide you with this kind of information.

How Scores are Distributed

Many human characteristics are distributed throughout the population in a pattern known as the normal curve or bell curve. This curve describes a distribution where most individuals cluster near the average and progressively fewer individuals are found the further from the average you go in each direction.

The illustration above shows the relative heights of a large group of people. As you can see, a large number of individual cases cluster in the middle of the curve and, as the extremes are approached, fewer and fewer cases exist, indicating that progressively fewer individuals are very short or very tall. The results of aptitude and ability tests also show this normal distribution if a large and representative sample of the population is used.

Mean and Standard Deviation

There are two characteristics of a normal distribution that you need to understand. The first is the mean or average, and the second is the standard deviation, which is a measure of the variability of the distribution. Test publishers usually assign an arbitrary number to represent the mean standard score when they convert from raw scores to standard scores. Test X and Test Y are two tests with different standard score means.

In this illustration Test X has a mean of 200 and Test Y has a mean of 100. If an individual got a score of 100 on Test X, that person did very poorly. However, a score of 100 on Test Y would be an average score. Standard Deviation. The standard deviation is the most commonly used measure of variability. It is used to describe the distribution of scores around the mean.

The value of the standard deviation varies directly with the spread of the test scores. If the spread is large, the standard deviation is large. One standard deviation either side of the mean (plus and minus) will include approximately 68% of the students' scores. Two standard deviations will include about 95% of the scores.

What do test scores mean? Many people will remember test scores from their school days such as 7 out of 10 for a primary school spelling test, or 63% for one of their secondary school exams. Such scores are readily understandable and are useful in indicating what proportion of the total marks a person has gained, but these scores do not account for factors such as how hard the test is, where a person stands in relation to other people, and the margin of error in the test score. As another example, in a school test such as mathematics or English, we would not know how well the pupil is performing against National Curriculum measures.

Many professionally produced tests, including most of those constructed by NFER, give outcomes that are different from simple proportions or percentages. The following types of score or measure account for many of the outcomes of educational or psychometric tests.

Standardised scores

Standardised scores and percentile ranks are directly related. Both enable test-takers to be compared with a large, nationally representative sample that has taken the test prior to publication. The standardised score is on a scale that can be readily compared and combined with standardised scores from other tests; the percentile rank gives a rank ordering of that score based on the population as a whole.

Standardised scores are more useful measures than raw scores (the number of questions answered correctly), and there are three reasons why such scores are normally used.

1) To place test-takers' scores on a readily understandable scale. One way to make a test score such as 43 out of 60 more readily understandable would be to convert it to a percentage (72 per cent to the nearest whole number). However, the percentage on its own is not related to (a) the average score of all the test-takers, or (b) how spread out their scores are. On the other hand, standardised scores are related to both these statistics. Usually, tests are standardised so that the average, nationally standardised score automatically comes out as 100, irrespective of the difficulty of the test, and so it is easy to see whether a test-taker is above or below the national average. The measure of the spread of scores is called the 'standard deviation', and this is usually set to 15 for educational attainment and ability tests, and for many occupational tests. This means that, irrespective of the difficulty of the test, about 68 per cent of the test-takers in the national sample will have a standardised score within 15 points of the average (between 85 and 115), and about 96 per cent will have a standardised score within two standard deviations (30 points) of the average (between 70 and 130). These examples come from a frequency distribution known as 'the normal distribution', which is shown in the figure below.

2) In educational tests, so that an allowance can be made for the different ages of the pupils. In a typical class in England and Wales, it is usual that most pupils are born between 1st September in one year and 31st August of the following year, which means that the oldest pupils are very nearly 12 months older than the youngest. Almost invariably in ability tests taken in the primary and early secondary years, older pupils achieve slightly higher raw scores than younger pupils. However, standardised scores are derived in such a way that the ages of the pupils are taken into account by comparing a pupil only with others of the same age (in years and months). An older pupil may in fact gain a higher raw score than a younger pupil, but have a lower standardised score. This is because the older pupil is being compared with other older pupils in the reference group and has a lower performance relative to his or her own age group.

3) So that scores from more than one test can be meaningfully compared or added together. Standardised scores from most educational tests cover the same range, from 70 to 140. Hence a pupil's standing in, say, mathematics and English can be compared directly using standardised scores. Similarly, should a teacher wish to add together scores from more than one test, for example in order to obtain a simple overall measure of attainment, they can be meaningfully combined if standardised scores are used, whereas it is not meaningful to add together raw scores from tests of different length or difficulty. In occupational tests, the use of standardised scores enables the organisation to compare directly or add together sub-test scores or scores from different tests in a battery.

Percentile Ranks

Recording a test-taker's percentile rank enables his or her performance to be compared very clearly with those in the national standardisation sample. The percentile rank of a test-taker is defined as the percentage of test-takers in the sample who gained a score at the same level or below that of the test-taker's score. Performance at the 25th percentile, for example, indicates a standardised score that is as good as, or better than, the standardised scores of 25 per cent of the sample. This information may be useful when, for example, reporting school test scores to parents. There is, in fact, a fixed relationship between standardised scores and percentile ranks when the same average score and standard deviation are used. The table below shows the relationship for tests that employ an average standardised score of 100 and a standard deviation of 15.
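Because this fixed relationship follows from the normal distribution, it can also be illustrated with a short sketch. This is not the published table itself, just an approximation of the relationship for a scale with a mean of 100 and a standard deviation of 15:

    from statistics import NormalDist

    # Standardised score scale with mean 100 and standard deviation 15.
    scale = NormalDist(mu=100, sigma=15)

    for standardised in (70, 85, 100, 115, 130):
        percentile_rank = scale.cdf(standardised) * 100  # % of the sample at or below this score
        print(f"standardised score {standardised} -> percentile rank ~{percentile_rank:.0f}")
    # About 68% of scores fall between 85 and 115, and roughly 95-96% between 70 and 130.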

Percentiles. Percentiles are probably the most commonly used test score in education. A percentile is a score that indicates the rank of the student compared to others (same age or same grade), using a hypothetical group of 100 students. A percentile of 25, for example, indicates that the student's test performance equals or exceeds that of 25 out of 100 students on the same measure; a percentile of 87 indicates that the student equals or surpasses 87 out of 100 (or 87% of) students. Note that this is not the same as a "percent": a percentile of 87 does not mean that the student answered 87% of the questions correctly! Percentiles are derived from raw scores using the norms obtained from testing a large population when the test was first developed.

Standard scores. A standard score is also derived from raw scores using the norming information gathered when the test was developed. Instead of reflecting a student's rank compared to others, standard scores indicate how far above or below the average (the "mean") an individual score falls, using a common scale, such as one with an "average" of 100. Standard scores also take "variance" into account, or the degree to which scores typically deviate from the average score. Standard scores can be used to compare individuals from different grades or age groups because all scores are converted to the same numerical scale. Most intelligence tests and many achievement tests use some type of standard score. For example, a standard score of 110 on a test with a mean of 100 indicates above-average performance compared to the population of students for whom the test was developed and normed.
