
Understanding the error of our ways: Mapping the concepts of validity and reliability


Patricia A. Higgins, RN, PhD
Andrew J. Straub, ANP, GNP

Clinicians increasingly desire evidence upon which to base their practice decisions. One of the difficulties in their decision-making, however, is answering the fundamental question, "How do I evaluate the relevance and applicability of the findings?" There are a number of factors involved in such an evaluation and, frequently, readers can easily determine the usefulness of a study's findings based on similarities to their own clinical setting, timeframe, and/or patient population. It may be more difficult, however, to understand and evaluate a study's measurement error, or the reliability (trustworthiness) and validity (truth) of its methods and measurement strategies, in part because the extensive body of literature associated with validity and reliability can be overwhelming. The purpose of this article is to provide a comprehensive overview of measurement error as it applies to research design and instrumentation issues. It is intended to serve as a succinct, practical reminder of the definitions and relationships of the concepts of validity and reliability. It is not intended to replace the essential, detailed discussions found in numerous textbooks and journal articles. The different dimensions of validity and reliability are briefly discussed and a concept map is used to illustrate their relationships. In the process of explaining or predicting the phenomena and/or processes of health care, researchers and clinicians must be able to evaluate the truthfulness, precision, and dependability of the instruments and measurement methods used to generate the knowledge for evidence-based practice.

Patricia A. Higgins is an Assistant Professor at Frances Payne Bolton School of Nursing, Case Western Reserve University, Cleveland, OH.
Andrew J. Straub is a Nurse Practitioner at Evercare, United Health Care Services, Inc., Cleveland, OH.
Reprint requests: Dr. Patricia A. Higgins, Case Western Reserve University Nursing, 10900 Euclid Avenue, Cleveland, OH 44106-4904. E-mail: Patricia.Higgins@case.edu
Nurs Outlook 2006;54:23-29.
Copyright 2006 Mosby, Inc. All rights reserved.
doi:10.1016/j.outlook.2004.12.004

Nurses, like all those involved in health care research and practice, continuously grapple with how to evaluate the importance of evidence produced from research investigations. Among the multiple issues to be considered, some may seem more straightforward than others; for example, the timeliness, research setting, cultural relevance, and/or types of subjects involved. On the other hand, evaluating the methodology (design and instruments) that produced those findings may not be as easily accessible, in part because of the complexity of the terms and relationships used to describe the reliability (trustworthiness) and validity (truth) of the study's measurement strategies.
Measurement error is defined as the difference between an abstract concept or idea, considered the true state, and the observed measurement provided by an empirical instrument.1 Measurement error can be described as either systematic or random. Although there is overlap, the statistical methodology associated with validity primarily focuses on reducing systematic error, or the relationship between the true score and the concept (or variable), while the methodology of reliability is concerned with minimizing random error, or the relationship of the observed score to the variable.1,2 The relationship between validity and reliability is mathematically illustrated through the classical test theory equation, X = t + e, in which X is the observed score, t is the true score, and e is error.3
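The variance decomposition behind this equation makes the division of labor explicit. Under the standard classical test theory assumptions (the error term has a mean of zero and is uncorrelated with the true score), the observed-score variance splits into true-score and error components, and the variance ratio in the final expression is the definition of reliability used later in this article:

```latex
X = t + e, \qquad \mathbb{E}[e] = 0, \qquad \operatorname{Cov}(t,e) = 0
\;\;\Longrightarrow\;\;
\operatorname{Var}(X) = \operatorname{Var}(t) + \operatorname{Var}(e),
\qquad
\rho_{XX'} = \frac{\operatorname{Var}(t)}{\operatorname{Var}(X)}
           = \frac{\operatorname{Var}(t)}{\operatorname{Var}(t) + \operatorname{Var}(e)}.
```

Random error inflates Var(e) and lowers this ratio (reliability), whereas systematic error shifts X away from t without necessarily changing the ratio, which is why a measure can be reliable yet invalid.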
The purpose of this article is to provide a comprehensive overview of measurement error as it applies to
research design and instrumentation issues. Although
this multifaceted approach is complicated, and the
discussion for each type of validity and reliability is
limited, we believe that it provides the most informative
concept map. By necessity, however, this article is
limited to domains, relationships, and statistical methods related to the most common types of validity and
reliability in quantitative research.
Uniting concept mapping with a modified form of
concept analysis seems an especially good fit, as they
both provide mechanisms to increase our understanding
of complex concepts. Concept analysis uses the literature to explain a complex mental abstraction, while
concept mapping provides the diagram that helps us
better visualize the concept and all its dimensions. As
one method of conveying an idea, concept mapping is
used to clarify the meaning of a concept, delineate its
domains and summarize relationships. As a strategy to
promote meaningful learning, it also has been used to
explain complex ideas and as a tool for individuals or

groups to graphically represent their personal ideas of a situation or event.4-7 A concept map is a hierarchical structure that is read vertically, from top to bottom, proceeding from the most abstract to the most concrete concepts. Figure 1, the concept map of validity and reliability, begins with the overarching concept of measurement error and ends with statistical measurement strategies. Its organization parallels the text of the article, and the numbers (lowest level of the map) correspond to numbers in the text and the legend.

VALIDITY
Validity addresses the inference of truth of a set of statements.8 The set consists of symbols that represent mathematical formulae and/or words and statements that comprise a line of reasoning in which some of the statements support acceptance of others (conclusions). In science, validity is essential to a research proposal's theoretical framework, design, and methodology, including how well specific tools or instruments measure what they are intended to measure. As such, it is considered an ideal state, to be pursued but never obtained. Knowing that we will never achieve an absolute outcome, scientists nevertheless engage in continuous efforts to attain a degree of validity.9

Even a cursory reading of the literature quickly alerts a reader to the many different, and sometimes confusing, discussions of validity. We have chosen the approach first used by Cook and Campbell8 and adapted by Pedhazur and Schmelkin.3 In this approach, validity is considered a unitary concept because its multiple domains have common characteristics that are not mutually exclusive.3 But similar to other abstract concepts, in which the whole is better understood through a discussion of its particulars, separate discussions are presented for its 2 primary domains, construct validity and design validity. Additionally, each of these concepts is broken down into even more specific types of validity; for example, criterion-related validity is examined as a subset of construct validity.

Construct Validity
Construct validity answers the question, "Is the concept accurately defined and does the instrument or tool actually measure the concept that it is supposed to measure?" Understanding construct validity is challenging, but knowledge of 3 essential steps assists in grasping its complexity: (1) specification of context, (2) specifying and testing predicted relationships, and (3) interpreting results. The first step is determining the context or situation in which a construct or concept is used. Researchers work to establish a degree of construct validity for a particular concept that is specific to a theoretical framework.1 For example, if used within a physiological framework, the concept of function is defined differently than when it is used within a social interaction framework.

In experimental designs, context may be described as statistical interaction, in which the meaning of the concept is constant but the effect of an intervention differs across populations, time, and locale.8 For instance, for the older adult, function may consistently be defined according to activities of daily living (ADL), but the effect of an intervention to improve ADL function may vary depending on subjects' age, medical diagnosis, culture, and/or setting (community outpatient clinic, acute care division, or nursing home unit). Theoretical specification is crucial, therefore, for construct validation3 and, ideally, construct validation occurs through accrual of evidence from multiple studies that use congruent (but not necessarily identical) theoretical frameworks.

With an understanding that theoretical specification is a fundamental assumption of construct validity, there has been an historical shift in the thinking of measurement theorists from a focus on test validity, which emphasized corroborating a specific test or instrument, to validation of inferences based on the test results.10 This brings us to the second and third steps of construct validation: specifying and empirically testing the relationships (hypotheses) that involve a particular concept, and interpreting the results to determine if the measure/instrument validates the concept as defined by the theoretical relationships. Construct validity is a lengthy process that typically involves many years and several different studies and measurement iterations, with subsequent revisions of the concept's meaning and its indicators. Its strongest results are produced in situations where all the concepts of a theoretical framework have a degree of validity and reliability and there is strength and consistency in all the measured relationships of the concepts.1 For nurses, several decades of multidisciplinary research have produced a much richer understanding of the construct validity of such multifaceted concepts as pain, social support, uncertainty, and self-efficacy. Although the following 2 types of validity, content and criterion, are sometimes discussed separately from construct validity, they are presented here as 2 of its dimensions, representing steps 2 and 3 described above.

Content validity. Content validity is concerned with the adequacy of the items (questions) of an instrument to assess the concept, or domain of interest.9 The processes of content validity are preceded by concept analysis (domain identification) and a developmental stage in which there is generation of an instrument.1,11 Content validity is an interpretation of the results of the tool development, a critical review of the instrument's items in order to assess semantic clarity, domain sampling adequacy, and coherence of items. The evaluation methods include (Figure 1, 1-3):
1. Literature review. Includes historical and current uses of the concept/instrument.
2. Personal reflection.
3. Analytical critique. (a) Analytical critique of the instrument by experts (clinicians and researchers), either individually or as a panel, in which both the individual items and the entire instrument are evaluated; (b) Analytical critique of the instrument by potential subjects (focus groups).

[Figure 1. Concept Map]
Criterion-related validity. Criterion validity is concerned with the statistical testing of theoretical relationships within an instrument, between 2 instruments, and/or between an instrument and an event that occurs before, during, or after an instrument is used to measure the concept of interest.3 There are 4 types of criterion validity (Figure 1, 4-7):
4. Concurrent validity. Ability to detect a positive or negative statistical relationship between 2 instruments simultaneously measuring the same concept at the same time. Reported as a correlation coefficient (r); a computational sketch follows this list.
5. Convergent or divergent validity. Ability to detect a relationship (r) between the concept of interest and a concept that has a similar meaning (convergent), or the ability to detect the absence of a significant relationship between the concept of interest and one that has an opposite meaning (divergent).
6. Predictive validity. Ability of the measure of a concept to predict future phenomena or events even if we cannot explain why the concept is predictive. Time period for prediction must be specified. Reported as a correlation coefficient (r).
7. Factor analysis. Ability of an instrument to operationalize a theoretical construct by determining the relationships of a set of variables.9 Requires strict adherence to theoretical guidelines for interpretation.
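For readers who want to see the computation, the sketch below works through the correlation statistic that items 4-6 report. It is illustrative only: the two scales and their scores are hypothetical, and any correlation routine would serve equally well.

```python
# Concurrent validity sketch: correlate scores from 2 instruments that
# measure the same concept at the same time. Scores are hypothetical.
import numpy as np
from scipy import stats

scale_a = np.array([2, 4, 5, 7, 3, 8, 6, 5, 9, 4])  # pain scale A
scale_b = np.array([3, 4, 6, 7, 2, 9, 5, 6, 8, 5])  # pain scale B, same subjects

r, p = stats.pearsonr(scale_a, scale_b)
print(f"r = {r:.2f} (p = {p:.4f})")
```

A strong r between 2 instruments administered at the same time supports concurrent validity; the same statistic computed against a criterion measured at a specified later time addresses predictive validity.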

Design Validity
Design validity is concerned with the evaluation of a researcher's design (blueprint) of a study.12 Given that
scientists seek to draw valid conclusions about the
effect of the independent variable on the dependent
variable and make valid generalizations about the
population and settings of interest, statistical as well as
non-statistical factors must be considered in their design deliberations.13 For the clinician, it can be just as
daunting to determine if research results are relevant;
that is, were the results obtained under circumstances
that are somewhat similar to clinical practice? It is
helpful, therefore, to know that knowledge of the
clinical domain of interest must be combined with 3
closely related factors. The first is concerned with the

ethical treatment of human subjects; the second, the features and constraints of the clinical setting; and the third, the factors of design validity.
Design validity includes 3 broad categories: internal
validity, external validity, and statistical conclusion
validity.12 In some cases, statistical conclusion validity
is considered a special case of internal validity, and in
other approaches it is a separate category. It also differs
from other types of validity in that it includes random
error (reliability). To best illustrate its characteristics,
statistical conclusion validity is discussed here as a
third dimension of design validity.
The following descriptions of the 3 dimensions of
design validity are drawn from the work of Cook and
Campbell8 and Kirk,13 but discussions also are found in
other research texts; for example, Burns and Grove.12
Internal and external validity claims or threats primarily apply to experimental or quasi-experimental
designs; the claims of statistical conclusion validity are
relevant for all designs. As with construct validity, the
3 dimensions are overlapping. Further, some of the
claims of design validity overlap, or are based on,
reliability measures. For example, low reliability of the
measure(s) for any variable may create a threat to
internal validity; that is, low reliability of measures may
inflate the error variance and lead to Type II error
(concluding that there is no statistical difference between subject groups when there is an actual difference).8 And finally, strengthening one type of design
validity may weaken another. For example, the more
the researcher controls the effects of the treatment on
test subjects, the better the statistical conclusion validity; however, this greater control on the treatment can
also decrease external validity and construct validity.
Internal validity. Internal validity is concerned with
the congruence between the theoretical assertions and
the statistical relationship of 2 variables; that is, are 2
variables causally related and is the direction of their
relationship as hypothesized? Internal validity is most frequently discussed in terms of threats: the factors that researchers are expected to anticipate and control for during proposal writing (a priori) or attempt to explain (a posteriori) when unexpected results materialize after completion of the data collection. Although
discussed singly, the threats may occur simultaneously
and in a cumulative or opposing manner. Based on the
purpose, setting, theory, and design of a study, the
investigator must decide the priority of validity claims.8
Similarly, the clinician must prioritize validity claims of
a particular study and decide if internal threats are
sufficient to reject its results as unsuitable (Figure 1,
8-18):

8. History or concurrent events. Refers to events unrelated to the research that occur during data collection and may incorrectly contribute to a relationship between 2 variables.
9. Maturation. Occurs in pre-test and post-test designs when changes in subjects affect the dependent variable.
10. Testing. Repeated testing alters a subject's performance through familiarity with the tool.
11. Regression toward the mean. In pre-test/post-test and matched-group designs, the natural tendency of subjects' extreme scores to move toward the mean; thus, it cannot be concluded that the treatment caused the change.
12. Differential selection of participants. Occurs if there is systematic bias on the part of the researcher in recruiting subjects and/or assigning them to groups.
13. Differential loss of participants. May occur during recruitment and/or data collection.
14. Diffusion of treatment. Occurs when subjects in the control group inadvertently learn information intended only for the treatment group, or they receive the treatment intended for the treatment group.
15. Ambiguity about direction of causal relationships among variables. Occurs most frequently in cross-sectional designs.
16. Compensatory equalization of treatments. May be ethically required and/or socially desired. Results in the loss of the control group.
17. Compensatory rivalry of subjects. Competition between treatment groups may occur if the research protocol is known and subjects learn of their group membership (control or treatment).
18. Demoralization of control group subjects. Occurs when the group receiving the less desirable treatment becomes discouraged or resentful.
External validity. External validity is concerned
with generalizing to particular people, settings and
times and across types of people, settings and times.
The former is concerned with whether research goals for an identified population have been met; the latter, with assessing how far one can generalize.8 The 4 factors
discussed here may also be referred to as threats, or
statistical interaction effects. One additional factor that
is not identified explicitly as a threat to external validity
is sampling bias, in the form of non-probability or
convenience sampling. It is used in many nursing study
designs and limits the generalizability of the findings in
the same way as the other threats to external validity
(Figure 1, 19-22).
19. Interaction of subject selection and treatment. Sample bias occurs when a large number of subjects inadvertently share a salient characteristic or a large number of potential subjects decline to participate. Either can affect the experiment's dependent variable.
20. Interaction of setting and treatment, also known as ecological validity. Occurs when a treatment effect happens only in a specific setting, preventing further generalization.
21. Interaction of history and treatment. The circumstances or times in which the study takes place are temporary and unique and unlikely to be repeated.
22. Interaction of multiple treatments. Treatments,
other than the experimental one, affect the dependent variable. The treatments may be unknown or
unanticipated by the researcher.
Statistical conclusion validity. Statistical conclusion
validity is concerned with both systematic and random
error and the correct use of statistics and statistical
tests.8 It addresses the question, "Is there a relationship between two variables?" The types listed below are
essential for experimental and quasi-experimental designs and most are important for descriptive designs
(Figure 1, 23-29):
23. Power. Low statistical power may lead to Type II error: falsely concluding that no relationship or difference exists between 2 subject groups.
24. Violation of assumptions of statistical tests. May
result in incorrect inferences.
25. Confounded significance test levels. With certain statistical tests (for example, t-tests), the probability of drawing one or more erroneous conclusions increases as a function of the number of tests performed; a simulation sketch follows this list.
26. Reliability of measures. Includes a wide range of instrumentation error, from poor calibration of a physiological instrument to changes in an observer's performance.
27. Reliability of treatment implementation. Failure to
standardize the treatment may lead to an increase
of either a Type I or Type II error.
28. Heterogeneity of the environmental setting. Unanticipated irrelevancies in the experimental setting
of the study may lead to Type II error.
29. Heterogeneity of respondents. Idiosyncratic differences in subjects may lead to Type II error.
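The sketch below illustrates item 25 by simulation. The group sizes, number of tests, and alpha level are arbitrary choices made for the example; every dataset is generated as pure noise, so each "significant" result is, by construction, a false positive.

```python
# Item 25 sketch: with repeated t-tests at alpha = .05, the chance of at
# least one erroneous conclusion grows with the number of tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_tests, alpha = 1000, 10, 0.05
runs_with_false_positive = 0

for _ in range(n_studies):
    flagged = False
    for _ in range(n_tests):
        a = rng.normal(0, 1, 30)  # both groups drawn from the same
        b = rng.normal(0, 1, 30)  # population, so no true difference
        if stats.ttest_ind(a, b).pvalue < alpha:
            flagged = True
    runs_with_false_positive += flagged

print(f"familywise error rate ~ {runs_with_false_positive / n_studies:.2f}")
# roughly 1 - 0.95**10, i.e., about 0.40 rather than the nominal 0.05
```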

RELIABILITY
Reliability in the most general sense is the extent to
which an experiment, test, or any measuring procedure
yields the same results on repeated trials.1 According
to classical measurement theory, it is the ratio of
true-score variance to observed-score variance. Similar
to validity, perfect reliability is an ideal state that is
never achieved. Additionally, there are a number of
issues pertaining to the measurement and/or use of the
term. First, for physiological measures, the term reliability is frequently referred to as accuracy and/or
precision. Second, reliability should be used only to describe inferences from a particular sample's data set, test, or instrument. Also, although the concept map
shows that reliability is multidimensional, with each
dimension reflecting a different conceptual perspective,
and although the dimensions overlap, a high degree of reliability from one dimension (for example, stability) does not indicate a high degree of reliability from another (for example, internal consistency). Therefore, authors who refer only to the "high reliability" of an instrument, without further identifying the conceptual or statistical perspective, prevent readers from making an informed decision about the evidence presented.
The direct connection between the concepts of reliability and validity is illustrated through the understanding that valid measurements require consistency
of observation14 but, also, that reliability is considered
a necessary but not sufficient condition for validity.3 In
other words, it is possible for a data set to have high
reliability measures but low validity, but in order to
have a high degree of validity, the data set also must be
reliable. As an illustration, consider an electronic scale
used to measure patient weights. Suppose we are testing
the validity and reliability of this scale by repeatedly
placing a 100-lb barbell on the scale and recording the
reading each time. If the scale is to have a degree of validity, it must record consistent readings at or very near the "true" weight of 100 lbs each time. But
suppose the researcher mistakenly calibrated the empty
scale to 0.5 lb instead of 0 lbs and subsequent readings
are at or near 100.5 lbs each time; consequently, this
data set would have high reliability but low validity.
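The scale example can also be run as a small simulation. The bias and noise magnitudes below are illustrative assumptions, not values from any study; the point is that the readings cluster tightly (high reliability) around the wrong value (low validity).

```python
# Miscalibrated-scale sketch: systematic error harms validity, while
# small random error leaves reliability high. Values are illustrative.
import numpy as np

rng = np.random.default_rng(42)
true_weight = 100.0                       # true barbell weight (lbs)
bias = 0.5                                # systematic calibration error
readings = true_weight + bias + rng.normal(0, 0.05, 20)  # 20 trials

print(f"mean reading = {readings.mean():.2f} lbs")   # ~100.5: consistently off
print(f"spread (SD)  = {readings.std(ddof=1):.3f}")  # tiny: very repeatable
```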
As will become evident from the discussion below,
the determination of reliability in human test subjects
presents additional challenges. The following measures
are all population-dependent indicators of reliability
(random error).
Internal Consistency Reliability. Internal consistency is a measure of the degree to which each of an instrument's items measures the same characteristic. To compute internal consistency, a single version of an instrument is administered to a single group of test subjects at a single time point. The data are then analyzed for consistency using 1 of the following statistical methods1 (Figure 1, 30-32):
30. Cronbach's alpha. Also known as coefficient alpha or alpha, Cronbach's alpha provides a general estimate of how well all the items on a test instrument measure the same phenomenon. It is based on the number of test items and their average inter-item correlations; a computational sketch follows this list. The possible range of scores for alpha is 0.0-1.0, with lower scores indicating less internal reliability. Estimates of an acceptable alpha for an instrument are dependent on the maturity of the instrument and the sample population for which it was used. An alpha of at least 0.80 is recommended for well-established and widely used instruments.1

31. Kuder-Richardson Formulas. Similar to Cronbach's alpha but used with scale items that are scored dichotomously.
32. Coefficient Theta. Reliability estimator computed from principal component or factor analysis that was developed to account for non-parallel or multidimensional sets of items.15
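As a computational sketch of item 30, Cronbach's alpha can be calculated from the equivalent variance form of its formula; the 4-item scale and subject scores below are hypothetical.

```python
# Cronbach's alpha sketch: rows are subjects, columns are scale items.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

scores = np.array([[3, 4, 3, 4],   # hypothetical responses of
                   [2, 2, 3, 2],   # 6 subjects to a 4-item scale
                   [4, 5, 5, 4],
                   [1, 2, 1, 2],
                   [3, 3, 4, 3],
                   [5, 4, 5, 5]])
print(f"alpha = {cronbach_alpha(scores):.2f}")
```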

Stability. Stability is the consistency of repeated measurements. A sample of subjects is tested more than once, under similar circumstances, utilizing the same instrument, which can be either a physiological or paper and pencil instrument. It is used with phenomena in which little or no change is expected between the first and second trials. The data sets from the 2 test administrations are statistically compared from one test to the next. Assessing stability of measurement requires theoretical understanding of the concept of interest, the time between measurements, and intervening factors. In assessing a physiological measurement, for example, a subject's body weight, measured at two 15-minute intervals in an outpatient clinic, should be unchanged. Even if appropriate timing can be determined, assessing the stability of repeated measures of attitudes or psychological conditions can be more difficult, or even inappropriate, considering the variability of subject perceptions and/or the number of intervening factors. There is 1 common statistical method for testing stability1 (Figure 1, 33):

33. Test-Retest Method. Also called Retest Reliability. Involves applying the same test instrument to the same test subjects at different points in time. Reported as a correlation coefficient (r).

Inter-rater Reliability and Equivalence. Inter-rater reliability refers to the degree to which researchers agree in administering and scoring a test instrument given to a group of subjects, or to the degree of consistency with paper and pencil data collection.12 Many potential threats to inter-rater reliability exist in any test situation. For instance, 2 researchers may vary considerably in rating an infant's response to pain, even if they have clear criteria and were trained in the same protocol. A second threat to inter-rater reliability can occur when data are copied from one source to another (for example, copying laboratory values from the subject's medical record to a data collection tool). To compute inter-rater reliability, 2 raters should complete at least 10 identical instruments. Generally, 80% agreement between raters is the minimum required.12 Four common methods for determining inter-rater reliability are (Figure 1, 34-37):
34. Alternative Form Method. Measure of test equivalence. Involves creating at least 2 versions of the test instrument and applying each version to the same group of test subjects at the same time, on at least 2 different occasions. Pearson's correlation coefficient is calculated.

35. Kendall's Coefficient of Concordance. Measures the degree to which ranked data from numerous raters agree with one another.
36. Percent Agreement. Ratio of the number of measurements in agreement divided by the total number of measurements.
37. Kappa and Phi. Computation of Cohen's Kappa or its equivalent, Phi, provides nearly equal inter-rater reliability coefficients. Because they control for chance agreement, they are preferable to the percent agreement statistic.16 A comparative sketch follows this list.
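The sketch below contrasts items 36 and 37 on the same hypothetical ratings. The kappa computation uses scikit-learn's cohen_kappa_score, one of several libraries that provide it; the ratings themselves are invented for the example.

```python
# Items 36-37 sketch: percent agreement versus Cohen's Kappa for 2 raters
# scoring the same 10 subjects dichotomously. Ratings are hypothetical.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rater_1 = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
rater_2 = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

percent_agreement = (rater_1 == rater_2).mean()
kappa = cohen_kappa_score(rater_1, rater_2)

print(f"percent agreement = {percent_agreement:.0%}")  # ignores chance
print(f"kappa             = {kappa:.2f}")              # corrects for chance
```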

The Bland-Altman plot17 is a relatively new statistical methodology that increasingly is being used to assess the equivalence between 2 instruments measuring the same phenomenon. Bland and Altman17 argue against the common method of using correlation coefficients to compare one set of repeated measures to another set of repeated measures, on the basis that correlations can lead to misleading conclusions about the agreement between 2 variables. The Bland-Altman method compares the measurements' differences to the mean of the measurements (in this analysis, the mean of the measurements is assumed to be the true or correct measurement against which both sets of experimental measurements are compared). The statistical limits of agreement are specified in advance of performing the statistical analysis. Most frequently used to assess equivalence between measures of physiological phenomena,17-19 Bland-Altman also has been used to assess equivalence of paper and pencil tests20 or between a paper and pencil test and a physiological instrument.21

38. Bland-Altman plots. Generated with MedCalc software (www.medcalc.be, Belgium).22 If the differences fall within ±1.96 SD of the mean difference and are not clinically important, then the 2 methods are equivalent and may be used interchangeably (see the sketch below).
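The MedCalc software cited above is one way to generate the plot; the underlying computation is simple enough to sketch directly. In this illustrative example the paired measurements are hypothetical, and the limits of agreement are taken as the mean difference ± 1.96 SD, per Bland and Altman.17

```python
# Bland-Altman sketch: compare differences between 2 methods against the
# mean of each measurement pair. All measurement values are hypothetical.
import numpy as np

method_a = np.array([98.2, 101.5, 99.8, 100.9, 97.6, 102.3, 100.1, 99.4])
method_b = np.array([98.9, 101.1, 100.4, 100.2, 98.3, 101.8, 100.8, 99.0])

diffs = method_a - method_b            # y-axis of the Bland-Altman plot
means = (method_a + method_b) / 2      # x-axis of the Bland-Altman plot
bias = diffs.mean()                    # systematic difference between methods
sd = diffs.std(ddof=1)

print(f"bias = {bias:.2f}")
print(f"limits of agreement = ({bias - 1.96 * sd:.2f}, {bias + 1.96 * sd:.2f})")
# If these limits are narrower than a clinically important difference,
# the 2 methods may be treated as interchangeable.
```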

CONCLUSION
Measurement permeates all aspects of our everyday
lives. Daily activities and decisions are guided by it and
very often its systematic predictability is taken for
granted. When we move into scientific inquiry, however, our perspective changes. In the process of explaining or predicting the phenomena and/or processes of
health care, researchers and clinicians must carefully
scrutinize the truthfulness, precision, and dependability
of the instruments and measurement methods used to
generate the knowledge for evidence-based practice.
There are few hard and fast rules to guide us through
the measurement maze23 and, thus, an understanding of
validity and reliability becomes crucial for anyone who
wants to correctly use the research process to advance
nursing science and practice.


The authors would like to thank our colleagues and the anonymous
reviewers for their comments.

REFERENCES
1. Carmines EG, Zeller RA. Reliability and validity. Newbury
Park, CA: Sage Publications; 1979.
2. Knapp TR. Validity, reliability, and neither. Nurs Research
1985;34:189-92.
3. Artinian BM. Conceptual mapping: development of the
strategy. Western J Nurs Research 1982;4:379-93.
4. Artinian BM. Guiding students in theory development
through conceptual mapping. J Nurs Ed 1985;24:156-8.
5. Beitz JM. Concept mapping: navigating the learning process. Nurse Educator 1998;23:35-41.
6. Irvine LMC. Can concept mapping be used to promote
meaningful learning in nurse education? J Advanced Nurs
1995;21:1175-9.
7. Cook TD, Campbell DT. Quasi-experimentation. Design and
analysis issues for field settings. Boston, MA: Houghton
Mifflin Co; 1979.
8. Nunnally JC, Bernstein IH. Psychometric theory 3rd ed.
New York, NY: McGraw-Hill Publishing Company; 1994.
9. Pedhazur EJ, Schmelkin LP. Measurement, design and
analysis. An integrated approach. Hillsdale, NJ: Lawrence
Erlbaum Associates; 1991.
10. Goodwin LD. Changing conceptions of measurement validity. J Nurs Ed 1997;36:102-7.
11. Lynn MR. Determination and quantification of content
validity. Nursing Research 1986;35:382-5.
12. Burns N, Grove SK. The practice of nursing research 4th ed.
Philadelphia, PA: WB Saunders Co; 2001.

13. Kirk RE. Experimental design 2nd ed. Belmont, CA: Wadsworth, Inc; 1982.
14. Froman RD. Measuring our words on measurement. Research in Nursing and Health 2000; 23:421-22.
15. Ferketich S. Internal consistency estimates of reliability.
Research Nurs Hlth 1990;13:437-40.
16. Topf M. Three estimates of inter-rater reliability for nominal
data. Nursing Research 1986;35(4):253-55.
17. Bland JM, Altman DG. Statistical methods for assessing
agreement between two methods of clinical measurement.
Lancet 1986;1:307-10.
18. Albert NM, Hail MD, Li J, Young JB. Equivalence of
bioimpedance and thermodilution methods in measuring
cardiac output in hospitalized patients with advanced decompensated chronic heart failure. Am J Critical Care 2004;13:
469-79.
19. Barton SJ, Chase T, Latham B, Rayens MK. Comparing two methods to obtain blood specimens from pediatric central venous catheters. J Pediatric Oncology Nurs 2004;21:320-6.
20. Pesudovs K, Garamendi E, Elliott DB. The quality of life impact of refractive correction (QIRC) questionnaire: development and validation. Optometry Vision Sci 2004;81:769-77.
21. Trivel D, Calmels P, Leger L, Busso T, Devillard X, Castells
J, et al. Validity and reliability of the Huet questionnaire to
assess maximal oxygen uptake. Canadian J Applied Physiology 2004;29:623-38.
22. MedCalc. www.medcalc.be. (Accessed December 1, 2004).
23. Knapp TR, Brown JK. Ten measurement commandments
that often should be broken. Research Nurs Hlth 1995;18:
465-9.
