Nurses, like all those involved in health care research and practice, continuously grapple with
how to evaluate the importance of evidence produced from research investigations. Among the multiple issues to be considered, some may seem more
Patricia A. Higgins is an Assistant Professor at Frances Payne Bolton
School of Nursing, Case Western Reserve University, Cleveland, OH.
Andrew J. Straub is a Nurse Practitioner at Evercare, United Health
Care Services, Inc., Cleveland, OH.
Reprint requests: Dr. Patricia A. Higgins, Case Western Reserve University Nursing, 10900 Euclid Avenue, Cleveland, OH 44106-4904.
E-mail: Patricia.Higgins@case.edu
Nurs Outlook 2006;54:23-29.
0029-6554/06/$ - see front matter
Copyright 2006 Mosby, Inc. All rights reserved.
doi:10.1016/j.outlook.2004.12.004
VALIDITY
Validity addresses the inference of truth of a set of
statements.8 The set consists of symbols that represent
mathematical formulae and/or words and statements
that comprise a line of reasoning in which some of the
statements support acceptance of others (conclusions).
In science, validity is essential to a research proposal's
theoretical framework, design, and methodology, including how well specific tools or instruments measure
what they are intended to measure. As such, it is
considered an ideal state, to be pursued but never
obtained. Knowing that we will never achieve an
absolute outcome, scientists nevertheless engage in
continuous efforts to attain a degree of validity.9
Even a cursory reading of the literature quickly alerts
a reader to the many different, and sometimes confusing, discussions of validity. We have chosen the approach first used by Cook and Campbell7 and adapted
by Pedhazur and Schmelkin.9 In this approach, validity
is considered a unitary concept because its multiple
domains have common characteristics that are not
mutually exclusive.9 But similar to other abstract concepts, in which the whole is better understood through
a discussion of its particulars, separate discussions are
presented for its 2 primary domains, construct validity
and design validity. Additionally, each of these domains is broken down into even more specific types of
validity; for example, criterion-related validity is examined as a subset of construct validity.
Construct Validity
Construct validity answers the question, "Is the
concept accurately defined, and does the instrument or
tool actually measure the concept that it is supposed to
measure?" Understanding construct validity is challenging, but knowledge of 3 essential steps assists in
grasping its complexity: (1) specification of context, (2)
specifying and testing predicted relationships, and (3)
interpreting results. The first step is determining the
context or situation in which a construct or concept is
used. Researchers work to establish a degree of construct validity for a particular concept that is specific to
a theoretical framework.1 For example, if used within a
physiological framework, the concept of function is
defined differently than when it is used within a social
interaction framework. In experimental designs, context
Design Validity
Design validity is concerned with the evaluation of a
researcher's design (blueprint) of a study.12 Given that
scientists seek to draw valid conclusions about the
effect of the independent variable on the dependent
variable and make valid generalizations about the
population and settings of interest, statistical as well as
non-statistical factors must be considered in their design deliberations.13 For the clinician, it can be just as
daunting to determine if research results are relevant;
that is, were the results obtained under circumstances
that are somewhat similar to clinical practice? It is
helpful, therefore, to know that knowledge of the
clinical domain of interest must be combined with 3
closely related factors. The first is concerned with the
9. Maturation. Occurs in pre-test and post-test designs when changes in subjects affect the dependent variable.
10. Testing. Repeated testing alters a subject's performance through familiarity with the tool.
11. Regression toward the mean. In pre-test/post-test
and matched-group designs, the natural tendency
of subjects' extreme scores to move toward the
mean; thus, it cannot be concluded that the
treatment caused the change.
12. Differential selection of participants. Occurs if
there is systematic bias on the part of the researcher in recruiting subjects and/or assigning
them to groups.
13. Differential loss of participants. May occur during
recruitment and/or data collection.
14. Diffusion of treatment. Occurs when subjects in
the control group inadvertently learn information
intended only for the treatment group, or they
receive the treatment intended for the treatment
group.
15. Ambiguity about direction of causal relationships
among variables. Occurs most frequently in cross-sectional designs.
16. Compensatory equalization of treatments. May be
ethically required and/or socially desired. Results
in the loss of the control group.
17. Compensatory rivalry of subjects. Competition
between treatment groups may occur if the research protocol is known and subjects learn of
their group membership (control or treatment).
18. Demoralization of control group subjects. Occurs
when the group receiving the less desirable treatment becomes discouraged or resentful.
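The threat of regression toward the mean (item 11) lends itself to a brief numerical illustration. The following Python sketch uses entirely simulated, hypothetical scores (not data from any study) to show that subjects selected for extreme pre-test scores drift back toward the group mean on retest even when no treatment is given:

```python
import random

random.seed(1)

# Simulate 1,000 subjects whose observed score is a stable "true" score
# plus random measurement error; no treatment occurs between measurements.
true_scores = [random.gauss(50, 10) for _ in range(1000)]
pre = [t + random.gauss(0, 5) for t in true_scores]
post = [t + random.gauss(0, 5) for t in true_scores]

# Select the subjects with the most extreme (top-decile) pre-test scores.
cutoff = sorted(pre)[900]
extreme = [i for i in range(1000) if pre[i] >= cutoff]

mean_pre = sum(pre[i] for i in extreme) / len(extreme)
mean_post = sum(post[i] for i in extreme) / len(extreme)

# Even with no intervention, the extreme group's mean drifts back toward
# the overall mean on remeasurement; the change is not a treatment effect.
```

Because the extreme group was partly selected for favorable measurement error, that error does not recur on retest, so the group's mean falls without any treatment at all.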
External validity. External validity is concerned
with generalizing to particular people, settings and
times and across types of people, settings and times.
The former is concerned with whether research goals
for an identified population have been met; the latter,
with assessing how far one can generalize.8 The 4 factors
discussed here may also be referred to as threats, or
statistical interaction effects. One additional factor that
is not identified explicitly as a threat to external validity
is sampling bias, in the form of non-probability or
convenience sampling. It is used in many nursing study
designs and limits the generalizability of the findings in
the same way as the other threats to external validity
(Figure 1, 19-22).
19. Interaction of subject selection and treatment.
Sample bias occurs when a large number of
subjects inadvertently share a salient characteristic
or a large number of potential subjects decline to
participate. Either can affect the experiment's
dependent variable.
20. Interaction of setting and treatment, also known as
ecological validity. Occurs when a treatment effect
RELIABILITY
Reliability in the most general sense is "the extent to
which an experiment, test, or any measuring procedure
yields the same results on repeated trials."1 According
to classical measurement theory, it is the ratio of
true-score variance to observed-score variance. Similar
to validity, perfect reliability is an ideal state that is
never achieved. Additionally, there are a number of
issues pertaining to the measurement and/or use of the
term. First, for physiological measures, the term reliability is frequently referred to as accuracy and/or
precision. Second, reliability should be used only to
describe inferences from a particular sample's data set,
test or instrument. Also, although the concept map
shows that reliability is multidimensional, with each
dimension reflecting a different conceptual perspective,
31. Kuder-Richardson Formulas. Similar to Cronbach's alpha but used with scale items that are
scored dichotomously.
32. Coefficient Theta. Reliability estimator computed
from principal component or factor analysis that
was developed to account for non-parallelism or
multi-dimensionality in a set of items.15
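As an illustration of item 31, the Kuder-Richardson Formula 20 (KR-20) can be sketched in a few lines of Python. The item scores below are hypothetical, not drawn from any study, and the sketch uses the population variance of total scores:

```python
def kr20(scores):
    """Kuder-Richardson Formula 20 for dichotomously (0/1) scored items.

    `scores` is a list of subjects, each a list of 0/1 item responses.
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of items
    # Proportion passing each item (p) times its complement (q = 1 - p).
    p = [sum(subject[j] for subject in scores) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    # Variance of the subjects' total scores (population variance).
    totals = [sum(subject) for subject in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / var_total)

# Hypothetical responses: 4 subjects x 4 dichotomous items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
estimate = kr20(data)
```

As with Cronbach's alpha, larger values indicate greater internal consistency of the item set.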
The Bland-Altman plot17 is a relatively new statistical methodology that increasingly is being used to
assess the equivalence between 2 instruments measuring the same phenomenon. Bland and Altman17 argue
against the common method of using correlation coefficients to compare one set of repeated measures to
another set of repeated measures on the basis that
correlations can lead to misleading conclusions about
the agreement between 2 variables. The Bland-Altman
method compares the differences between paired measurements to the
mean of the measurements (in this analysis, the mean of
the measurements is assumed to be the true or correct
measurement against which both sets of experimental
measurements are compared). The statistical limits of
agreement are specified in advance of performing the
statistical analysis. Most frequently used to assess
equivalence between measures of physiological phenomena,17,18,19 Bland-Altman also has been used to
assess equivalence of paper and pencil tests20 or between a paper and pencil test and a physiological
instrument.21
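The arithmetic behind the Bland-Altman method is straightforward: compute the difference and the mean for each pair of measurements, then derive the bias (mean difference) and the 95% limits of agreement (bias plus or minus 1.96 standard deviations of the differences). A minimal Python sketch with hypothetical paired readings, not data from any cited study:

```python
# Hypothetical paired readings from two instruments measuring the same quantity.
a = [100, 102, 98, 105, 110, 95, 101, 99]
b = [98, 104, 97, 107, 108, 96, 103, 98]

diffs = [x - y for x, y in zip(a, b)]
means = [(x + y) / 2 for x, y in zip(a, b)]  # plotted on the x-axis

n = len(diffs)
bias = sum(diffs) / n  # mean difference between the two methods
sd = (sum((d - bias) ** 2 for d in diffs) / (n - 1)) ** 0.5

# 95% limits of agreement: bias +/- 1.96 SD of the differences.
loa_lower = bias - 1.96 * sd
loa_upper = bias + 1.96 * sd
```

In the published plot, each pair's difference is graphed against its mean, with horizontal lines at the bias and the two limits of agreement; the pre-specified clinical limits are then compared against these statistical limits.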
33. Test-Retest Method. Also called Retest Reliability. Involves applying the same test instrument to
the same test subjects at different points in time.
Reported as a correlation coefficient (r).
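Test-retest reliability (item 33) reduces to a Pearson correlation between the two administrations. A minimal sketch, using hypothetical scores for 5 subjects measured at two points in time:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores from the same subjects at time 1 and time 2.
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]
r = pearson_r(time1, time2)  # high r suggests stable, repeatable scores
```

A value of r near 1 indicates that subjects retain their relative standing across administrations, which is the sense of stability this method assesses.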
Inter-rater Reliability and Equivalence. Inter-rater
reliability refers to the degree to which researchers agree
in administering and scoring a test instrument given to
a group of subjects, or to the degree of consistency with
paper and pencil data collection.12 Many potential
threats to inter-rater reliability exist in any test situation.
For instance, 2 researchers may vary considerably in
rating an infant's response to pain, even if they have
clear criteria and were trained in the same protocol. A
second threat to inter-rater reliability can occur when
data are copied from one source to another (for example, copying laboratory values from the subject's medical record to a data collection tool). To compute
inter-rater reliability, 2 raters should complete at
least 10 identical instruments. Generally, 80% agreement between raters is the minimum required.12 Four
common methods for determining inter-rater reliability
are listed in Figure 1 (items 34-37).
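The simplest of these indices, percent agreement, can be computed directly. The Python sketch below uses hypothetical ratings from 2 raters on 10 identical instruments and checks the result against the 80% minimum noted above:

```python
# Hypothetical dichotomous ratings by two raters on 10 identical instruments.
rater1 = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes", "yes", "no"]
rater2 = ["yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes", "yes"]

# Count the instruments on which the two raters gave the same rating.
agreements = sum(1 for r1, r2 in zip(rater1, rater2) if r1 == r2)
percent_agreement = 100 * agreements / len(rater1)

# Compare against the conventional 80% minimum for inter-rater reliability.
meets_minimum = percent_agreement >= 80
```

Note that percent agreement does not correct for agreement expected by chance; chance-corrected indices such as Cohen's kappa address that limitation.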
CONCLUSION
Measurement permeates all aspects of our everyday
lives. Daily activities and decisions are guided by it and
very often its systematic predictability is taken for
granted. When we move into scientific inquiry, however, our perspective changes. In the process of explaining or predicting the phenomena and/or processes of
health care, researchers and clinicians must carefully
scrutinize the truthfulness, precision, and dependability
of the instruments and measurement methods used to
generate the knowledge for evidence-based practice.
There are few hard and fast rules to guide us through
the "measurement maze"23 and, thus, an understanding of
validity and reliability becomes crucial for anyone who
wants to correctly use the research process to advance
nursing science and practice.
34. Alternative Form Method. Measure of test equivalence. Involves creating at least 2 versions of the
test instrument and applying each version to the
same group of test subjects at the same time, on at
The authors would like to thank our colleagues and the anonymous
reviewers for their comments.
REFERENCES
1. Carmines EG, Zeller RA. Reliability and validity. Newbury
Park, CA: Sage Publications; 1979.
2. Knapp TR. Validity, reliability, and neither. Nurs Research
1985;34:189-92.
3. Artinian BM. Conceptual mapping: development of the
strategy. Western J Nurs Research 1982;4:379-93.
4. Artinian BM. Guiding students in theory development
through conceptual mapping. J Nurs Ed 1985;24:156-8.
5. Beitz JM. Concept mapping: navigating the learning process.
Nurse Educator 1998;23:35-41.
6. Irvine LMC. Can concept mapping be used to promote
meaningful learning in nurse education? J Advanced Nurs
1995;21:1175-9.
7. Cook TD, Campbell DT. Quasi-experimentation. Design and
analysis issues for field settings. Boston, MA: Houghton
Mifflin Co; 1979.
8. Nunnally JC, Bernstein IH. Psychometric theory 3rd ed.
New York, NY: McGraw-Hill Publishing Company; 1994.
9. Pedhazur EJ, Schmelkin LP. Measurement, design and
analysis. An integrated approach. Hillsdale, NJ: Lawrence
Erlbaum Associates; 1991.
10. Goodwin LD. Changing conceptions of measurement validity. J Nurs Ed 1997;36:102-7.
11. Lynn MR. Determination and quantification of content
validity. Nursing Research 1986;35:382-5.
12. Burns N, Grove SK. The practice of nursing research 4th ed.
Philadelphia, PA: WB Saunders Co; 2001.
13. Kirk RE. Experimental design 2nd ed. Belmont, CA: Wadsworth, Inc; 1982.
14. Froman RD. Measuring our words on measurement. Research in Nursing and Health 2000; 23:421-22.
15. Ferketich S. Internal consistency estimates of reliability.
Research Nurs Hlth 1990;13:437-40.
16. Topf M. Three estimates of inter-rater reliability for nominal
data. Nursing Research 1986;35(4):253-55.
17. Bland JM, Altman DG. Statistical methods for assessing
agreement between two methods of clinical measurement.
Lancet 1986;1:307-10.
18. Albert NM, Hail MD, Li J, Young JB. Equivalence of
bioimpedance and thermodilution methods in measuring
cardiac output in hospitalized patients with advanced decompensated chronic heart failure. Am J Critical Care 2004;13:
469-79.
19. Barton SJ, Chase T, Latham B, Rayens MK. Comparing
two methods to obtain blood specimens from pediatric
central venous catheters. J Pediatric Oncology Nurs 2004;
21:320-6.
20. Pesudovs K, Garamendi E, Elliott DB. The quality of life
impact of refractive correction (QIRC) questionnaire: development and validation. Optometry Vision Sci 2004;81:769-77.
21. Trivel D, Calmels P, Leger L, Busso T, Devillard X, Castells
J, et al. Validity and reliability of the Huet questionnaire to
assess maximal oxygen uptake. Canadian J Applied Physiology 2004;29:623-38.
22. MedCalc. www.medcalc.be. (Accessed December 1, 2004).
23. Knapp TR, Brown JK. Ten measurement commandments
that often should be broken. Research Nurs Hlth 1995;18:
465-9.