Assessment of Health Outcomes
Dorcas Eleanor Beaton • Maarten Boers • Peter Tugwell
Table 31-1 Core Sets for Six Rheumatologic Conditions and for Longitudinal Observational Studies*
Disease process ✓
Aggregate index ✓ (EULAR DAS) ✓ DAI, R = severity Pending ✓ DAS
Biomarkers ✓ Biochemical O
Joint tenderness ✓ 28 or 68 joints ✓ Peripheral (44 joints) ✓
Assessment of Health Outcomes
Enthesitis Enthesitis‡
Joint swelling ✓
Joint stiffness O ✓ Spinal stiffness
Global ✓ Spinal mobility
Patient ✓ ✓ ✓ ✓
Physician ✓ R ✓
Acute-phase reactants ✓ O R ✓‡ ✓
Damage ✓
Radiography or imaging ✓ >1 yr ✓ Bone mineral density ✓ >1 yr ✓ Spine and hip§ ✓ Structural
Deformity
Surgery
Organ damage R (BL)/ ✓ (†) fractures ✓ Damage index ✓Skin disease
R (BL)/ ✓ (†) Change height
Disadvantages
Toxicity effects ✓ R ✓ ✓
Death ✓
Table 31-1 focuses on the core domains that should be measured; the next step is deciding on the instrument that is able to provide that well in a reproducible, accurate manner. In some cases, an instrument choice has been suggested (e.g., the HAQ for disability in rheumatoid arthritis). Other times several options are provided. Strand and coworkers [19] reviewed six disease activity indices in systemic lupus erythematosus and found they gave comparable results. In some cases, the domains are shared, but the measurement technique varies within or by disease; in rheumatoid arthritis, the DAS28 uses 28 joints [3], and in ankylosing spondylitis, 44 joints are counted [17]. Some of the more commonly encountered instruments in arthritis are briefly reviewed.
Health Status/Quality of Life
General Health Status. Generic health outcomes provide information on an aspect of health across many conditions so that, theoretically, the burden of low back pain can be compared with that of arthritis or diabetes. This comparison depends on the ability of that measure to capture the burden in a disease group well. Generic measures have the advantages of allowing comparisons across diseases and covering a broader range of health issues, including things that may have been overlooked in a core set (e.g., mental health issues). Often because of their breadth, however, the generic measures tend not to delve as well into the depth of experience in any one disease. Arthritis-related fatigue is not detected well in many generic measures because a generic measure asks about being tired or not sleeping well. Because of this, a generic measure is usually weaker in its ability to detect specific changes and in its sensitivity to different levels of disease activity and should usually be supplemented with disease-specific measures (described previously) [20].
Two commonly used generic measures are the Sickness Impact Profile (SIP) [21] and the SF-36 [22]. The SIP is a 136-item list of illness behaviors that provides a weighted score for the impact of a disease across 12 categories, such as bodily pain, work and role functioning, and dressing [21], which lead to global scores (physical, psychosocial, and overall). The SIP has been shown to measure illness across a wide variety of health conditions [20]. The SF-36 is a 36-item questionnaire of which 35 items are used to obtain eight domain scores (including physical functioning, mental health, role functioning, and pain) scored on a 0-to-100 scale (with 100 = better health) [22] and two summary scores (mental and physical); the questionnaire is scored against a norm of 50 with a standard deviation of 10. The SF-36 and briefer SF-12 are well supported on the website (www.qualitymetric.com) and through manuals that supply age and disease group distributions of scores [23]. Direct comparisons of generic measures have shown differences in scores and health states attributable to the choice of measure [24-26]. Studies or clinical results may not be comparable to each other if they are using different health status scales.
Utilities: Value of Health State. Utility scales offer an overall score for the value of a health state, setting death at 0 and full health at 1. The emphasis is not on describing the state, but on assigning a value, worth, or preference to that state [27,28]. Utilities are needed for economic appraisals and form the health assessment for cost per quality-adjusted life-years estimations. Utility states can be obtained by direct or indirect methods. Direct methods, such as standard gamble and time tradeoff, involve the respondent working through exercises to elicit the value for his or her own health state against things such as time or more or less favorable health situations [27]. Indirect methods capture the state with standardized questions and apply predetermined weights [28]. Examples include the EQ-5D, which combines five items (each with three response categories) to describe a health state. Similarly, the Health Utility Index gathers information on six or seven dimensions of health (depending on the version) on five-item to six-item response scales to define a health state [28]. Both these scales then use weights determined in different populations to assign the value to these health states, hence the "indirect" weighting. The absolute value obtained across these different approaches varies [27].
Generic measures of health and utility scores are very broad. They often do not perform as well as the more specific measures described in the following sections because they are designed to allow comparisons across populations and need to include items that might not be relevant in arthritis. In some areas, there are measures of quality of life that are designed for that disease, such as rheumatoid arthritis or osteoporosis, which offer a measure of the broad concept of quality of life in a disease-specific manner. Such measures would not allow comparisons across diseases. At the time of this writing, these were not yet recommended as core instruments.
Symptoms. Pain is usually measured using a 10-cm visual analog scale or a 0-to-10 numeric rating scale of the intensity of the symptoms [29]. This simple measure has been well tested and is easily understood by patients. Fatigue is another important symptom and quite distinct from being "tired." The ankylosing spondylitis modified core set contains fatigue, and it is a recommended area of further research in rheumatoid arthritis and lupus. Measures are being tested or developed at present. Ankylosing spondylitis is currently using the 10-cm visual analog scale of fatigue from the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) [30].
Disability Scales. Physical disability in rheumatoid arthritis and osteoarthritis is often measured using the Health Assessment Questionnaire–Disability Index (HAQ-DI) [31], which covers 20 items looking at different aspects of daily functioning. Patients score each item on a 0-to-3 scale, where 3 represents the greatest disability. Scores are obtained for each domain and summed into a total score expressed on the same 0-to-3 scale. Scores are adjusted to a 2/3 if an aid is used to complete a task. More details on the HAQ-DI are widely available in print and on the Internet.
There are other scales asking about physical function, such as the Arthritis Impact Measurement Scale (AIMS) [32] and the AIMS2 [33], and measures with even more specific foci, such as the Western Ontario McMaster (WOMAC) osteoarthritis index, which is commonly used in hip and knee osteoarthritis [34], and the AUSCAN osteoarthritis index for hand osteoarthritis [35].
Disease Process (Activity, Severity)
Core sets often include indices of disease process. Disease process can be divided into activity (inflammatory activity) and severity (overall severity of disease). There are several
disease activity indices, the most commonly known being the DAS [36] and DAS28 [3] in rheumatoid arthritis. Using a subset of the core outcomes (i.e., acute-phase reactants, joint counts, global ratings), the EULAR group formulated a weighted index that provides a score of 2 to 10 (DAS) or 0 to 9 (DAS28). From this, cutoffs were established to define high, moderate, and low disease states. The low disease state (DAS28 <2.6) is considered an indicator of remission of arthritis and is touched on again later. More recently, the concept of "stable remission" has been proposed; it is defined as an initial DAS28 of less than 3.2 and a lack of change in DAS over time of 1.2 (2 standard errors) [37]. Disease activity indices track the level of inflammatory activity. There are others, such as the BASDAI [30] or the six available in systemic lupus erythematosus [19]. When more than one is available, look for direct comparisons such as Strand's to see if similar information is provided [19,38].
Damage Indices
A great deal of work has gone into measures of joint damage in arthritis. van der Heijde and Landewe [17] provide a succinct summary of the Sharp, van der Heijde, and Larsen/Scott techniques. Guidelines should be followed closely, focusing on the hands and feet. Changes in radiographic progression of joint damage often are measured by the smallest detectable difference, the boundary between measurement error (day-to-day variability) and change discernible from error [39,40].
Disadvantages: Toxicity/Adverse Events
Medical and nonmedical management of many rheumatic conditions carries a risk of toxicity and adverse events [41], many of them unexpected. A comprehensive documentation of a range of adverse events is important in outcome assessments separate from the treatment benefits [42]. An OMERACT group is currently working on standardizing reporting of toxicities in rheumatologic trials [43].
Death
Arthritis is associated with increased mortality, and arthritis-specific mortality should be monitored. Death is not specifically mentioned in the disease-specific core sets for clinical trials, but would be important to monitor in observational studies. Attributing death to arthritis is challenging given its dependence on documentation at time of occurrence, which may or may not be linked to underlying arthritis in the coding.
Dollar Costs
The economic burden of arthritis and the cost of care are core outcomes for psoriatic arthritis and recommended for consideration in observational studies and rheumatoid arthritis, osteoporosis, and lupus disease groups. Standards for what should be included in a cost analysis are under development.
Disease Self-Management
Lorig's work in self-management programs has often focused on improving self-efficacy, which has been found to reduce pain and health care use [25]. Self-efficacy is a health outcome. Similarly, Tugwell's "effective consumer" captures the degree to which the patient is effectively managing his or her own health care decisions, interactions with the health care team, and disease monitoring [44]. This may be a reasonable target for self-help interventions.
EMERGING DOMAINS
Work Productivity and Disability
With the shift toward more aggressive management of earlier disease, more people with arthritis are working, and outcomes need to shift to track work disability (absenteeism and at-work productivity loss) [7]. Work is hard to measure because it depends on the job and the organization the individual works in. Absenteeism can mean many things, and measures should articulate how this was operationalized: full days off work, days on insurance payments, or full and part days off work. More challenging still is measuring the difficulty someone is having at work (presenteeism). There are several measures (16 were found in a more recent review [45]), but there are few direct comparisons of these conceptually diverse instruments. The most commonly used is the work limitations questionnaire (amount of time experiencing difficulty) [46]. Two scales developed in arthritis are promising: Gignac's Work Activity Limitations Scale (amount of difficulty experienced) [47] and Gilworth's Work Instability Scale (risk of future work loss) [48]. Direct comparisons of these measures are under way.
Nonpaid Work
Participation in valued nonpaid roles, such as parenting, volunteer work, or leisure activities, can be an important aspect of the burden of disease [49]. Outcome measures reflecting this are needed to capture fully the concept of participation.
Patient-Specific Domains
Patient-specific scales, including the MACTAR or PET in arthritis [50,51], allow the patient to nominate his or her own scale content, within a guided framework. Most patients report three to five items that are particularly salient to them. A surprising number of these scales have been developed [52]. Each taps relevant content for that patient, and because of this they also are responsive to change [53]. The challenge is in the mathematics and how to analyze the numeric score when the items vary across patients. Analysis that focuses on individual level quantification is likely best.
Satisfaction with Health Outcomes
Satisfaction scales are often linked with the goals of a health care organization and focus on the attributes of the structure and process of care. Instruments developed to look at satisfaction with a specific health end point (e.g., how satisfied are you with the results of your surgery) [54] become a health outcome. Satisfaction with outcome is complex, and Hudak and colleagues [55] point out the complex balance of experiences and ability to "live with" ongoing limitations that influence a patient's response.
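The DAS28 cutoffs mentioned under Disease Process can be illustrated with a short sketch. The chapter gives only the remission cutoff (<2.6) and mentions 3.2; the weighted formula below is the widely published ESR-based DAS28, and the 5.1 boundary between moderate and high activity is the commonly used EULAR value. Both are assumptions brought in from outside this text, and the function names are hypothetical.

```python
import math


def das28_esr(tender28, swollen28, esr_mm_per_hr, global_health_0_100):
    """DAS28 from 28-joint counts, ESR, and patient global health (0-100).

    Assumption: this is the commonly published ESR-based formula;
    the chapter itself does not state the weights.
    """
    if esr_mm_per_hr <= 0:
        raise ValueError("ESR must be positive to take its logarithm")
    return (0.56 * math.sqrt(tender28)
            + 0.28 * math.sqrt(swollen28)
            + 0.70 * math.log(esr_mm_per_hr)
            + 0.014 * global_health_0_100)


def das28_state(score):
    """Map a DAS28 score to a disease activity state.

    <2.6 (remission) comes from the chapter; 3.2 and 5.1 are the
    commonly used boundaries and are assumptions here.
    """
    if score < 2.6:
        return "remission"
    if score <= 3.2:
        return "low"
    if score <= 5.1:
        return "moderate"
    return "high"


if __name__ == "__main__":
    score = das28_esr(tender28=4, swollen28=4, esr_mm_per_hr=20,
                      global_health_0_100=50)
    print(round(score, 2), das28_state(score))
```

Keeping the score calculation separate from the state lookup makes it easy to swap in different thresholds if the cutoffs are revised.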
PART 4 | BROAD ISSUES IN THE APPROACH TO RHEUMATIC DISEASES 467
Sleep
The OMERACT patient group has identified sleep quality as an important domain for outcome assessment. The concept and measurement of it are expected to be addressed at the upcoming OMERACT 9 meeting in 2008.
[Figure residue; caption not recovered. Box labels: Health condition (disorder/disease); Body function and structure; Activities; Participation.]
pain interferes with daily activities? Questions such as these should be addressed before any instruments or core sets are reviewed. The instrument should meet the need, and not the reverse.
WHO CONSTITUTES THE TARGET POPULATION
The target population is crucial, but often overlooked. A given instrument may work well in severe osteoarthritis of the hip, but not be sensitive to the early symptoms of the disease. Equally important is to consider whether one wants to measure for an individual patient or to describe a group of patients as a whole. The former demands much higher levels of measurement properties (e.g., reliability coefficients >0.90 as opposed to 0.75 to 0.80 being adequate for group descriptions) [66].
SELECTING THE OUTCOME THAT CAN MEET THE MEASUREMENT NEED
The selection of an outcome measure depends entirely on a clear understanding of measurement need. Often a well-used instrument is used, rather than looking for one that matches the concept, population, and purpose. Many guidelines that offer more detail than can be described here are available, particularly for the acceptable levels of reliability and validity [66-72]. This section describes a decision-making process for fit between a given instrument and the clinician's need (Fig. 31-2). This process builds on the work of Law [72] and the OMERACT filter [69] and highlights key understandings in each area from the published guidelines.
This decision-making process emphasizes three things. First, it begins by stating the measurement need (why, what, and in whom), which reinforces that a candidate measure may meet one need, but not the next. Second, it emphasizes that a lot of the appraisal can be done without statistics. It is done by appraising the questionnaire or instrument itself and knowledge of its administration. Third, the inability to affirm each stage suggests that there is no need to continue. At the later data-based stages, the clinician may choose to run a small study to create the evidence (the "do-it" loops) in patients, rather than abandoning the instrument that seems like a good candidate. Given these three key features, the process from left to right is reviewed next.
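The reliability levels quoted for the target population (coefficients above 0.90 for individual patients, 0.75 to 0.80 being adequate for group descriptions) can be written as a small check. This is a minimal sketch: the function name is hypothetical, and treating 0.75 as the lower bound for group use is an assumption drawn from the range given in the text.

```python
def reliability_adequate(coefficient, use):
    """Return True if a reliability coefficient meets the level the
    chapter quotes for the intended use.

    use: "individual" (needs > 0.90) or "group" (0.75 taken here as
    the minimum, an assumption based on the quoted 0.75-0.80 range).
    """
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("coefficient is expected to lie between 0 and 1")
    if use == "individual":
        return coefficient > 0.90
    if use == "group":
        return coefficient >= 0.75
    raise ValueError("use must be 'individual' or 'group'")


if __name__ == "__main__":
    # The same instrument can be adequate for one purpose but not the other.
    print(reliability_adequate(0.82, "group"))       # group description
    print(reliability_adequate(0.82, "individual"))  # single-patient decisions
```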
[Figure 31-2 flowchart, partially recovered. Steps shown: 1. Matches target concept? 2. Feasible to use? 5. Is it interpretable? (benchmarking scores (states); change thresholds; combinations: change and state). A "No" at a step leads to "Choose another tool"; the final outcome is "Good fit for your needs!" Legend: boxes marked with "do it loop" are those where you can create the evidence and continue on; blue boxes = pre-data evaluation; yellow boxes = data-based evaluation.]
Figure 31-2 Algorithm showing decision-making process for the fit of a candidate measure with your measurement need. The left-hand side of the
page is done by appraising the instrument and its instructions. The right-hand side requires numeric evidence of the relevant measurement properties.
Many instruments are weeded out as a poor fit in steps 1, 2, and 3. The "do-it" loop denotes stages at which you can pause to create evidence if it is
missing and not have to abandon the instrument.
to the intended application. If the goal is to measure high versus low health, the sample should be divided into known groups with high and low health according to another accepted opinion, and then this scale is tested against it. The image of a window can help here in selecting comparators to use for testing. What else gives a bit (or more) of overlap with the target view? How much correspondence is expected between scores? For good construct validity, this a priori hypothesized relationship should be recreated with data, whether that be a strong correlation or no correlation at all. An instrument or measure is never universally valid and requires ongoing testing to improve understanding of the scores in different situations.
Evaluative Purpose
In evaluative measures, the intent of the study is to focus on the amount of change over time. Many clinical trials are doing this and comparing results between treatment and control groups. Interobserver reliability is important if more than one measurer is to be involved. The hallmark of a good evaluative measure relates, however, to time: First, do the scores remain the same when the target concept has not changed over time (test-retest reliability)? Second, when the concept changes, does the score on the instrument/measure change as well?
Test-retest reliability requires two administrations of the measure over a time when no change has occurred. This may be easier said than done sometimes, but the authors should justify their design and how they ensured no change had occurred. Similar to interobserver reliability, the ICC is the preferred statistic for continuous scores, and weighted kappa, its equivalent, is preferred for categorical scores. The cutoffs are the same, and a coefficient can be converted into a "minimal detectable change" [75] as $1.96 \times s \times \sqrt{2(1 - r)}$, where s = standard deviation and r = test-retest reliability (ICC) [66,75]. Ninety-five percent of subjects who are stable have change scores less than this value; a change greater than this is not likely to occur in a stable patient, only in a changing one. It becomes a lower boundary of meaningful change: anything below that could be day-to-day fluctuations in scores.
Responsiveness, the accurate detection of change when it has occurred, is sometimes best thought of as longitudinal construct validity. Similar to construct validity, responsiveness depends on an a priori theoretic relationship, one in which the attribute is changing over time. Often the focus is on the amount of change picked up, rather than the type or amount of change that had occurred. A large change is not useful if we were expecting a small one; rather it suggests noise. The construct embedded in a study of responsiveness should be described carefully and should be a clear match with the intended application (measurement need). If the goal is to detect change in a clinical trial, it is important to assess the instrument's ability to detect the difference in change between treatment and control groups. If the goal is to detect change in a cohort, it might be more useful to examine change in a single group, perhaps in a treatment of known efficacy (hip replacement) or in subjects who rated themselves as improved on an external anchor (global index of change).
Responsiveness is summarized with statistics of signal (change) over noise (error), such as the standardized response mean (mean change/standard deviation of change), t statistic (mean change/standard error), and effect size (mean change over standard deviation of baseline) [74]; each can be adapted to quantify the relative change between treatment and control groups [53,76]. Deyo and Centor [77] also described the correlational approach (correlate change and another indicator of change) and the receiver operating characteristic curve approach (various change scores against an external "gold standard" that the person has changed), where the area under the curve is a summary statistic [77]. The numeric summaries of responsiveness, such as effect sizes or areas under the curve, should correspond to the type of change expected (a priori theory). A large effect size or area under the curve does not mean an instrument is "responsive." It should correspond with the change anticipated in the study, small or large. Comparisons of the effect sizes are helpful if different instruments are being compared in the same study, as done by Buchbinder and colleagues [53] or by Verhoeven and coworkers [76], who focused on responsiveness in early rheumatoid arthritis. Responsiveness is a highly contextualized property, and the same instrument may not be responsive in another situation (e.g., early versus late disease, osteoarthritis versus rheumatoid arthritis) [73].
STEP 5: INTERPRETABILITY OF SCORES
The final step, often deemed the most elusive [78], is the interpretability of the scores.
Benchmarking States
What is the meaning of a score of 2/10 on a pain score? Is it a good outcome? The meaning of different scores on an outcome assessment is used for classifying subjects at the beginning of a trial and at the end point. To do this, comparisons are made to other known health states: severity indices, ability to work, self-rating as mild [79]. Gradually, enough trends might be seen across different scenarios to gain confidence in the meaning of "good" or "mild" [80,81]. In rheumatology, we see the emergence of low disease activity states [82,83] or patient acceptable symptom states [84] or remission criteria with the DAS28 [38] as thresholds below which subjects are considered to be in an acceptable state (either tolerable symptoms or disease activity where it does not require medication changes). At this point, these thresholds are being established, and similar to change thresholds, we may find variability in the values [38] that needs to be sorted out with methodologic work and application in clinical practice.
Changes in State
The second type of interpretability concerns change scores.
American College of Rheumatology Response Criteria. The American College of Rheumatology took the core set measures and determined that if one observed an X% change in tender joint count and in swollen joint count and in at least three other areas (erythrocyte sedimentation rate or C-reactive protein, physician global, patient global, pain, or physical disability), one had a clinical response, and the individual would be classified as a responder. The percent is usually 20%, but 50% and 70% have been considered. The ACR20 is widely used, catches responses across a wide variety of domains, and discriminates well in clinical
trials [76]; however, it is currently being revalidated owing to the changing nature of rheumatoid arthritis and its care [85].
reviewing the literature extensively for the measurement properties.
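The minimal detectable change and the signal-over-noise summaries of responsiveness described under Evaluative Purpose can be computed directly from paired scores. The following is a minimal sketch using only the definitions given in the text (1.96 × s × √(2(1 − r)), mean change over standard deviation of change, mean change over standard error, and mean change over standard deviation of baseline). The function names and the sample data are hypothetical, and sample standard deviations (n − 1 denominator) are assumed.

```python
import math
from statistics import mean, stdev


def minimal_detectable_change(sd, reliability, z=1.96):
    """MDC = z * s * sqrt(2 * (1 - r)); r is test-retest reliability (ICC)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability is expected to lie between 0 and 1")
    return z * sd * math.sqrt(2.0 * (1.0 - reliability))


def _changes(baseline, follow_up):
    """Per-subject change scores (follow-up minus baseline)."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    return [after - before for before, after in zip(baseline, follow_up)]


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of change."""
    changes = _changes(baseline, follow_up)
    return mean(changes) / stdev(changes)


def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    return mean(_changes(baseline, follow_up)) / stdev(baseline)


def paired_t_statistic(baseline, follow_up):
    """Mean change divided by its standard error."""
    changes = _changes(baseline, follow_up)
    return mean(changes) / (stdev(changes) / math.sqrt(len(changes)))


if __name__ == "__main__":
    before = [10, 12, 14, 16]  # hypothetical scores, not real data
    after = [12, 15, 15, 20]
    print(round(minimal_detectable_change(sd=10, reliability=0.90), 2))
    print(round(standardized_response_mean(before, after), 2))
    print(round(effect_size(before, after), 2))
    print(round(paired_t_statistic(before, after), 2))
```

As the text cautions, a larger value from any of these functions is not automatically better; it should be read against the amount of change expected a priori.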
between touch screens and traditional paper and pencils are promising, and the acceptability by patients with arthritis is good [97-99]. New technology means that health outcome assessment can become part of the patient-clinician experience and facilitate the ability of the clinician to monitor the patient's health [97].
Adaptation to an Ongoing Disease
This chapter has focused on the measurement of health states and their interpretation over time. Individuals with chronic diseases adapt to ongoing disease with behavioral strategies or cognitive reframing of their situation [100]. In some circles, this is adjustment [95]; in others, it is response shift [101]. The challenge in health outcomes assessment is to tell when a state is changing only because of adaptation and not the intervention. In many situations, we try to induce adaptation, or cognitive reframing, and it can be constructive. It does create a bias in measurement [101], however, and a challenge to the health outcome assessor. Numerous groups are researching how to incorporate adaptation into health outcome assessments.
SUMMARY
There is considerable room for improvement in health outcome assessment in rheumatology, despite the work done to date. A battery of instruments has been developed, many of which exhibit the measurement properties described in this chapter and meet the challenge of a changing arthritis target (less severe, earlier disease), and several more measures are being considered for membership in core sets to capture a comprehensive view of the burden of arthritis. We are on the brink of deciding on the role to be played by IRT and CAT in widespread care settings. Despite progress in assigning a numeric value to a complex health state, however, we are now struggling with the back-translation: what does the numeric score mean in the real patient world? It is not always a simple translation from questionnaire score to clinical meaning.
Health outcome assessment is well advanced in arthritis care, and we should recognize the years of work and commitment of many professional and patient/consumer groups. Advances will continue in the use of technology, the breadth and depth of outcomes, and the quality of measurement to keep pace with the needs of patients, clinicians, and researchers.
Acknowledgments
Dorcas Beaton is supported by a New Investigators Award through the Canadian Institutes of Health Research. Peter Tugwell holds a Canada Research Chair.
The authors would like to thank Ms. Taucha Inrig, Dr. Claire Bombardier, Dr. Fred and Mrs. Janet Krieger, Mr. William Francis, and the OMERACT executive for their help with this manuscript, and Dr. M. Ward, whose chapter in the seventh edition of Kelley's Textbook of Rheumatology was a helpful guide.
REFERENCES
1. Relman AS: Assessment and accountability: The third revolution in medical care. N Engl J Med 319:1220-1222, 1988.
2. Last JM: A Dictionary of Epidemiology. New York, Oxford University Press, 1988.
3. Prevoo MLL, Van't Hof MA, et al: Modified disease activity scores that include twenty-eight-joint counts: Development and validation in a prospective longitudinal study of patients with rheumatoid arthritis. Arthritis Rheum 38:44-48, 1995.
4. Fries JF, Spitz PW, Young DY: The dimensions of health outcomes: The Health Assessment Questionnaire, Disability and Pain Scales. J Rheumatol 9:19203, 1982.
5. Ware JE Jr, Sherbourne CD: The MOS 36-Item Short-Form Health Survey (SF-36), I: Conceptual framework and item selection. Med Care 30:473-483, 1992.
6. Fries JF: The hierarchy of outcome assessment. J Rheumatol 20:546-547, 1993.
7. Wolfe F, Lassere M, van der Heijde D, et al: Preliminary core set of domains and reporting requirements for longitudinal observational studies in rheumatology. J Rheumatol 26:484-489, 1999.
8. van der Heijde D, van der Linden S, Bellamy N, et al: Which domains should be included in a core set for endpoints in ankylosing spondylitis? Introduction to the ankylosing spondylitis module of OMERACT IV. J Rheumatol 26:945-947, 1999.
9. Gladman DD, Mease PJ, Healy P, et al: Outcome measures in psoriatic arthritis (PsA). J Rheumatol 34:1159-1166, 2007.
10. Gladman DD, Mease PJ, Strand V, et al: Consensus on a core set of domains for psoriatic arthritis. OMERACT 8 PsA Module Report. J Rheumatol 34:1167-1170, 2007.
11. Sambrook PN, Cummings SR, Eisman JA, et al: Guidelines of osteoporosis trials (workshop report). J Rheumatol 24:1234-1236, 1997.
12. Gladman DD, Strand V, Mease PJ, et al: OMERACT 7 psoriatic arthritis workshop: Synopsis. Ann Rheum Dis 64:ii-115-ii-116, 2005.
13. Bellamy N, Kirwan J, Boers M, et al: Recommendations for a core set of outcome measures for future phase III clinical trials in knee, hip, and hand osteoarthritis: Consensus development at OMERACT III. J Rheumatol 24:799-802, 1997.
14. Smolen JS, Strand V, Cardiel M, et al: Randomized clinical trials and longitudinal observational studies in systemic lupus erythematosus: Consensus on a preliminary core set of outcome domains. J Rheumatol 26:504-507, 1999.
15. Boers M, Tugwell P, Felson DT, et al: World Health Organization and International League of Associations for Rheumatology core endpoints for symptom modifying antirheumatic drugs in rheumatoid arthritis clinical trials. J Rheumatol 21:86-89, 1994.
16. Felson DT, Anderson JJ, Boers M, et al: The American College of Rheumatology preliminary core set of disease activity measures for rheumatoid arthritis clinical trials. The Committee on Outcome Measures in Rheumatoid Arthritis Clinical Trials. Arthritis Rheum 36:729-740, 1993.
17. van der Heijde D, Landewe R: Selection of a method for scoring radiographs for ankylosing spondylitis clinical trials, by the Assessment in Ankylosing Spondylitis working groups (ASAS) and OMERACT. J Rheumatol 32:2048-2049, 2005.
18. Guidelines of osteoporosis trials (workshop report). J Rheumatol 24:1234-1236, 1997.
19. Strand V, Gladman DD, Isenberg D, et al: Outcome measures to be used in clinical trials in systemic lupus erythematosus. J Rheumatol 26:490-497, 1999.
20. Patrick DL, Deyo RA: Generic and disease-specific measures in assessing health status and quality of life. Med Care 27(Suppl):S217-S232, 1989.
21. Bergner M, Bobbitt RA, Pollard WE, et al: The sickness impact profile: Validation of a health status measure. Med Care 14:57-67, 1976.
22. Ware JE Jr: SF-36 health survey update. Spine 25:3130-3139, 2000.
23. Ware JE Jr, Snow KK, Kosinski M, et al: SF-36 Health Survey Manual and Interpretation Guide. Boston, The Health Institute, 1993.
24. Beaton DE, Bombardier C, Hogg-Johnson SA: Measuring health in injured workers: A cross-sectional comparison of five generic health status instruments in workers with musculoskeletal injuries. Am J Ind Med 29:618-631, 1996.
25. Beaton DE, Hogg-Johnson S, Bombardier C: Evaluating changes in health status: Reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol 50:79-93, 1997.
26. Visser MC, Fletcher AE, Parr G, et al: A comparison of three quality of life instruments in subjects with angina pectoris: The Sickness Impact Profile, the Nottingham Health Profile, and the Quality of Well Being Scale. J Clin Epidemiol 47:157-163, 1994.
27. Revicki DA, Kaplan RM: Relationship between psychometric and utility-based approaches to the measurement of health-related quality of life. Qual Life Res 2:477-487, 1993.
28. Feeny D: Preference-based measures: utility and quality-adjusted life years. In Fayers P, Hays R (eds): Assessing Quality of Life in Clinical Trials, 2nd ed. New York, Oxford University Press, 2005, pp 405-429.
29. Farrar JT, Portenoy RK, Berlin JA, et al: Defining the clinically important difference in pain outcome measures. Pain 88:287-294, 2000.
30. Garrett S, Jenkinson T, Kennedy LG, et al: A new approach to defining disease status in ankylosing spondylitis: The Bath Ankylosing Spondylitis Disease Activity Index. J Rheumatol 21:2286-2291, 1994.
31. Fries JF: The hierarchy of quality-of-life assessment, the Health Assessment Questionnaire (HAQ), and issues mandating development of a toxicity index. Controlled Clinical Trials 12:106S-117S, 1991.
32. Meenan RF, Gertman PM, Mason JH: Measuring health status in arthritis: The Arthritis Impact Measurement Scales. Arthritis Rheum 23:146-152, 1980.
33. Meenan RF, Mason JH, Anderson JJ, et al: AIMS2: The content and properties of a revised and expanded arthritis impact measurement scales health status questionnaire. Arthritis Rheum 35:1-10, 1992.
34. Bellamy N, Buchanan WW, Goldsmith CH, et al: Validation study of WOMAC: A health status instrument for measuring clinically-important patient-relevant outcomes following total hip or knee arthroplasty in osteoarthritis. J Orthop Rheum 1:95-108, 1988.
35. Bellamy N, Campbell J, Haraoui B, et al: Clinimetric properties of the AUSCAN osteoarthritis hand index: An evaluation of reliability, validity and responsiveness. Osteoarthritis Cartilage 10:863-869, 2002.
36. Van Gestel AM, Prevoo MLL, Van’t Hof MA, et al: Development and validation of the European League Against Rheumatism response criteria for rheumatoid arthritis. Arthritis Rheum 39:34-40, 1996.
37. Vrijhoef HJM, Diederiks JPM, Spreeuwenberg C, et al: Applying low disease activity criteria using the DAS28 to assess stability in patients with rheumatoid arthritis. Ann Rheum Dis 62:419-422, 2003.
38. Aletaha D, Ward MM, Machold KP, et al: Remission and active disease in rheumatoid arthritis: Defining criteria for disease activity states. Arthritis Rheum 52:2625-2636, 2005.
39. Lassere M, van der Heijde D, Johnson K, et al: Robustness and generalizability of smallest detectable difference in radiological progression. J Rheumatol 28:911-913, 2001.
40. Ravaud P, Giraudeau B, Auleley GR, et al: Assessing smallest detectable change over time in continuous structural outcome measures: Application to radiological change in knee osteoarthritis. J Clin Epidemiol 52:1225-1230, 1999.
41. Lassere M, Johnson K, Van Santen S, et al: Generic patient self-report and investigator report instruments of therapeutic safety and tolerability. J Rheumatol 32:2033-2036, 2005.
42. U.S. Department of Health and Human Services Food and Drug Administration Center for Drug Evaluation and Research (CDER): Guidance for industry: Patient-reported outcome measures: Use in medical product development to support labeling claims: Draft guidance. Available at: http://www.fda.gov/cder/gdlns/prolbl.htm. Accessed November 3, 2006.
43. Woodworth T, Furst DE, Alten R, et al: Standardizing assessment and reporting of adverse effects in rheumatology clinical trials, II: Rheumatology Common Toxicity Criteria v2.0. J Rheumatol 34:1401-1414, 2007.
44. Kristjansson E, Tugwell PS, Wilson AJ, et al: Development of the effective musculoskeletal consumer scale. J Rheumatol 34:1392-1400, 2007.
45. Escorpizo R, Bombardier C, Boonen A, et al: Worker productivity outcome measures in arthritis. J Rheumatol 34:1372-1380, 2007.
46. Lerner D, Amick BC III, Rogers WH, et al: The work limitations questionnaire. Med Care 39:72-85, 2001.
47. Gignac MAM, Badley EM, Lacaille D, et al: Managing arthritis and employment: Making arthritis-related work changes as a means of adaptation. Arthritis Care Res 51:909-916, 2004.
48. Gilworth G, Chamberlain AM, Harvey A, et al: Development of a work instability scale for rheumatoid arthritis. Arthritis Care Res 49:349-354, 2003.
49. Backman C, Kennedy SM, Chalmers A, et al: Participation in paid and unpaid work by adults with rheumatoid arthritis. J Rheumatol 31:47-57, 2004.
50. Tugwell P, Bombardier C, Buchanan WW, et al: The MACTAR Patient Preference Disability Questionnaire—an individualized functional priority approach for assessing improvement in physical disability in clinical trials in rheumatoid arthritis. J Rheumatol 14:446-451, 1987.
51. Buchbinder R, Bombardier C, Yeung M, et al: Which outcome measures should be used in rheumatoid arthritis clinical trials? Clinical and quality-of-life measures’ responsiveness to treatment in a randomized controlled trial. Arthritis Rheum 38:1568-1580, 1995.
52. O’Boyle CA, Hofer S, Ring L: Individualized quality of life. In Fayers P, Hays R (eds): Assessing Quality of Life in Clinical Trials: Methods and Practice, 2nd ed. New York, Oxford University Press, 2005, pp 225-242.
53. Buchbinder R, Bombardier C, Yeung M, et al: Which outcome measures should be used in rheumatoid arthritis clinical trials? Arthritis Rheum 38:1568-1580, 1995.
54. Solomon DH, Bates DW, Horsky J, et al: Development and validation of a patient satisfaction scale for musculoskeletal care. Arthritis Care Res 12:96-100, 1999.
55. Hudak PL, McKeever PD, Wright JG: Understanding the meaning of satisfaction with treatment outcome. Med Care 42:718-725, 2004.
56. Jette AM, Haley SM: Contemporary measurement technique for rehabilitation outcome assessment. J Rehabil 37:339-345, 2005.
57. Jette AM, Keysor JJ: Disability models: Implications for arthritis exercise and physical activity interventions. Arthritis Care Res 49:114-120, 2003.
58. Wilson IB, Cleary PD: Linking clinical variables with health-related quality of life: A conceptual model of patient outcomes. JAMA 273:59-65, 1995.
59. Nagi SZ: A study in the evaluation of disability and rehabilitation potential. Am J Public Health 54:1568-1579, 1964.
60. Verbrugge LM, Jette AM: The disablement process. Soc Sci Med 38:1-14, 1994.
61. World Health Organization: International Classification of Functioning, Disability and Health. Geneva, World Health Organization, 2001.
62. Arts DGT, Keizer NF, Scheffer G: Defining and improving data quality in medical registries: A literature review, case study and generic framework. J Am Med Inform Assoc 9:600-611, 2002.
63. Stucki G, Boonen A, Tugwell P, et al: The World Health Organisation International Classification of Functioning, Disability and Health (ICF): A conceptual model and interface for the OMERACT process. J Rheumatol 34(3):600-606, 2007.
64. Jette AM: Toward a common language for function, disability and health. Phys Ther 86:726-734, 2006.
65. Jette AM, Keysor JJ: Uses of evidence in disability outcomes and effectiveness research. Milbank Q 80:325-345, 2002.
66. McHorney CA, Tarlov AR: Individual patient monitoring in clinical practice: Are available health status surveys adequate? Qual Life Res 4:293, 1995.
67. Lohr KN, Aaronson NK, Alonso J, et al: Evaluating quality-of-life and health status instruments: Development of scientific review criteria. Clin Therap 18:979-992, 1996.
68. McDowell I, Jenkinson C: Development standards for health measures. J Health Serv Res Policy 1:238-246, 1996.
69. Boers M, Brooks P, Strand V, et al: The OMERACT Filter for outcome measures in rheumatology. J Rheumatol 25:198-199, 1998.
70. Kane RA, Kane RL: Assessing the Elderly: A Practical Guide to measurement. Toronto, Lexington Books, 1981, pp 13-17.
71. Bergner M: Health status measures: An overview and guide for selection. Ann Rev Public Health 8:191-210, 1987.
72. Law M: Measurement in occupational therapy: Scientific criteria for evaluation. Can J Occup Ther 54:133-138, 1987.
73. Scientific Advisory Committee of the Medical Outcomes Trust: Assessing health status and quality of life instruments: Attributes and review criteria. Qual Life Res 11:193-205, 2002.
74. Hays RD, Revicki D: Reliability and validity (including responsiveness). In Fayers P, Hays R (eds): Assessing Quality of Life in Clinical Trials: Methods and Practice, 2nd ed. New York, Oxford University Press, 2005, pp 25-39.
75. Stratford PW, Binkley JM: Applying the results of self-report measures to individual patients: An example using the Roland-Morris Questionnaire. J Orthop Sports Physical Ther 29:232-239, 1999.
76. Verhoeven A, Boers M, van der Linden S: Responsiveness of the core set, response criteria, and utilities in early rheumatoid arthritis. Ann Rheum Dis 59:966-974, 2000.
77. Deyo RA, Centor RM: Assessing the responsiveness of functional scales to clinical change: An analogy to diagnostic test performance. J Chronic Dis 39:897-906, 1986.
78. Kirwan J: Minimum clinically important difference: The crock of gold at the end of the rainbow? J Rheumatol 28:439-444, 2001.
79. Deyo RA, Carter WB: Strategies for improving and expanding the application of health status measures in clinical settings: A researcher-developer viewpoint. Med Care 30(5 Suppl):MS176-MS186, 1992.
80. Tubach F, Dougados M, Falissard B, et al: Feeling good rather than feeling better matters more to patients. Arthritis Care Res 55:526-530, 2006.
81. Beaton DE, Tarasuk V, Katz JN, et al: Are you better? A qualitative study of the meaning of being better. Arthritis Care Res 7:313-320, 2001.
82. Boers M, Anderson JJ, Felson D: Deriving an operational definition of low disease activity state in rheumatoid arthritis. J Rheumatol 30:1112-1114, 2003.
83. Tubach F, Wells GA, Ravaud P, et al: Minimal clinically important difference, low disease activity state and patient acceptable symptom state: Methodological issues. J Rheumatol 32:2025-2029, 2005.
84. Tubach F, Ravaud P, Baron G, et al: Evaluation of clinically relevant states in patient reported outcomes in knee and hip osteoarthritis: The patient acceptable symptom state. Ann Rheum Dis 64:34-37, 2005.
85. Felson DT, Furst DE, Boers M: Rationale and strategies for reevaluating the ACR20. J Rheumatol 34:1184-1187, 2007.
86. Beaton DE, Boers M, Wells GA: Many faces of the minimal clinically important difference (MCID): A literature review and directions for future research. Curr Opin Rheumatol 14:109-114, 2002.
87. Hays RD, Woolley JM: The concept of clinically meaningful difference in health-related quality of life research. PharmacoEconomics 18:419-423, 2000.
88. Wells GA, Beaton DE, Shea B, et al: Minimal clinically important differences: Review of methods. J Rheumatol 28:406-412, 2001.
89. Norman GR, Sloan JA, Wyrwich KW: Interpretation of changes in health-related quality of life: The remarkable universality of half a standard deviation. Med Care 41:582-592, 2003.
90. Salaffi F, Stancati A, Silvestri CA, et al: Minimal clinically important changes in chronic musculoskeletal pain intensity measures on a numerical rating scale. Eur J Pain 8:283-291, 2004.
91. Stucki G, Daltroy L, Katz JN, et al: Interpretation of change scores in ordinal clinical scales and health status measures: The whole may not be equal to the sum of the parts. J Clin Epidemiol 49:711-717, 1996.
92. Angst F, Aeschlimann A, Stucki G: Smallest detectable and minimal clinically important differences of rehabilitation intervention with their implications for required sample sizes using WOMAC and SF-36 quality of life measurement instruments in patients with osteoarthritis of the lower extremities. Arthritis Care Res 45:384-391, 2001.
93. Tubach F, Ravaud P, Baron G, et al: Evaluation of clinically relevant changes in patient reported outcomes in knee and hip osteoarthritis: The minimal clinically important improvement. Ann Rheum Dis 64:29-33, 2005.
94. Jacobson NS, Roberts LJ, Berns SB, et al: Methods for defining and determining the clinical significance of treatment effects: Description, application, alternatives. J Consult Clin Psychol 67:300-307, 1999.
95. Norman G: Hi! How are you? Response shift, implicit theories and differing epistemologies. Qual Life Res 12:249, 2003.
96. National Institutes of Health: Patient Reported Outcome Measurement Information System (PROMIS) network. 2006. Available at: http://www.nihpromis.org/.
97. Athale N, Sturley A, Koczen Z, et al: A web-compatible instrument for measuring self-reported disease activity in arthritis. J Rheumatol 31:223-228, 2004.
98. Fransen J, Stucki G, Twisk J, et al: Effectiveness of a measurement feedback system on outcome in rheumatoid arthritis: A controlled clinical trial. Ann Rheum Dis 62:624-629, 2003.
99. Bischoff-Ferrari HF, Vandechend M, Bellamy N, et al: Validation and patient acceptance of a computer touch screen version of the WOMAC 3.1 osteoarthritis index. Ann Rheum Dis 64:80-84, 2004.
100. Shaul MP: From early twinges to mastery: The process of adjustment in living with rheumatoid arthritis. Arthritis Care Res 8:290-297, 1995.
101. Schwartz C, Sprangers M, Fayers P: Response shift: You know it’s there but how do you capture it? Challenges for the next phase of research. In Fayers P, Hays R (eds): Assessing Quality of Life in Clinical Trials: Methods and Practice, 2nd ed. New York, Oxford University Press, 2005, pp 275-290.