
Utilization-Focused Evaluation

The New Century Text

3rd Edition

Michael Quinn Patton

SAGE Publications

International Educational and Professional Publisher
Thousand Oaks London New Delhi
Copyright © 1997 by Sage Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form
or by any means, electronic or mechanical, including photocopying, recording, or by
any information storage and retrieval system, without permission in writing from the
publisher.

For information address:

SAGE Publications, Inc.

2455 Teller Road


Thousand Oaks, California 91320
E-mail: order@sagepub.com
SAGE Publications Ltd.
6 Bonhill Street
London EC2A 4PU
United Kingdom
SAGE Publications India Pvt. Ltd.
M-32 Market
Greater Kailash I
New Delhi 110 048 India

Printed in the United States of America

Library of Congress Cataloging-in-Publication Data

Patton, Michael Quinn.


Utilization-focused evaluation: the new century text / author,
Michael Quinn Patton. — 3rd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-8039-5265-1 (pbk.: acid-free paper). — ISBN 0-8039-5264-3
(cloth: acid-free paper)
1. Evaluation research (Social action programs)—United States.
I. Title.
H62.5.U5P37 1996
361.6'1'072—dc20 96-25310

06 07 08 13 12 11 10

Acquiring Editor: C. Deborah Laughton


Editorial Assistant: Dale Grenfell
Production Editor: Diana E. Axelsen
Production Assistant: Sherrise Purdum
Typesetter/Designer: Janelle LeMaster
Cover Designer: Ravi Balasuriya
Print Buyer: Anna Chin
Contents

Preface xiii

PART 1. Toward More Useful Evaluations 1

1. Evaluation Use: Both Challenge and Mandate 3


2. What Is Utilization-Focused Evaluation? How Do You Get Started? 19
3. Fostering Intended Use by Intended Users: The Personal Factor 39
4. Intended Uses of Findings 63
5. Intended Process Uses: Impacts of Evaluation Thinking and Experiences 87

PART 2. Focusing Evaluations: Choices, Options, and Decisions 115


6. Being Active-Reactive-Adaptive: Evaluator Roles,
Situational Responsiveness, and Strategic Contingency Thinking 117
7. Beyond the Goals Clarification Game: Focusing on Outcomes 147
8. Focusing an Evaluation: Alternatives to Goals-Based Evaluation 177
9. Implementation Evaluation: What Happened in the Program? 195
10. The Program's Theory of Action: Conceptualizing Causal Linkages 215

PART 3. Appropriate Methods 239

11. Evaluations Worth Using: Utilization-Focused Methods Decisions 241


12. The Paradigms Debate and a Utilitarian Synthesis 265
13. Deciphering Data and Reporting Results: Analysis, Interpretations,
Judgments, and Recommendations 301
PART 4. Realities and Practicalities of Utilization-Focused Evaluation 339
14. Power, Politics, and Ethics 341
15. Utilization-Focused Evaluation: Process and Premises 371

References 387

Index 415

About the Author 431


Detailed Table of Contents

Preface

PART 1. Toward More Useful Evaluations 1

1. Evaluation Use: Both Challenge and Mandate 3


Evaluation Use as a Critical Societal Issue 4
High Hopes for Evaluation 6
Historical Context 10
New Directions in Accountability 12
Standards of Excellence for Evaluation 15

2. What Is Utilization-Focused Evaluation? How Do You Get Started? 19


A Comprehensive Approach 20
The First Challenge: Engendering Commitment 22
Charitable Assessment 25
Learning to Value Evaluation 26
Generating Real Questions 29
Creative Beginnings 33

3. Fostering Intended Use by Intended Users: The Personal Factor 39


The First Step in Utilization-Focused Evaluation 41
Evaluation's Premier Lesson 50
Practical Implications of the Personal Factor 50
User-Focused Evaluation in Practice 58
Beyond Just Beginning 60

4. Intended Uses of Findings 63


Identifying Intended Uses From the Beginning 63
Three Uses of Findings 64
Applying Purpose and Use Distinctions 75
Connecting Decisions to Uses 84

5. Intended Process Uses: Impacts of Evaluation Thinking and Experiences 87


Process as Outcome 88
Process Use Defined 90
A Menu: Uses of Evaluation Logic and Processes 90
Using Evaluation to Enhance Shared Understandings 91
Evaluation as an Integral Programmatic Intervention 93
Supporting Engagement, Self-Determination, and Ownership:
Participatory, Collaborative, and Empowerment Evaluation 97
Program and Organization Development: Developmental Evaluation 103
Concerns, Controversies, and Caveats 110

PART 2. Focusing Evaluations: Choices, Options, and Decisions 115

6. Being Active-Reactive-Adaptive: Evaluator Roles, Situational Responsiveness,


and Strategic Contingency Thinking 117
Evaluation Conditions 118
Changing Uses Over Time 119
Variable Evaluator Roles Linked to Variable Evaluation Purposes 121
Situational Evaluation 126
Internal and External Evaluators 138

7. Beyond the Goals Clarification Game: Focusing on Outcomes 147


Evaluation of the Bear Project 147
Whose Goals Will Be Evaluated? 148
Communicating About Goals and Results 153
Focusing on Outcomes and Results 154
Utilization-Focused Outcomes Framework 158
Meaningful and Useful Goals 167
Levels of Goal Specification 169
The Personal Factor Revisited 174
8. Focusing an Evaluation: Alternatives to Goals-Based Evaluation 177
More Than One Way to Manage a Horse 177
Problems With Goals-Based Evaluation 179
Goal-Free Evaluation 181
A Menu Approach to Focusing Evaluations 184
Changing Focus Over Time: Stage Models of Evaluation 187

9. Implementation Evaluation: What Happened in the Program? 195


Checking the Inventory 196
The Importance of Implementation Analysis 197
Focus on Utility: Information for Action and Decisions 199
Ideal Program Plans and Actual Implementation 200
Variations and Options in Implementation Evaluation 205
Connecting Goals and Implementation 211

10. The Program's Theory of Action: Conceptualizing Causal Linkages 215


All the World's a Stage for Theory 215
Mountaintop Inferences 216
Reflections on Causality in Evaluation 216
The Theory Option in Evaluation: Constructing a Means-Ends Hierarchy 217
Three Approaches to Program Theory 219
User-Focused Theory of Action Approach 221
Getting at Assumptions and Causal Connections 225
Developing a Theory of Action as Process Use 229
Theory Informing Practice, Practice Informing Theory 232
Utilization-Focused Evaluation Theory of Action 234
Causal Theorizing in Perspective 237

PART 3. Appropriate Methods 239

11. Evaluations Worth Using: Utilization-Focused Methods Decisions 241


Methods to Support Intended Uses, Chosen by Intended Users 241
The Million Man March 244
Methods and Measurement Options 247
Assuring Methodological Quality and Excellence 248
Credibility and Use 250
Overall Evaluation Validity 251
Believable and Understandable Data 253
Trade-Offs 257
Truth and Utility 259
Designing Evaluations Worth Using: Reflections on the State of the Art 264

12. The Paradigms Debate and a Utilitarian Synthesis 265


Training 265
The Paradigms Debate 267
Dimensions of the Competing Paradigms 272
Whither the Evaluation Methods Paradigms Debate?
The Debate Has Withered 290
Utilization-Focused Synthesis: A Paradigm of Choices 297

13. Deciphering Data and Reporting Results: Analysis, Interpretations,


Judgments, and Recommendations 301
Setting the Stage for Use 302
A Framework for Reviewing Data 307
Arranging Data for Ease of Interpretation: Focusing the Analysis 307
Simplicity in Data Presentations 309
Interpretations and Judgments 315
Making Claims 321
Useful Recommendations 324
Controversy About Recommendations 326
A Futures Perspective on Recommendations 328
Utilization-Focused Reporting 329
Utilization-Focused Reporting Principles 330
Final Reflections 337

PART 4. Realities and Practicalities of Utilization-Focused Evaluation 339

14. Power, Politics, and Ethics 341


Politics and Evaluation: A Case Example 341
The Political Nature of Evaluation 343
The Power of Evaluation 348
Political Maxims for Utilization-Focused Evaluators 350
The Political Foundations of Organizing Stakeholders
Into an Evaluation Task Force 352
Political Rules in Support of Use 356
Fears of Political Co-optation 357
Evaluation Misuse 359
Ethics of Being User-Focused 361
Guarding Against Corruption of an Evaluation 365
Moral Discourse 366

15. Utilization-Focused Evaluation: Process and Premises 371


A User's Perspective 371
The Flow of a Utilization-Focused Evaluation Process 376
The Achilles' Heel of Utilization-Focused Evaluation 380
Fundamental Premises of Utilization-Focused Evaluation 381
A Vision of an Experimenting Society and Experimenting Evaluators 384

References 387

Index 415

About the Author 431


Preface

Sufi stories are tales used to pass on ancient wisdom. One such story tells of a revered teacher,
Mulla Nasrudin, who was asked to return to his home village to share his wisdom with the
people there.

Mulla Nasrudin mounted a platform in the village square and asked rhetorically,
"O my people, do you know what I am about to tell you?"
Some local rowdies, deciding to amuse themselves, shouted rhythmically,
'"NO. .. ! NO. ..!NO... ! NO. . . !"
"In that case," said MuUa Nasrudin with dignity, "I shall abstain from trying to
instruct such an ignorant community," and he stepped down from the platform.
The following week, having obtained an assurance from the hooligans that they
would not repeat their harassment, the elders of the village again prevailed upon
Nasrudin to address them. "O my people," he began again, "do you know what I am
about to say to you?"
Some of the people, uncertain as to how to react, for he was gazing at them
fiercely, muttered, "Yes."
"In that case," retorted Nasrudin, "there is no need for me to say more." He then
left the village square.
On the third occasion, after a deputation of elders had again visited him and
implored him to make one further effort, he stood before the people: "O my people!
Do you know what I am about to say?"
Since he seemed to demand a reply, the villagers shouted, "Some of us do, and
some of us do not."
"In that case," said Nasrudin as he withdrew, "Let those who know teach those
who do not."
—Adapted from Shah, 1964:80-81

This book records the things that I have learned about doing program evaluation from those who know. The pages that follow represent an accumulation of wisdom from many sources: from interviews with 40 federal decision makers and evaluators who participated in a study of the use of federal health evaluations; from conversations with program staff and funders about their evaluation experiences; from evaluation colleagues; from participants in my evaluation workshops and university classes, who are struggling to conduct useful evaluations; and from 25 years of evaluation practice.

The evaluation profession has developed dramatically since the last edition of this book 10 years ago. Updating this edition with recent evaluation research and thinking proved a formidable task, and it substantially increased the length of the book because so much has happened on so many fronts. New chapters have been added on new forms of uses, alternative roles for evaluators, and new concerns about ethics. Yet, the central challenge to professional practice remains—doing evaluations that are useful and actually used!

The tone and substance of this new edition have been influenced by the fact that utilization-focused evaluation is now more than 20 years old. The first edition, published in 1978 and based on research done in 1975, had the tone of a toddler throwing a temper tantrum because no one seemed to be paying attention. The second edition, published in 1986, was alternatively brash and shy, assertive and uncertain, like an adolescent coming of age. By that time, the first edition had attracted both praise and skepticism, support and opposition, and the premises undergirding the approach had been sufficiently disseminated to be distorted, misquoted, and miscategorized. I wanted the second edition to set the record straight and clarify points of confusion. By my own criteria, I only partially succeeded, and reading that edition now, after having received useful feedback from students and teachers of evaluation, I find it less clear on some points than I would have wished. I have attempted to correct those deficiencies.

Now that utilization-focused evaluation has survived to voting age (or even drinking age), I feel liberated to be more celebratory and less argumentative in tone. While my colleagues Joe Wholey, Harry Hatry, and Kathryn Newcomer (1994) may have overstated the case when they observed that "in recent years the watchword of the evaluation profession has been utilization-focused evaluation" (p. 5), I can say without hubris that the widespread acceptance of the premises of utilization-focused evaluation has influenced my voice. In this edition, I have strived to achieve the more mature tone of the elder, which I find I'm becoming. My professional development parallels the maturation of our profession. As a field of professional practice, we have reached a level where we know what we're doing and have a track record of important contributions to show. That knowledge and those contributions are the bedrock of this new edition.

While I have learned from and am indebted to many more people than I can acknowledge, the personal and professional contributions of a few special colleagues have been especially important to me in recent years, particularly in the writing of this edition. Marv Alkin, Jean King, and Hallie Preskill read portions of the revision and offered instructive feedback. Other colleagues whose writings and wisdom have informed this edition include Eleanor Chelimsky, Huey Chen, Bob Covert, David Fetterman, Mike Hendricks, Ernie House, Ricardo Millett, Sharon Rallis, Jim Sanders, Michael Scriven, Will Shadish, Midge Smith, Yoland Wadsworth, Carol Weiss, and Joe Wholey. Minnesota provides a thriving evaluation community in which to work and an active local chapter of the American Evaluation Association where friends and colleagues share experiences; among local evaluators who have been especially helpful to me in recent years are John Brandl, Tom Dewar, Jean King, Dick Krueger, Steve Mayer, Paul Mattessich, Marsha Mueller, Ruth Anne Olson, Greg Owen, and Stacey Stockdill. I also want to thank several colleagues and clients currently or formerly in government who have contributed ideas and experiences that have influenced this edition: Valerie Caracelli, Kay Knapp, Gene Lyle, Meg Hargreaves, Laurie Hestness, Dennis Johnson, Mike Linder, Richard Sonnichsen, and Jennifer Thurman. I thank the Union Institute Graduate School, especially Dean Larry Ryan, for sabbatical support to complete this revision. Ongoing encouragement from Union Institute faculty and learners supports both my teaching and writing.

That this new edition was written at all owes much to the patient nurturing and unwavering support of Sage editor C. Deborah Laughton. Sage has a commitment to keep major texts current, but what began as an update became, for me, a major rewrite as I worked to capture all the new developments in evaluation over the last decade. When I was tempted to go on to other projects, C. Deborah helped rekindle my commitment to this book. Her knowledge about both good writing and evaluation made the difference. Expert and thorough copy editing by Jacqueline A. Tasch also contributed by enhancing the quality of the final production.

Jeanne Campbell has been editor, critic, colleague, and collaborator. Most of all, she has been a source of power through her caring, belief, and support. She has helped me keep my priorities straight in the struggle to balance family, writing, teaching, and consulting, and somehow integrating them all in a rich and loving life together with our children. My daily experience of her provides ongoing evidence that getting older does mean getting better. I dedicate this book to her.

One final note of thanks to evaluation sage Halcolm (pronounced and interpreted, How come? as in "Why?"). Since the first edition, rumors have persisted that Halcolm doesn't really exist despite stories and quotations from him in my writings. Such ignominious scuttlebutt notwithstanding, I can assure the reader that Halcolm exists vitally in my mind.

This book is both practical and theoretical. It tells readers how to conduct program evaluations and why to conduct them in the manner prescribed. Each chapter contains both a review of the relevant literature and actual case examples to illustrate major points. Over 50 menus and exhibits have been added to this edition, with exhibits offering summaries and illustrations, and menus designed to present options as evaluators work with users to make selections from the vast smorgasbord of evaluation approaches. Finally, the book offers a definite point of view developed from the observation that much of what has passed for program evaluation has not been very useful; that evaluation ought to be useful; and, therefore, that something different must be done if evaluation is to be useful. Based on research and professional experience, and integrating theory and practice, this book provides both an overall framework and concrete advice for how to conduct useful evaluations.
PART 1
Toward More Useful Evaluations

In the beginning, God created the heaven and the earth.


And God saw everything that he made. "Behold," God said, "it is very good."
And the evening and the morning were the sixth day.
And on the seventh day God rested from all His work. His archangel came then unto
Him asking, "God, how do you know that what you have created is 'very good'? What are
your criteria? On what data do you base your judgment? Just exactly what results were
you expecting to attain? And aren't you a little close to the situation to make a fair and
unbiased evaluation?"
God thought about these questions all that day and His rest was greatly disturbed. On
the eighth day God said, "Lucifer, go to hell."
Thus was evaluation born in a blaze of glory. . . .
—From Halcolm's The Real Story of Paradise Lost
Evaluation Use:
Both Challenge and Mandate

The human condition: insidious prejudice, stultifying fear of the unknown, contagious avoidance, beguiling distortion of reality, awesomely selective perception, stupefying self-deception, profane rationalization, massive avoidance of truth—all marvels of evolution's selection of the fittest. Evaluation is our collective effort to outwit these human propensities—if we choose to use it.
—Halcolm

On a cold November morning in Minnesota, some 15 people in various states of wakefulness have gathered to discuss evaluation of a county human services program.
Citizen evaluation advisory board representatives are present; the county board and
State representatives have arrived; and members of the internal evaluation staff are busy
with handouts and overheads. We are assembled at this early hour to review the past
year's evaluation efforts.
They review the problems with getting started (fuzzy program goals, uncertain
funding); the data collection problems (lack of staff, little program cooperation,
inconsistent state and county data processing systems); the management problems
(unclear decision-making hierarchies, political undercurrents, trying to do too much);
and the findings despite it all ("tentative to be sure," acknowledges the internal
evaluator, "but more than we knew a year ago").
Advisory board members are clearly disappointed: "The data just aren't solid
enough." A county commissioner explains why board decisions have been contrary to
evaluation recommendations: "We didn't really get the information we needed when

we wanted it, and it wasn't what we wanted when we got it." The room is filled with disappointment, frustration, defensiveness, cynicism, and more than a little anger. There are charges, countercharges, budget threats, moments of planning, and longer moments of explaining away problems. The chairperson ends the meeting in exasperation, lamenting: "What do we have to do to get results we can actually use?"

This book is an outgrowth of, and answer to, that question.

Evaluation Use as a Critical Societal Issue

If the scene I have described were unique, it would merely represent a frustrating professional problem for the people involved. But if that scene is repeated over and over on many mornings, with many advisory boards, then the question of evaluation use would become what sociologist C. Wright Mills (1959) called a critical public issue:

Issues have to do with matters that transcend these local environments of the individual and the range of his inner life. They have to do with the organization of many such milieux into the institutions of an historical society as a whole.... An issue, in fact, often involves a crisis in institutional arrangements. (pp. 8-9)

In my judgment, the challenge of using evaluation in appropriate and meaningful ways represents just such a crisis in institutional arrangements. How evaluations are used affects the spending of billions of dollars to fight problems of poverty, disease, ignorance, joblessness, mental anguish, crime, hunger, and inequality. How are programs that combat these societal ills to be judged? How does one distinguish effective from ineffective programs? And how can evaluations be conducted in ways that lead to use? How do we avoid producing reports that gather dust on bookshelves, unread and unused? Those are the questions this book addresses, not just in general, but within a particular framework: utilization-focused evaluation.

The issue of use has emerged at the interface between science and action, between knowing and doing. It raises fundamental questions about human rationality, decision making, and knowledge applied to creation of a better world. And the issue is as fresh as the morning news. To wit, a recent newspaper headline: "Agency Evaluation Reports Disregarded by Legislators Who Had Requested Them" (Dawson 1995; see Exhibit 1.1). Let's look, then, at how the crisis in utilization has emerged. Following that, we'll outline how utilization-focused evaluation addresses this crisis.

A Larger Perspective: Using Information in the Information Age

The challenge of evaluation use epitomizes the more general challenge of knowledge use in our times.
EXHIBIT 1.1
Newspaper Column on Evaluation Use

Agency Evaluation Reports Disregarded by Legislators Who Had Requested Them

Minnesota lawmakers who mandated that state agencies spend a lot of employee hours and money developing performance evaluation reports pretty much ignored them. . . . The official word from the state legislative auditor's evaluation of the performance evaluation process: Legislators who asked for the reports did not pay much attention to them. They were often full of boring and insignificant details.

Thousands of employee hours and one million taxpayer dollars went into writing the 21 major state agency performance evaluation reports. The auditor reports the sad results:

• Only three of 21 state commissioners thought that the performance reports helped the governor make budget choices regarding their agencies.
• Only seven of 21 agencies were satisfied with the attention given the reports in the House committees reviewing their programs and budgets. And only one agency was satisfied with the attention it received in the Senate.

Agency heads also complained to legislative committees this year that the 1993 law mandating the reports was particularly painful because departments had to prepare new two-year budget requests and program justifications at the same time. That "dual" responsibility resulted in bureaucratic paperwork factories running overtime.

"Our experience is that few, if any, legislators have actually read the valuable information contained in our report...," one agency head told auditors.

"The benefits of performance reporting will not materialize if one of the principal audiences is uninterested," said another.

"If the Legislature is not serious about making the report 'the key document' in the budget decision process, it serves little value outside the agency," said a third department head.

Mandating the reports and ignoring them looks like another misguided venture by the 201-member Minnesota Legislature. It is the fifth-largest Legislature in the nation and during much of the early part of this year's five-month session had little to do. With time on their hands, lawmakers could have devoted more time to evaluation reports. But if the reports were dull and of little value in evaluating successes of programs, can they be blamed for not reading them?

Gary Dawson, "State Journal" column
Saint Paul Pioneer Press, August 7, 1995, p. 4B

SOURCE: Reprinted with permission of Saint Paul Pioneer Press.

Our age—the Age of Information and Communications—has developed the capacity to generate, store, retrieve, transmit, and instantaneously communicate information. Our problem is keeping up with, sorting out, absorbing, and using information. Our technological capacity for gathering and computerizing
information now far exceeds our human ability to process and make sense out of it all. We're constantly faced with deciding what's worth knowing versus what to ignore.

Getting people to use what is known has become a critical concern across the different knowledge sectors of society. A major specialty in medicine (compliance research) is dedicated to understanding why so many people don't follow their doctor's orders. Common problems of information use underlie trying to get people to use seat belts, quit smoking, begin exercising, eat properly, and pay attention to evaluation findings. In the fields of nutrition, energy conservation, education, criminal justice, financial investment, human services, corporate management, international development—the list could go on and on—a central problem, often the central problem, is getting people to apply what is already known.

In agriculture, a major activity of extension services is trying to get farmers to adopt new scientific methods. Experienced agricultural extension agents like to tell the story of a young agent telling a farmer about the latest food production techniques. As he begins to offer advice, the farmer interrupts him and says, "No sense in telling me all those new ideas, young man. I'm not doing half of what I know I should be doing now."

I remember coming across a follow-up study of participants in time-management training. Few were applying the time-management techniques they had learned. When graduates of time-management training were compared with a sample of nonparticipants, the differences were not in how people in each group managed their time. The time-management graduates had quickly fallen back into old habits. The difference was: the graduates felt much more guilty about how they wasted time.

Research on adolescent pregnancy illustrates another dimension of the knowledge use problem. Adolescent health specialist Michael Resnick (1984) interviewed teenagers who became pregnant. He found very few cases in which the problem was a lack of information about contraception, about pregnancy, or about how to avoid pregnancies. The problem was not applying—just not using—what they knew. Resnick found "an incredible gap between the knowledge and the application of that knowledge. In so many instances, it's heartbreaking—they have the knowledge, the awareness, and the understanding, but somehow it doesn't apply to them" (p. 15).

These examples of the challenges of putting knowledge to use are meant to set a general context for the specific concern of this book: narrowing the gap between generating evaluation findings and actually using those findings for program decision making and improvement. Although the problem of information use remains central to our age, we are not without knowledge about what to do. We've learned a few things about overcoming our human resistance to new knowledge and change, and over the last two decades of professional evaluation practice, we've learned a great deal about how to increase evaluation use. Before presenting what we've learned, let's look more closely at the scope of the challenge of using evaluation processes and findings.

High Hopes for Evaluation

Evaluation and Rationality

Edward Suchman (1967) began his seminal text on evaluation research with
Hans Zetterberg's observation that "one of the most appealing ideas of our century is the notion that science can be put to work to provide solutions to social problems" (p. 1). Social and behavioral science embodied the hope of finally applying human rationality to the improvement of society. In 1961, Harvard-educated President John F. Kennedy welcomed scientists to the White House as never before. Scientific perspectives were taken into account in the writing of new social legislation. Economists, historians, psychologists, political scientists, and sociologists were all welcomed into the public arena to share in the reshaping of modern postindustrial society. They dreamed of and worked for a new order of rationality in government—a rationality undergirded by social scientists who, if not philosopher-kings themselves, were at least ministers to philosopher-kings. Carol Weiss (1977) has captured the optimism of that period.

There was much hoopla about the rationality that social science would bring to the untidy world of government. It would provide hard data for planning . . . and give cause-and-effect theories for policy making, so that statesmen would know which variables to alter in order to effect the desired outcomes. It would bring to the assessment of alternative policies a knowledge of relative costs and benefits so that decision makers could select the options with the highest payoff. And once policies were in operation, it would provide objective evaluation of their effectiveness so that necessary modifications could be made to improve performance. (p. 4)

One manifestation of the scope, pervasiveness, and penetration of these hopes is the number of evaluation studies actually conducted. While it is impossible to identify all such studies, as early as 1976, the Congressional Sourcebook on Federal Program Evaluations contained 1,700 citations of program evaluation reports issued by 18 U.S. Executive Branch agencies and the General Accounting Office (GAO) during fiscal years 1973 through 1975 (Office of Program Analysis, GAO 1976:1). The numbers have grown substantially since then. In 1977, federal agencies spent $64 million on program evaluation and more than $1.1 billion on social research and development (Abramson 1978). The third edition of the Compendium of Health and Human Services Evaluation Studies (U.S. Department of Health and Human Services 1983) contained 1,435 entries. The fourth volume of the U.S. Comptroller General's directory of Federal Evaluations (GAO 1981) identified 1,429 evaluative studies from various U.S. federal agencies completed between September 1, 1979, and September 30, 1980. While the large number of and substantial funding for evaluations suggested great prosperity and acceptance, under the surface and behind the scenes, a crisis was building—a utilization crisis.

Reality Check: Evaluations Largely Unused

By the end of the 1960s, it was becoming clear that evaluations of Great Society social programs were largely ignored or politicized. The Utopian hopes for a scientific and rational society had somehow failed to be realized. The landing of the first human on the moon came and went, but poverty persisted despite the 1960s "War" on it—and research was still not being used as the basis for government decision making. While all types of applied social science suffered from underuse (Weiss 1977),
nonuse seemed to be particularly characteristic of evaluation studies. Ernest House (1972) put it this way: "Producing data is one thing! Getting it used is quite another" (p. 412). Williams and Evans (1969) wrote that "in the final analysis, the test of the effectiveness of outcome data is its impact on implemented policy. By this standard, there is a dearth of successful evaluation studies" (p. 453). Wholey et al. (1970) concluded that "the recent literature is unanimous in announcing the general failure of evaluation to affect decision making in a significant way" (p. 46). They went on to note that their own study "found the same absence of successful evaluations noted by other authors" (p. 48). Cohen and Garet (1975) found "little evidence to indicate that government planning offices have succeeded in linking social research and decision making" (p. 19). Seymour Deitchman (1976), in his The Best-Laid Schemes: A Tale of Social Research and Bureaucracy, concluded that "the impact of the research on the most important affairs of state was, with few exceptions, nil" (p. 390). Weidman et al. (1973) concluded that "on those rare occasions when evaluation studies have been used . . . the little use that has occurred [has been] fortuitous rather than planned" (p. 15). In 1972, Carol Weiss viewed underutilization as one of the foremost problems in evaluation research: "A review of evaluation experience suggests that evaluation results have not exerted significant influence on program decisions" (pp. 10-11). This conclusion was echoed by four prominent commissions and study committees: the U.S. House Committee on Government Operations, Research and Technical Programs Subcommittee (1967); the Young Committee report published by the National Academy of Sciences (1968); the Report of the Special Commission on the Social Sciences (1968) for the National Science Foundation; and the Social Science Research Council's (1969) prospective on the Behavioral and Social Sciences.

British economist L. J. Sharpe (1977) reviewed the European literature and commission reports on use of social scientific knowledge and reached a decidedly gloomy conclusion:

We are brought face to face with the fact that it has proved very difficult to uncover many instances where social science research has had a clear and direct effect on policy even when it has been specifically commissioned by government. (p. 45)

Ronald Havelock (1980) of the Knowledge Transfer Institute generalized that "there is a gap between the world of research and the world of routine organizational practice, regardless of the field" (p. 13). Rippey (1973) commented,

At the moment there seems to be no indication that evaluation, although the law of the land, contributes anything to educational practice, other than headaches for the researcher, threats for the innovators, and depressing articles for journals devoted to evaluation. (p. 9)

It can hardly come as a surprise, then, that support for evaluation began to decline. During the Reagan Administration, the GAO (1987) found that federal evaluation received fewer resources and that "findings from both large and small studies have become less easily available for use by the Congress and the public" (p. 4). In both 1988 and 1992, the GAO prepared status reports on program evaluation to inform changing executive branch administrations at the federal level.
We found a 22-percent decline in the number of professional staff in agency program evaluation units between 1980 and 1984. A follow-up study of 15 units that had been active in 1980 showed an additional 12% decline in the number of professional staff between 1984 and 1988. Funds for program evaluation also dropped substantially between 1980 and 1984 (down by 37% in constant 1980 dollars).... Discussions with the Office of Management and Budget offer no indication that the executive branch investment in program evaluation showed any meaningful overall increase from 1988 to 1992. (GAO 1992a:7)

The GAO (1992a) went on to conclude that its 1988 recommendations to enhance the federal government's evaluation function had gone unheeded: "The effort to rebuild the government's evaluation capacity that we called for in our 1988 transition series report has not been carried out" (p. 7). Here, ironically, we have an evaluation report on evaluation going unused.

In 1995, the GAO provided another report to the U.S. Senate on Program Evaluation, subtitled Improving the Flow of Information to the Congress. GAO analysts conducted follow-up case studies of three major federal program evaluations: the Comprehensive Child Development Program, the Community Health Centers program, and the Chapter 1 Elementary and Secondary Education Act aimed at providing compensatory education services to low-income students. The analysts concluded that

lack of information does not appear to be the main problem. Rather, the problem seems to be that available information is not organized and communicated effectively. Much of the available information did not reach the [appropriate Senate] Committee, or reached it in a form that was too highly aggregated to be useful or that was difficult to digest. (GAO 1995:39)

Many factors affect evaluation use in Congress (Boyer and Langbein 1991), but politics is the overriding factor (Chelimsky 1995a, 1992, 1987a, 1987b). Evaluation use throughout the U.S. federal government appears to have continued its spiral of decline through the 1990s (Wargo 1995; Popham 1995; Chelimsky 1992). In many federal agencies, the emphasis shifted from program evaluation to inspection, auditing, and investigations (N. L. Smith 1992; Hendricks et al. 1990). However, anecdotal reports from state and local governments, philanthropic foundations, and the independent sector suggest a surge of interest in evaluation. I believe that whether this initial interest and early embrace turn into long-term support and a sustainable relationship will depend on the extent to which evaluations prove useful.

Nor is the challenge only one of increasing use. "An emerging issue is that of misuse of findings. The use-nonuse continuum is a measure of degree or magnitude; misuse is a measure of the manner of use" (Alkin and House 1992:466). Marv Alkin (1991, 1990; Alkin and Coyle 1988), an early theorist of user-oriented evaluation, has long emphasized that evaluators must attend to appropriate use, not just amount of use. Ernest House (1990a), one of the most astute observers of how the evaluation profession has developed, observed in this regard: "Results from poorly conceived studies have frequently been given wide publicity, and findings from good studies have been improperly used" (p. 26). The
field faces a dual challenge then: supporting and enhancing appropriate uses while also working to eliminate improper uses.

We are called back, then, to the early morning scene that opened this chapter: decision makers lamenting the disappointing results of an evaluation, complaining that the findings did not tell them what they needed to know. For their part, evaluators complain about many things, as well, "but their most common complaint is that their findings are ignored" (Weiss 1972d:319). The question from those who believe in the importance and potential utility of evaluation remains: What has to be done to get results that are appropriately and meaningfully used? This question has taken center stage as program evaluation has emerged as a distinct field of professional practice.

Historical Context

The Emergence of Program Evaluation as a Field of Professional Practice

Like many poor people, evaluation in the United States has grown up in the "projects"—federal projects spawned by the Great Society legislation of the 1960s. When the federal government of the United States began to take a major role in alleviating poverty, hunger, and joblessness during the Depression of the 1930s, the closest thing to evaluation was the employment of a few jobless academics to write program histories. It was not until the massive federal expenditures on an awesome assortment of programs during the 1960s and 1970s that accountability began to mean more than assessing staff sincerity or political head counts of opponents and proponents. A number of events converged to create a demand for systematic empirical evaluation of the effectiveness of government programs (Walters 1996; Wye and Sonnichsen 1992), although that was often threatening to programs since many had come to associate evaluation with an attack and to think of evaluators as a program termination squad.

Education has long been a primary target for evaluation. Beginning with Joseph Rue's comparative study of spelling performance by 33,000 students in 1897, the field of educational evaluation has been dominated by achievement testing. During the Cold War, after the Soviet Union launched Sputnik in 1957, calls for better educational assessments accompanied a critique born of fear that the education gap was even larger than the "missile gap." Demand for better evaluations also accompanied the growing realization that, years after the 1954 Supreme Court Brown decision requiring racial integration of schools, "separate and unequal" was still the norm rather than the exception. Passage of the U.S. Elementary and Secondary Education Act in 1965 contributed greatly to more comprehensive approaches to evaluation. The massive influx of federal money aimed at desegregation, innovation, compensatory education, greater equality of opportunity, teacher training, and higher student achievement was accompanied by calls for evaluation data to assess the effects on the nation's children. To what extent did these changes really make an educational difference?

But education was only one arena in the War on Poverty of the 1960s. Great Society programs from the Office of Economic Opportunity were aimed at nothing less than the elimination of poverty. The creation of large-scale federal health programs, including community mental health
centers, was coupled with a mandate for evaluation, often at a level of 1% to 3% of program budgets. Other major programs were created in housing, employment, services integration, community planning, urban renewal, welfare, family programs (Weiss and Jacobs 1988), and so on—the whole of which came to be referred to as "butter" (in opposition to the "guns") expenditures. In the 1970s, these Great Society programs collided head on with the Vietnam War, rising inflation, increasing taxes, and the fall from glory of Keynesian economics. All in all, it was what sociologists and social historians, with a penchant for understatement, would characterize as "a period of rapid social and economic change."

[Cartoon: An evaluator from the "Program Termination Squad" asks, "As director of this program, we'd like to ask you: Would you say this evaluation had any impact on your program?"]

Program evaluation as a distinct field of professional practice was born of two lessons from this period of large-scale social experimentation and government intervention: First, there is not enough money to do all the things that need doing; and, second, even if there were enough money, it takes more than money to solve complex human and social problems. As not everything can be done, there must be a basis for deciding which things are worth doing. Enter evaluation.1

While pragmatists turned to evaluation as a commonsensical way to figure out what works and is worth funding, visionaries were conceptualizing evaluation as the centerpiece of a new kind of society: the experimenting society. Donald T. Campbell ([1971] 1991) gave voice to this vision in his 1971 address to the American Psychological Association.
The experimenting society will be one which will vigorously try out proposed solutions to recurrent problems, which will make hard-headed and multidimensional evaluations of the outcomes, and which will move on to other alternatives when evaluation shows one reform to have been ineffective or harmful. We do not have such a society today. (p. 223)

Early visions for evaluation, then, focused on evaluation's expected role in guiding funding decisions and differentiating the wheat from the chaff in federal programs. But as evaluations were implemented, a new role emerged: helping improve programs as they were implemented. The Great Society programs foundered on a host of problems: management weaknesses, cultural issues, and failure to take into account the enormously complex systems that contributed to poverty. Wanting to help is not the same as knowing how to help; likewise, having the money to help is not the same as knowing how to spend money in a helpful way. Many War on Poverty programs turned out to be patronizing, controlling, dependency generating, insulting, inadequate, misguided, overpromised, wasteful, and mismanaged. Evaluators were called on not only to offer final judgments about the overall effectiveness of programs, but to gather process data and provide feedback to help solve programming problems along the way (Sonnichsen 1989; Wholey and Newcomer 1989).

By the mid-1970s, interest in evaluation had grown to the point where two professional organizations were established: the academically oriented Evaluation Research Society and the practitioner-oriented Evaluation Network. In 1984, they merged to form the American Evaluation Association. By that time, interest in evaluation had become international, with establishment of the Canadian Evaluation Society and the Australasian Evaluation Society. In 1995, the first International Evaluation Conference included participation from new professional evaluation associations representing Central America, Europe, and the United Kingdom.

New Directions in Accountability

A predominant theme of the 1995 International Evaluation Conference was worldwide interest in reducing government programs and making remaining programs more effective and accountable. This theme first took center stage in the United States with the election of Ronald Reagan as President in 1980. He led a backlash against government programming, especially welfare expenditures. Decline in support for government programs was fueled by the widespread belief that such efforts were ineffective and wasteful. While the Great Society and War on Poverty programs of the 1960s had been founded on good intentions and high expectations, they came to be perceived as failures. The "needs assessments" that had provided the rationales for those original programs had found that the poor, the sick, the homeless, the uneducated—the needy of all kinds—needed services. So services and programs were created. Thirty years down the road from those original efforts, and billions of dollars later, most social indicators revealed little improvement. Poverty statistics—including the number of multigenerational welfare recipients and rates of homelessness, hard-core unemployment, and underemployment—as well as urban degradation and increasing crime combined to raise questions about the effectiveness of services. Reports on effective programs
(e.g., Guttmann and Sussman 1995; Kennedy School of Government 1995; Schorr 1988) received relatively little media attention compared to the relentless press about waste and ineffectiveness (Wortman 1995). In the 1990s, growing concerns about federal budget deficits and runaway entitlement costs intensified the debate about the effectiveness of government programs. Both conservatives and liberals were faced with public demands to know what had been achieved by all the programs created and all the money spent. The call for greater accountability became a watershed at every level—national, state, and local; public sector, nonprofit agencies, and the private sector (Bonsignore 1996; HFRP 1996a, 1996b; Horsch 1996; Brizius and Campbell 1991).

Clear answers were not forthcoming. Few programs could provide data on results achieved and outcomes attained. Internal accountability had come to center on how funds were spent (inputs monitoring), eligibility requirements (who gets services, i.e., client characteristics), how many people get services, what activities they participate in, and how many complete the program. These indicators of inputs, client characteristics, activities, and outputs (program completion) measured whether providers were following government rules and regulations rather than whether desired results were being achieved. Control had come to be exercised through audits, licensing, and service contracts rather than through measured outcomes. The consequence was to make providers and practitioners compliance-oriented rather than results-focused. Programs were rewarded for doing the paperwork well rather than for making a difference in clients' lives.

Public skepticism turned to deep-seated cynicism. Polling data showed a widespread perception that "nothing works." As an aside, and in all fairness, this perception is not unique to the late twentieth century. In the nineteenth century, Spencer traced 32 acts of the British Parliament and discovered that 29 produced effects contrary to those intended (Edison 1983:1,5). Given today's public cynicism, 3 effective programs out of 32 might be considered a pretty good record.

More damning still, the perception has grown in modern times that no relationship exists between the amount of money spent on a problem and the results accomplished, an observation made with a sense of despair by economist John Brandl in his keynote address to the American Evaluation Association in New Orleans in 1988. Brandl, a professor in the Hubert H. Humphrey Institute of Public Affairs at the University of Minnesota (formerly its Director), was present at the creation of many human services programs during his days at the old Department of Health, Education, and Welfare (HEW). He created the interdisciplinary Evaluation Methodology training program at the University of Minnesota. Brandl later moved from being a policy analyst to being a policy formulator as a Minnesota state legislator. His opinions carry the weight of both study and experience. In his 1988 keynote address to professional evaluators, he opined that no demonstrable relationship exists between program funding levels and impact, that is, between inputs and outputs; more money spent does not mean higher quality or greater results.

In a 1994 article, Brandl updated his analysis. While his immediate focus was on Minnesota state government, his comments characterize general concerns about the effectiveness of government programs in the 1990s:
EXHIBIT 1.2
Premises of Reinventing Government

What gets measured gets done.
If you don't measure results, you can't tell success from failure.
If you can't see success, you can't reward it.
If you can't reward success, you're probably rewarding failure.
If you can't see success, you can't learn from it.
If you can't recognize failure, you can't correct it.
If you can demonstrate results, you can win public support.

SOURCE: From Osborne and Gaebler (1992: chapter 5, "Results-Oriented Government").

The great government bureaucracies of Minnesota and the rest of America today are failing for the same reason that the formerly Communist governments in Europe fell a few years ago and Cuba is teetering today. There is no systematic accountability. People are not regularly inspired to do good work, rewarded for outstanding performance, or penalized for not accomplishing their tasks. In bureaus, people are expected to do well because the rules tell them to do so. Indeed, often in bureaus here and abroad, able, idealistic workers become disillusioned and burned out by a system that is not oriented to produce excellent results. No infusion of management was ever going to make operations of the Lenin shipyard in Gdansk effective.

Maybe—I would say surely—until systematic accountability is built into government, no management improvements will do the job. (p. 13A)

Similar indictments of government effectiveness are the foundation for efforts at Total Quality Management, Re-engineering Government, or Reinventing Government. These and other management innovations make new forms of accountability—and greater use of evaluation processes and results—the centerpiece of reform. This is illustrated in Exhibit 1.2 by the premises for results-oriented government promulgated by Osborne and Gaebler (1992) in their influential and best-selling book, Reinventing Government: How the Entrepreneurial Spirit is Transforming the Public Sector.

The future of evaluation is tied to the future effectiveness of programs. New calls for results-oriented, accountable programming challenge evaluators to increase the use and effectiveness of evaluations. Indictments of program effectiveness are, underneath, also indictments of evaluation. The original promise of evaluation was that it would point the way to effective programming. Later, that promise broadened to include providing ongoing feedback for improvements during implementation. Evaluation cannot be considered to have fulfilled its promise if, as is increasingly the case, the general perception is that few programs have attained desired outcomes, that "nothing works."
Such conclusions about programs raise fundamental questions about the role of evaluation. Can evaluation contribute to increased program effectiveness? Can evaluation be used to improve programs? Do evaluators bear any responsibility for use and program improvement? This book will answer these questions in the affirmative and offer utilization-focused evaluation as an approach for realizing evaluation's original vision of contributing to long-term program effectiveness and improved decision making.

Worldwide Demand for Evaluation

The challenge to evaluation extends well beyond government-supported programming. Because of the enormous size and importance of government efforts, program evaluation is inevitably affected by trends in the public sector, but evaluation has also been growing in importance in the private and independent sectors (Independent Sector 1993). Corporations, philanthropic foundations, and nonprofit agencies are increasingly turning to evaluators for help in enhancing their organizational effectiveness.

Nor is interest in empirically assessing policies and programs limited to the United States. The federal government of Canada, especially the Auditor General's Office, has demonstrated a major commitment to conducting program evaluations at both national and provincial levels (Comptroller General of Canada 1989; Rutman and Mayne 1985), and action-oriented evaluation has emerged as an important practice in many Canadian organizations (Hudson, Mayne, and Thomlison 1992). The Canadian Evaluation Society is active in promoting the appropriate practice and use of program evaluations throughout Canada, as is the Australasian Evaluation Society in Australia and New Zealand (AES 1995; Sharp 1994; Caulley 1993; Funnell 1993; Owen 1993; Sharp and Lindsay 1992). European governments are routinely using evaluation and policy analysis too, although the nature, location, and results of evaluation efforts vary from country to country (see, for example, Hoogerwerf 1985; Patton 1985). International agencies have also begun using evaluation to assess the full range of development efforts under way in Third World countries. The World Bank, UNICEF, the Australian Development Assistance Bureau (1982), and the U.S. Agency for International Development are examples of international development organizations with significant and active evaluation offices. Global interest in evaluation culminated in the first-ever International Evaluation Conference in Vancouver, Canada, in November 1995. With over 1,500 participants from 61 countries, this conference made it clear that evaluation had become a global challenge. In his keynote address to the conference, Masafumi Nagao (1995) from Japan's Sasakawa Peace Foundation challenged evaluators to think globally even as they evaluate locally, that is, to consider how international forces and trends affect project outcomes, even in small and remote communities. This book will include attention to how utilization-focused evaluation offers a process for adapting evaluation processes to address multicultural and international issues and constituencies.

Standards of Excellence for Evaluation

One major contribution of the professionalization of evaluation has been articulation of standards for evaluation.

In the past many researchers took the position that their responsibility was merely to design studies, collect data, and publish findings; what decision makers did with those findings was not their problem. This stance removed from the evaluator any responsibility for fostering use and placed all the blame for nonuse or underutilization on decision makers.

Academic aloofness from the messy world in which research findings are translated into action has long been a characteristic of basic scientific research. Before the field of evaluation identified and adopted its own standards, criteria for judging evaluations could scarcely be differentiated from criteria for judging research in the traditional social and behavioral sciences, namely, technical quality and methodological rigor. Use was ignored. Methods decisions dominated the evaluation design process. Methodological rigor meant experimental designs, quantitative data, and sophisticated statistical analysis. Whether decision makers understood such analyses was not the researcher's problem. Validity, reliability, measurability, and generalizability were the dimensions that received the greatest attention in judging evaluation research proposals and reports (e.g., Bernstein and Freeman 1975). Indeed, evaluators concerned about increasing a study's usefulness often called for ever more methodologically rigorous evaluations to increase the validity of findings, thereby supposedly compelling decision makers to take findings seriously.

By the late 1970s, however, it was becoming clear that greater methodological rigor was not solving the use problem. Program staff and funders were becoming openly skeptical about spending scarce funds on evaluations they couldn't understand and/or found irrelevant. Evaluators were being asked to be "accountable," just as program staff were supposed to be accountable. The questions emerged with uncomfortable directness: Who will evaluate the evaluators? How will evaluation be evaluated? It was in this context that professional evaluators began discussing standards.

The most comprehensive effort at developing standards was hammered out over five years by a 17-member committee appointed by 12 professional organizations, with input from hundreds of practicing evaluation professionals. The standards published by the Joint Committee on Standards in 1981 dramatically reflected the ways in which the practice of evaluation had matured. Just prior to publication, Dan Stufflebeam (1980), chair of the committee, summarized the committee's work as follows:

The standards that will be published essentially call for evaluations that have four features. These are utility, feasibility, propriety, and accuracy. And I think it is interesting that the Joint Committee decided on that particular order. Their rationale is that an evaluation should not be done at all if there is no prospect for its being useful to some audience. Second, it should not be done if it is not feasible to conduct it in political terms, or practicality terms, or cost-effectiveness terms. Third, they do not think it should be done if we cannot demonstrate that it will be conducted fairly and ethically. Finally, if we can demonstrate that an evaluation will have utility, will be feasible, and will be proper in its conduct, then they said we could turn to the difficult matters of the technical adequacy of the evaluation. (p. 90; emphasis in the original)


EXHIBIT 1.3
Standards for Evaluation

Utility
The Utility Standards are intended to ensure that an evaluation will serve the practical information
needs of intended users.

Feasibility
The Feasibility Standards are intended to ensure that an evaluation will be realistic, prudent,
diplomatic, and frugal.

Propriety
The Propriety Standards are intended to ensure that an evaluation will be conducted legally,
ethically, and with due regard for the welfare of those involved in the evaluation, as well as those
affected by its results.

Accuracy
The Accuracy Standards are intended to ensure that an evaluation will reveal and convey technically
adequate information about the features that determine worth or merit of the program being
evaluated.

SOURCE: Joint Committee 1994.

In 1994, revised standards were published following an extensive review spanning several years (Joint Committee 1994; Patton 1994a). While some changes were made in the 30 individual standards, the overarching framework of four primary criteria remained unchanged: utility, feasibility, propriety, and accuracy (see Exhibit 1.3). Taking the standards seriously has meant looking at the world quite differently. Unlike the traditionally aloof stance of basic researchers, evaluators are challenged to take responsibility for use. No more can we play the game of blame the resistant decision maker. Implementation of a utility-focused, feasibility-conscious, propriety-oriented, and accuracy-based evaluation requires situational responsiveness, methodological flexibility, multiple evaluator roles, political sophistication, and substantial doses of creativity, all elements of utilization-focused evaluation.

From Problem to Solution: Toward Use in Practice

This chapter has reviewed the emergence of program evaluation as a professional field of practice with standards of excellence and a mandate to be useful. The early utilization crisis called into question whether the original hopes for evaluation would be, or even could be, realized. Utilization-focused evaluation developed in response to that crisis and as a way of fulfilling, in practice, the mandate of the utility standard. With this background as context, we turn in the next chapter to an overview of utilization-focused evaluation.

Note

1. For a full discussion of evaluation's emergence as both a discipline and a field of professional practice, see House (1993).
What Is Utilization-Focused Evaluation?
How Do You Get Started?

When I was a child, I spake as a child, I understood as a child, I thought as a child: but when I became an adult, I put away childish things. I decided to become an evaluator. My only problem was, I didn't have the foggiest idea what I was getting into or how to begin.

—Halcolm

A modern version of an ancient Asian story (adapted from Shah 1964:64) casts light on
the challenge of searching for evaluation use.

A man found his neighbor down on his knees under a street lamp looking for
something. "What have you lost, friend?"
"My key," replied the man on his knees.
After a few minutes of helping him search, the neighbor asked, "Where did you drop it?"
"In that dark pasture," answered his friend.
"Then why, for heaven's sake, are you looking here?"
"Because there is more light here."


The obvious place to look for use is in what happens after an evaluation is completed and there's something to use. What we shall find, however, is that the search for use takes us into the "dark pasture" of decisions made before any data are ever collected. The reader will find relatively little in this book about what to do when a study is over. At that point, the potential for use has been largely determined. Utilization-focused evaluation emphasizes that what happens from the very beginning of a study will determine its eventual impact long before a final report is produced.

A Comprehensive Approach

The question of how to enhance the use of program evaluation is sufficiently complex that a piecemeal approach based on isolated prescriptions for practice is likely to have only piecemeal impact. Overviews of research on evaluation use (e.g., Huberman 1995; Lester and Wilds 1990; Connor 1988; Greene 1988b; McLaughlin et al. 1988; M. F. Smith 1988; Cousins and Leithwood 1986; Leviton and Hughes 1981) suggest that the problems of underuse will not be solved by compiling and following some long list of evaluation axioms. It's like trying to live your life according to Poor Richard's Almanac. At the moment of decision, you reach into your socialization and remember, "He who hesitates is lost." But then again, "Fools rush in where angels fear to tread." Advice to young evaluators is no less confusing: "Work closely with decision makers to establish trust and rapport," but "maintain distance to guarantee objectivity and neutrality." Real-world circumstances are too complex and unique to be routinely approached through the application of isolated pearls of evaluation wisdom. What is needed is a comprehensive framework within which to develop and implement an evaluation with attention to use built in. In program evaluation, as in life, it is one's overall philosophy integrated into pragmatic principles that provides a guide to action. Utilization-focused evaluation offers both a philosophy of evaluation and a practical framework for designing and conducting evaluations.

Since its original publication in 1978, Utilization-Focused Evaluation has been tested and applied in thousands of evaluations in the United States and throughout the world. This reservoir of experience provides strong confirmation that evaluations will be used if the foundation for use is properly prepared. Evidence to that effect will be presented throughout this book. First, let me outline the utilization-focused approach to evaluation and indicate how it responds to the challenge of getting evaluations used.

Utilization-Focused Evaluation

Utilization-Focused Evaluation begins with the premise that evaluations should be judged by their utility and actual use; therefore, evaluators should facilitate the evaluation process and design any evaluation with careful consideration of how everything that is done, from beginning to end, will affect use. Nor is use an abstraction. Use concerns how real people in the real world apply evaluation findings and experience the evaluation process. Therefore, the focus in utilization-focused evaluation is on intended use by intended users.

In any evaluation, there are many potential stakeholders and an array of possible uses. Utilization-focused evaluation requires moving from the general and abstract, that is, possible audiences and potential uses, to the real and specific: actual primary intended users and their explicit commitments to concrete, specific uses.

EXHIBIT 2.1
Guiding Principles for Evaluators

Systematic Inquiry
Evaluators conduct systematic, data-based inquiries about what is being evaluated.

Competence
Evaluators provide competent performance to stakeholders.

Integrity/Honesty
Evaluators ensure the honesty and integrity of the entire evaluation process.

Respect for People


Evaluators respect the security, dignity, and self-worth of the respondents, program participants,
clients, and other stakeholders with whom they interact.

Responsibilities for General and Public Welfare


Evaluators articulate and take into account the diversity of interests and values that may be related
to the general and public welfare.

SOURCE: American Evaluation Association Guiding Principles for Evaluators, Shadish et al. 1995.

The evaluator facilitates judgment and decision making by intended users rather than acting as a distant, independent judge. Since no evaluation can be value-free, utilization-focused evaluation answers the question of whose values will frame the evaluation by working with clearly identified, primary intended users who have responsibility to apply evaluation findings and implement recommendations. In essence, I shall argue, evaluation use is too important to be left to evaluators.

Utilization-focused evaluation is highly personal and situational. The evaluation facilitator develops a working relationship with intended users to help them determine what kind of evaluation they need. This requires negotiation: The evaluator offers a menu of possibilities within the framework of established evaluation standards and principles. While concern about utility drives a utilization-focused evaluation, the evaluator must also attend to the evaluation's accuracy, feasibility, and propriety (Joint Committee on Standards 1994). Moreover, as a professional, the evaluator has a responsibility to act in accordance with the profession's adopted principles of conducting systematic, data-based inquiries; performing competently; ensuring the honesty and integrity of the entire evaluation process; respecting the people involved in and affected by the evaluation; and being sensitive to the diversity of interests and values that may be related to the general and public welfare (AEA Task Force 1995:20; see Exhibit 2.1).

Utilization-focused evaluation does not advocate any particular evaluation content, model, method, theory, or even use. Rather, it is a process for helping primary intended users select the most appropriate content, model, methods, theory, and uses for their particular situation. Situational responsiveness guides the interactive process between evaluator and primary intended users. This book will present and discuss the many options now available in the feast that has become the field of evaluation. As we consider the rich and varied menu of evaluation, it will become clear that utilization-focused evaluation can include any evaluative purpose (formative, summative, developmental), any kind of data (quantitative, qualitative, mixed), any kind of design (e.g., naturalistic, experimental), and any kind of focus (processes, outcomes, impacts, costs, and cost-benefit, among many possibilities). Utilization-focused evaluation is a process for making decisions about these issues in collaboration with an identified group of primary users focusing on their intended uses of evaluation.

A psychology of use undergirds and informs utilization-focused evaluation. In essence, research and my own experience indicate that intended users are more likely to use evaluations if they understand and feel ownership of the evaluation process and findings; they are more likely to understand and feel ownership if they've been actively involved; and by actively involving primary intended users, the evaluator is training users in use, preparing the groundwork for use, and reinforcing the intended utility of the evaluation every step along the way. The rest of this chapter will offer some ways of working with primary intended users to begin the process of utilization-focused evaluation. Beyond the heuristic value of these examples, they are meant to illustrate how the philosophy of utilization-focused evaluation is translated into practice.

The First Challenge: Engendering Commitment

Utilization-focused evaluators begin their interactions with primary intended users by working to engender commitments to both evaluation and use. Even program funders and decision makers who request or mandate an evaluation often don't know what evaluation involves, at least not in any specific way. And they typically haven't thought much about how they will use either the process or the findings.

In working with program staff and administrators to lay the groundwork for an evaluation, I often write the word evaluate on a flip chart and ask those present to free-associate with the word. They typically begin with synonyms or closely related terms: assess, measure, judge, rate, compare. Soon they move to connotations and feelings: waste, crap, cut our funding, downsize, attack, demean, put down, pain, hurt, fear.

Clearly, evaluation can evoke strong emotions, negative associations, and genuine fear. To ignore the perceptions, past experiences, and feelings stakeholders bring to an evaluation is like ignoring a smoldering dynamite fuse in hope it will burn itself out. More likely, unless someone intervenes and extinguishes the fuse, it will burn faster and eventually explode. Many an evaluation has blown up in the face of well-intentioned evaluators because they rushed into technical details and methods decisions without establishing a solid foundation for the evaluation in clear purposes and shared understandings. To begin, both evaluators and those with whom we work need to develop a shared definition of evaluation and mutual understanding about what the process will involve.

What Is Program Evaluation?

I offer the clients with whom I work the following definition:

Program evaluation is the systematic collection of information about the activities, characteristics, and outcomes of programs to make judgments about the program, improve program effectiveness, and/or inform decisions about future programming. Utilization-focused program evaluation (as opposed to program evaluation in general) is evaluation done for and with specific, intended primary users for specific, intended uses.

The general definition above has three interrelated components: (1) the systematic collection of information about (2) a potentially broad range of topics (3) for a variety of possible judgments and uses. The definition of utilization-focused evaluation adds the requirement to specify intended use by intended users.

This matter of defining evaluation is of considerable import because different evaluation approaches rest on different definitions. The use-oriented definition offered above contrasts in significant ways with other approaches. One traditional approach has been to define program evaluation as determining the extent to which a program attains its goals. However, as we shall see, program evaluation can and does involve examining much more than goal attainment, for example, implementation, program processes, unanticipated consequences, and long-term impacts. Goal attainment, then, takes too narrow a focus to encompass the variety of ways program evaluation can be useful.

Another common definition states that evaluation determines the worth, merit, or value of something (Joint Committee on Standards 1994; House 1993:1; Scriven 1991a:139). This admittedly commonsensical definition omits specifying the basis for determining merit or worth (that is, systematically collected data) or the purposes for making such a determination (program improvement, decision making, or knowledge generation). In advocating for this narrow and simple definition of evaluation, Stufflebeam (1994) warned against "obscuring the essence of evaluation—to assess value—by overemphasizing its constructive uses" (p. 323). However, for me, use is the essence, so I choose to include it in my definition as a matter of emphasis to reinforce the point that concern about use is a distinguishing characteristic of program evaluation, even at the point of defining what program evaluation is. I'm not interested in determining merit or worth as an end in itself. I want to keep before us the questions: Why is merit or worth to be judged? What will be done with whatever judgments are made?

A different approach is represented by the widely used Rossi and Freeman (1993) textbook, Evaluation: A Systematic Approach. They define evaluation research as the systematic application of social research procedures in assessing social intervention programs. But notice, they are defining evaluation research, and their text emphasizes applying social science methods, so naturally they include that in their definition of evaluation.

The definition of evaluation I've offered here emphasizes systematic data collection rather than applying social science methods.

This is an important distinction in emphasis, one in keeping with the Principle of Systematic Inquiry adopted by the American Evaluation Association (AEA Task Force on Guiding Principles 1995:22). From my perspective, program evaluators may use research methods to gather information, but they may also use management information system data, program monitoring statistics, or other forms of systematic information that are not research-oriented. Program evaluation differs fundamentally from research in the purpose of data collection and standards for judging quality. Basic scientific research is undertaken to discover new knowledge, test theories, establish truth, and generalize across time and space. Program evaluation is undertaken to inform decisions, clarify options, identify improvements, and provide information about programs and policies within contextual boundaries of time, place, values, and politics. The difference between research and evaluation has been called by Cronbach and Suppes (1969) the difference between conclusion-oriented and decision-oriented inquiry. Research aims to produce knowledge and truth. Useful evaluation supports action. The evaluation research of Rossi and Freeman is a hybrid that tends, in my reading of it, to be more knowledge-oriented than action-oriented.

Stake (1981) and Cronbach (1982) have emphasized that evaluation differs from research in the relative importance attached to making generalizations. In any data collection effort, the extent to which there is concern about utility, generalizability, scientific rigor, and relevance of the findings to specific users will vary. Each of these dimensions is a continuum.

Because this book emphasizes meeting the information needs of specific intended users, the focus will most often be on program evaluation rather than evaluation research. This focus derives from my work with small, community-based programs where the idea of conducting "research" may be intimidating or where practitioners consider research "academic and irrelevant." On the other hand, national programs or those staffed or funded by people with advanced degrees may attach positive associations to conducting research, in which case they may prefer to call the process evaluation research. The language, like everything else in utilization-focused evaluation, depends on the program context and the explicit needs and values of primary intended users.

In short, how to define evaluation and what to call a particular evaluation are matters for discussion, clarification, and negotiation.

What is not negotiable is that the evaluation be data-based. Both program evaluation and evaluation research bring an empirical perspective to bear on questions of policy and program effectiveness. This data-based approach to evaluation stands in contrast to two alternative and often competing ways of assessing programs: the charity orientation and pure pork barrel politics. I sometimes introduce these distinctions in working with clients to help them more fully appreciate the sine qua non nature of evaluation's commitment to systematic data collection.

Charitable Assessment

And now abideth faith, hope, charity, these three; but the greatest of these is charity.

—Paul's First Letter to the Corinthians

Modern social service and education programs are rooted in charitable and philanthropic motives: helping people. From a charity perspective, the main criterion for evaluation is the sincerity of funders and program staff; the primary measure of program worth is that the program organizers care enough to try their very best to help the less fortunate. As an agency director told me after a measurement training session, "All I want to know is whether or not my staff are trying their best. When you've got a valid and reliable and all-that-other-stuff instrument for love and sincerity, come back and see me."

Sometimes religious motives can also be found in this mix. As a United Way agency director once told me, "God has mandated our helping the less fortunate, so God alone will judge the outcomes and effectiveness of our efforts." The implication was that God needed no assistance from the likes of social scientists, with their impersonal statistics and objective analyses of human suffering.

Data-oriented evaluators have little to offer those who are fully ensconced in charitable assessment. Others, however (and their numbers are increasing), have come to believe that, even for the sincere, indeed especially for the sincere and caring, empirically based program evaluations can be valuable.

After all, sincerity and caring mean that one wants to do a good job, wants to be effective, and wants to make a difference. The purpose of program evaluation is precisely that—to increase effectiveness and provide information on whether hopes are actually being realized. People who really care about their work are precisely the people who can benefit greatly from utilization-focused program evaluation.

Pork Barrel Assessment

A second historically important approach to evaluating programs has been pork barrel politics, which takes as its main criterion the political power of a program's constituency: If powerful constituents want the program, or if more is to be gained politically by support for, rather than opposition to, the program, then the program is judged worthwhile; no other evidence of program effectiveness is needed, although data may be sought to support this predetermined political judgment. Pork barrel evaluations are one reason it is so difficult to terminate government-funded programs and agencies. Programs rapidly develop constituencies whose vested interests lie in program continuation. The driving force of the pork barrel approach is to give out money where it counts politically, not where it will be used most effectively.

The pork barrel criterion is not unique to elected politicians and governmental bodies. The funding boards of philanthropic foundations, corporate boards, and service agencies have their own constituencies to please. Political debts must be paid, so programs are judged effective as long as they serve powerful interests. Empirical evaluation findings are of interest only insofar as they can be manipulated for political and public relations purposes. (Chapter 14 will address in more depth the relationship, often healthy when properly approached, between politics and evaluation.)

Learning to Value Evaluation

So, we're working on engendering commitment to data-based evaluation and use. We want to get beyond charitable assessments and pork barrel assessments. Research on "readiness for evaluation" (D. S. Smith 1992; Studer 1978; Mayer 1976, 1975) has found that "valuing evaluation" is a necessary condition for evaluation use (see Exhibit 2.2). Valuing evaluation cannot be taken for granted. Nor does it happen naturally. Users' commitment to evaluation is typically fragile, often whimsical, and must be cultivated like a hybrid plant that has the potential for enormous yields, but only if properly cared for, nourished, and appropriately managed.

Reality Testing

I find the idea of "reality testing" helpful in working with intended users to increase the value they attach to evaluation and, correspondingly, their willingness to be actively engaged in the work necessary to make the evaluation useful. I include in the notion of testing reality gathering varying perceptions of reality in line with the axiom that "what is perceived as real is real in its consequences."1 The phrase "reality testing" implies that being "in touch with reality" can't simply be assumed. When individuals lose touch with reality, they become dysfunctional, and, if the distortions of reality are severe, they may be referred for psychotherapy.

EXHIBIT 2.2
Items on Belief in Program Evaluation,
From Readiness for Evaluation Questionnaire

Rank Order by Factor    Item    Factor Loading

1 Program evaluation would pave the way for better programs for our clientele .777
2 This would be a good time to begin (or renew or intensify) work on program
evaluation .732
3 Installing a procedure for program evaluation would enhance the stature of
our organization .723
4 We don't need to have our program evaluated -.689
5 The amount of resistance in the organization to program evaluation should
not be a deterrent to pursuing a policy of program evaluation .688
6 I have yet to be convinced of the alleged benefits of program evaluation -.669
7 Program evaluation would only increase the workload -.668
8 "Program evaluation" and "accountability" are just fads that hopefully will die
down soon -.650
9 Program evaluation would tell me nothing more than I already know -.645
10 I would be willing to commit at least 5% of the program budget for evaluation .624
11 A formal program evaluation would make it easier to convince administrators
of needed changes .617
12 We could probably get additional or renewed funding if we carry out a plan for
program evaluation .587
13 Program evaluation might lead to greater recognition and rewards to those
who deserve it .548
14 It would be difficult to implement a procedure for program evaluation without
seriously disrupting other activities -.518
15 No additional time and money can be made available for program evaluation -.450
16 Most of the objections one hears about program evaluation are really pretty
irrational .442
17 Some money could probably be made available to provide training to staff in
program evaluation skills .385

SOURCE: Smith 1992:53-54.


NOTE: Factor analysis is a statistical technique for identifying questionnaire or test items that are highly intercorrelated and
therefore may measure the same factor, in this case, belief in evaluation. The positive or negative signs on the factor loadings
reflect whether questions were worded positively or negatively; the higher a factor loading, the better the item defines the factor.
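The note above can be made concrete with a small, hypothetical sketch of the kind of analysis it describes. The sketch below is not drawn from the Smith (1992) study: the sample size, item signs, and noise level are invented for illustration only. It simulates responses to five questionnaire items driven by a single underlying "belief in evaluation" score and then extracts loadings analogous to the Factor Loading column in Exhibit 2.2.

```python
# Hypothetical sketch of the kind of factor analysis behind Exhibit 2.2.
# All data here are simulated; only the general technique (items that are
# highly intercorrelated load on one common factor) mirrors the exhibit.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(seed=1)

# 200 respondents, 5 questionnaire items on "belief in evaluation."
# A single latent score drives every item; negatively worded items
# (e.g., "We don't need to have our program evaluated") get negative signs.
latent = rng.normal(size=(200, 1))
signs = np.array([1.0, 1.0, 1.0, -1.0, -1.0])
responses = latent @ signs.reshape(1, -1) + rng.normal(scale=0.7, size=(200, 5))

fa = FactorAnalysis(n_components=1)   # extract one common factor
fa.fit(responses)

# components_ holds the loadings: larger absolute values mean an item
# defines the factor more strongly; the sign reflects item wording,
# just as in the exhibit's positive and negative loadings.
print(np.round(fa.components_.ravel(), 3))
```

A real replication would, of course, work from actual questionnaire responses rather than simulated ones and would typically examine more than one candidate factor before settling on a single-factor interpretation.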

EXHIBIT 2.3
Reality Testing: Example of
a Good Idea That Didn't Work Out

The Robert Wood Johnson Foundation funded an eight-year effort to establish and evaluate new ways
of helping doctors and patients deal with death in hospitals. Called SUPPORT (Study to Understand
Prognoses and Preferences for Outcomes and Risks of Treatment), the project placed nurses in five
teaching hospitals to facilitate communications between physicians and families facing the death of
a family member. The idea was that by increasing doctors' understanding of what patients and their
families wanted and didn't want, pain could be diminished, the appropriateness of care would increase,
and fewer "heroic measures" would be used to prolong life for short periods.
The evaluation found that the culture of denial about death could not be overcome through better
communication. Major gaps remained between what patients privately said they wanted and what
doctors, dedicated to saving lives, did. Living wills didn't help. Half the patients still died in pain. Many
died attached to machines, and died alone.
Dr. Joanne Lynn, a co-director of the project, expressed dramatically the importance of testing
good ideas in practice to see if they really work: "We did what everyone thought would work and it
didn't work at all, not even a quiver."
While the idea didn't work, important lessons were learned, she concluded. "This wasn't a group
of doctors dedicated to finding the last possible date on the tombstone. What we learned was that the
conspiracy of silence about death was stronger than we expected, and the force of habit was also
stronger than we expected. We are all involved in the dance of silence."

NOTE: Quotations attributed to Dr. Lynn are taken from Goodman 1995:17A.

Programs and organizations can also "lose touch with reality" in the sense that the people in those programs and organizations are operating on myths and behaving in ways that are dysfunctional to goal attainment and ineffective for accomplishing desired outcomes. Program evaluation can be a mechanism for finding out whether what's supposed to be or hoped to be going on is, in fact, going on—a form of reality testing.

Some people would just as soon not be bothered dealing with programmatic or organizational reality. They've constructed their own comfortable worlds built on untested assumptions and unexamined beliefs. Evaluation is a threat to such people. Evaluators who ignore the threatening nature of reality testing and plow ahead with their data collection in the hope that knowledge will prevail are engaged in their own form of reality distortion. Utilization-focused evaluators, in contrast, work with intended evaluation users to help them understand the value of reality testing and buy into the process, thereby reducing the threat of evaluation and resistance (conscious or unconscious) to evaluation use. One way to do this is to look for and use examples from the news of good ideas that haven't worked out. Exhibit 2.3 presents an example I've used with several groups.

As I work with intended users to agree on what we mean by evaluation and engender a commitment to use, I invite them to assess incentives for and barriers to reality testing and information use in their own program culture. Barriers typically include fear of being judged, cynicism about whether anything can really change, skepticism about the worth of evaluation, concern about the time and money costs of evaluation, and frustration from previous bad evaluation experiences, especially lack of use. As we work through these and related issues to "get ready for evaluation," the foundation for use is being built in conjunction with a commitment to serious and genuine reality testing. Because evaluators have typically internalized the value of data-based reality testing, it is easy to assume that others share this perspective. But a commitment to examine beliefs and test actual goal attainment is neither natural nor widespread. People involved in program management and service delivery can become quite complacent about what they're doing and quite content with the way things are. Reality testing will only upset things. "Why bother?" they ask.

Nor is it enough that an evaluation is required by some funder or oversight authority. Indeed, under such conditions, evaluation often becomes an end in itself, something to be done because it is mandated, not because it will be useful or because important things can be learned. Doing an evaluation because it is required is entirely different from doing it because one is committed to grounding decisions and action in a careful assessment of reality. Ironically, mandated evaluations can actually undercut utility by making the reason for the evaluation compliance with a funding requirement rather than genuine interest in being more effective.

Because evaluation use is so dependent on the commitment to reality testing, evaluators need ways to cultivate that commitment and enlarge the capacity of intended users to undertake the process. This means engaging program staff, managers, funders, and other intended users in examining how their beliefs about program effectiveness may be based on selective perception, predisposition, prejudice, rose-colored glasses, unconfirmed assertions, or simple misinformation. The irony of living in the information age is that we are surrounded by so much misinformation and act on so many untested assumptions. By putting intended users in touch with how little they really know, and how flimsy is the basis for much of what they think they know, we are laying the groundwork for use. We are, in fact, identifying that there are useful things to be found out and creating the expectation that testing reality will be a valuable activity, not just an academic or mandated exercise. In short, we are establishing the program's readiness for evaluation.

Generating Real Questions

One way of facilitating a program's readiness for evaluation is to take primary intended users through a process of generating meaningful evaluation questions. I find that when I enter a new program setting as an external evaluator, the people with whom I'm working typically expect me to tell them what the focus of the evaluation will be. They're passively waiting to be told by the evaluation expert—me—what questions the evaluation will answer. But I don't come with specific evaluation questions. I come with a process for generating their questions. Taking them through that process is aimed at engendering their commitment to data-based evaluation and use. Let me share an example.

The Frontier School Division in Manitoba, Canada, encompasses much of northern Manitoba—a geographically immense school district. The Deputy Minister of Education in Manitoba thought evaluation might be a way to shake things up in a district he considered stagnant, so he asked me to facilitate an evaluation process with district officials. The actual form and content of the evaluation were to be determined internally, by them. So I went up to Winnipeg and met with the division administrators, a representative from the parents' group, a representative from the principals' group, and a representative from the teachers' union. I had asked that all constituencies be represented in order to establish credibility with all the people who might be involved in using the evaluation.

Inasmuch as I had been brought in from outside by a superordinate official, it was not surprising that I encountered reactions ranging from defensiveness to outright hostility. They had not asked for the evaluation, and the whole idea sounded unsavory and threatening.

I began by asking them to tell me what kinds of things they were interested in evaluating. The superintendent frowned and responded, "We'd like to see the evaluation instruments you've used in assessing other school districts."

I replied that I would be happy to share such instruments if they should prove relevant, but it would be helpful to first determine the evaluation issues and priorities of Frontier School Division. They looked skeptical, and after a lingering silence, the superintendent tried again: "You don't need to show us all the instruments you intend to use. Just show us one so we have an idea of what's going to happen."

I again replied that it was too early to talk about instruments. First, we had to identify their evaluation questions and concerns. Then we would talk about instruments. However, their folded arms and scowling faces told me that what they interpreted as my evasiveness was only intensifying their initial suspicions and fears. I was deepening their resistance by what they perceived as my secretiveness about the content of my evaluation scheme. The superintendent tried again: "How about just showing us one part of the evaluation, say the part that asks teachers about administrative effectiveness."

At that point I was about to throw in the towel, give them some old instruments, and let them use what they wanted from other evaluations. But first, I made one more attempt to get at their issues. I said, "Look, maybe your questions will be the same as questions I've used on surveys elsewhere. But I'm not even sure at this point that any kind of survey is appropriate. Maybe you don't need an evaluation. I certainly don't have any questions I need answered about your operations and effectiveness. Maybe you don't either. In which case, I'll tell the Deputy Minister that evaluation isn't the way to go. But before we decide to quit, let me ask you to participate in a simple little exercise. It's an old complete-the-blank exercise from grade school." I then turned to the chalkboard and wrote a sentence in capital letters.

I WOULD REALLY LIKE TO KNOW __________ ABOUT FRONTIER SCHOOL DIVISION.

I turned back to them and continued, "I want to ask each of you, individually, to complete the blank 10 times. What are 10 things about Frontier School Division that you'd like to know, things you aren't certain about, that would make a difference in what you do if you had more information?

Take a shot at it, without regard to methods, measurement, design, resources, precision—just 10 basic questions, real questions about this division."

After about 10 minutes I divided the participants into three groups of four people each and asked them to combine their lists into a single list of 10 things that each group wanted to know—in effect, to establish each group's priority questions. Then we pulled back together and generated a single list of 10 basic evaluation questions—answers to which, they agreed, could make a real difference to the operations of Frontier School Division.

The questions they generated were the kind an experienced evaluator could anticipate being asked in a districtwide educational evaluation because there are only so many things one can ask about a school division. But the questions were phrased in their terms, incorporating important local nuances of meaning and circumstance. Most important, they had discovered that they had questions they cared about—not my questions but their questions, because during the course of the exercise it had become their evaluation. The whole atmosphere had changed. This became most evident as I read aloud the final list of 10 items they had generated that morning. One item read, "How do teachers view the effectiveness of administrators and how often do they think administrators ought to come into classrooms?" One of the administrators who had been most hostile at the outset said, "That would be dynamite information. We have no idea at all what teachers think about us and what we do. I have no idea if they want me in their classrooms or not, or how often they think I ought to visit. That could turn my job around. That would be great to know."

Another question concerned the relationship between the classroom and the community. Both the teacher and parent representatives said that nobody had ever thought about that in any real way: "We don't have any policy about that. We don't know what goes on in the different schools. That would be important for us to know."

We spent the rest of the day refining questions, prioritizing, formalizing evaluation procedures, and establishing an agenda for the evaluation process. The hostility had vanished. By the end of the day they were anxious to have me make a commitment to return. They had become excited about doing their evaluation. The evaluation had credibility because the questions were their questions. A month later, they found out that budget shifts in the Ministry meant that the central government would not pay for the evaluation. The Deputy Minister told them that they could scrap the evaluation if they wanted to, but they decided to pay for it out of local division funds.

The evaluation was completed in close cooperation with the task force at every step along the way. The results were disseminated to all principals and teachers. The conclusions and recommendations formed the basis for staff development conferences and division policy sessions. The evaluation process itself had an impact on the Division. Over the last several years, Frontier School Division has gone through many changes. It is a very different place in terms of direction, morale, and activity than it was on my first visit. Not all those changes were touched on in the evaluation, nor are they simply a consequence of the evaluation. But generating a list of real and meaningful evaluation questions played a critical part in getting things started. Exhibit 2.4 offers criteria for good utilization-focused questions.

EXHIBIT 2.4
Criteria for Utilization-Focused Evaluation Questions

1. Data can be brought to bear on the question; that is, it is truly an empirical question.
2. There is more than one possible answer to the question; that is, the answer is not predetermined by the
phrasing of the question.
3. The primary intended users want information to help answer the question. They care about the answer to
the question.
4. The primary users want to answer the question for themselves, not just for someone else.
5. The intended users can indicate how they would use the answer to the question; that is, they can specify
the relevance of an answer to the question for future action.

Communicating Professional Commitment to Use From the Beginning

The criterion I offered the primary intended users in Winnipeg for generating meaningful questions was "Things you'd like to know that would make a difference to what you do." This criterion emphasizes knowledge for action—finding out things that can be used. But generating a list of potentially useful questions is only one way to start interacting with primary users. How one begins depends on what backgrounds, experiences, preconceptions, and relationships the primary users bring to the table. In Winnipeg, I needed to get the group engaged quickly in reframing how they were thinking about my role because their resistance was so palpable and because we didn't have much time.

With a seemingly more neutral group, one that is neither overtly hostile nor enthusiastic (Yes, some groups are actually enthusiastic at the beginning!), I may begin, as I noted earlier in this chapter, by asking participants to share words and feelings they associate with evaluation. Then, we explore how this "baggage" they've brought with them may affect their expectations about the evaluation's likely utility. As we work toward a shared definition of evaluation and a clear commitment to use, I look for opportunities to review the development of program evaluation as a field of professional practice and present the standards for and principles of evaluation (see the index). This material, presented earlier in this chapter and in Chapter 1, communicates to primary intended users that you, as the evaluator, are a professional—part of an established profession—and that, as such, you have an obligation to facilitate and conduct evaluations in accordance with professional standards and principles, including priority attention to utility.

Few non-evaluators are aware of the field's professional associations, conferences, journals, standards, and principles. By associating your effort with the larger profession, you can elevate the status, seriousness, and meaningfulness of the process you are facilitating and help the primary intended users understand the sources of wisdom you are drawing on and applying as you urge them to attend carefully to utilization issues from the start.

EXHIBIT 2.5
Themes of Annual American
Evaluation Association National Conferences

1986 What Have We Learned?


1987 The Utilization of Evaluation
1988 Evaluation and Politics
1989 International and Cross-Cultural Perspectives
1990 Evaluation and Formulation of Public Policy
1991 New Evaluation Horizons
1992 Synthesizing Evaluation: Perspectives, Practices, and Evidence
1993 Empowerment Evaluation
1994 Evaluation and Social Justice
1995 Evaluation for a New Century: A Global Perspective
1996 A Decade of Progress: Looking Back and Looking Forward
1997 Evaluation Theory Informing Practice, Practice Informing Theory

Thus, the history of the profession presented in the first chapter can be shared with intended users to communicate the larger context within which any particular evaluation takes place and to show sophistication about the issues the profession has focused on over time (see Exhibit 2.5). I consider this so important that I have students practice 10-minute minilectures on the development of evaluation as a field of professional practice, one guided by standards and principles (see Exhibits 1.3 and 2.1), so they can hold forth at a moment's notice, whether the opportunity be a workshop or a cocktail party.

Creative Beginnings

Authors of all races, be they Greeks, Romans, Teutons, or Celts, can't seem just to say that anything is the thing it is; they have to go out of their way to say that it is like something else.

—Ogden Nash

With easy-going, relaxed groups that seem open to having some fun, I'll often begin with a metaphor exercise.

Metaphors, similes, and analogies help us make connections between seemingly unconnected things, thereby opening up new possibilities by unveiling what had been undetected. Bill Gephart (1981), in his 1980 presidential address to evaluators, drew an analogy between his work as a watercolor artist and his work as an evaluator. Gephart compared the artist's efforts to "compel the eye" to the evaluator's efforts to "compel the mind." Both artist and evaluator attempt to focus the attention of an audience by highlighting some things and keeping other things in the background. He also examined the ways in which the values of an audience (of art critics or program decision makers) affect what they see in a finished piece of work.

Nick Smith (1981) directed a Research on Evaluation Program in which he and others thought about evaluators as poets, architects, photographers, philosophers, operations analysts, and artists. They consciously and creatively used metaphors and analogies to understand and elaborate the many functions of program evaluation. Use of these forms of figurative speech can help evaluators communicate the nature and practice of evaluation. Many of the problems encountered by evaluators, much of the resistance to evaluation, and many failures of use occur because of misunderstandings and communications problems. What we often have, between evaluators and non-evaluators, is a classic "failure to communicate."

One reason for such failures is that the language of research and evaluation—the jargon—is alien to many laypersons, decision makers, and stakeholders. From my point of view, the burden for clear communications rests on the evaluator. It is the evaluator who must find ways of bridging the communications gap.

To help intended users and stakeholders understand the nature of evaluation, I like to ask them to construct metaphors and similes about evaluation. This exercise helps participants in the process discover their own values concerning evaluation while also giving them a mechanism to communicate those values to others. The exercise can be used with a program staff, an evaluation task force, evaluation trainees, workshop participants, or any group for whom it might be helpful to clarify and share perceptions about evaluation. The exercise goes like this.

One of the things that we'll need to do during the process of working together is come to some basic understandings about what evaluation is and can do. In my experience, evaluation can be a very creative and energizing experience. In particular, interpreting and using evaluation findings for program improvement requires creativity and openness to a variety of possibilities. To help us get started on this creative endeavor, I'm going to ask you to participate with me in a little exercise.

In this box I have a bunch of toys, household articles, office supplies, tools, and other miscellaneous gadgets and thingamajigs that I've gathered from around my house. I'm going to dump these in the middle of the table and ask each of you to take one of them and use that item to make a statement about evaluation. Evaluation is like __________ because . . .

To illustrate what I want people to do, I offer to go first. I ask someone to pick out any object in the room that I might use for my metaphor. What follows are some examples from actual workshops:

Someone points to a coffee cup: "This cup can be used to hold a variety of things. The actual contents of the cup will vary depending on who is using it and for what purpose they are using it.

Utilization-focused evaluation is a process like this cup; it provides a form but is empty until the group of people working on the evaluation fill it with focus and content and substance. The potential of the cup cannot be realized until it holds some liquid. The potential of utilization-focused evaluation cannot be realized until it is given the substance of a concrete evaluation problem and situation. One of the things that I'll be doing as we work together is providing an evaluation framework like this cup. You will provide the substance."

Someone points to a chalkboard: "Evaluation is like a chalkboard because both are tools that can be used to express a variety of different things. The chalkboard itself is just an empty piece of slate until someone writes on it and provides information and meaning by filling in that space. The chalkboard can be filled up with meaningless figures, random marks, obscene words, mathematical formulas, or political graffiti—or the board can be filled with meaningful information, insights, helpful suggestions, and basic facts. The people who write on the chalkboard carry the responsibility for what it says. The people who fill in the blanks in the evaluation and determine its content and substance carry the responsibility for what the evaluation says. The evaluation process is just a tool to be used—and how it is used will depend on the people who control the process—in this case, you."

I'll typically take a break at this point and give people about 10 minutes to select an item and think about what to say. If there are more than 10 people in the group, I will break the larger group into small groups of 5 or 6 for sharing analogies and metaphors so that each person is given an opportunity to make an evaluation statement. Below are some examples from actual workshops.

This empty grocery bag is symbolic of my feelings about evaluation. When I think about our program being evaluated, I want to find someplace to hide, and I can put this empty bag over my head so that nobody can see me and I can't see anything else, and it gives me at least the feeling that I'm able to hide. (She puts the bag over her head.)

Evaluation can be like this toothbrush. When used properly it gets out the particles between the teeth so they don't decay. If not used properly, if it just lightly goes over the teeth or doesn't cover all the teeth, then some of the gunk will stay on and cause the teeth to decay. Evaluation should help get rid of any things that are causing a program to decay so it stays healthy.

Evaluation for me is like this rubber ball. You throw it down and it comes right back at you. Every time I say to my staff we ought to evaluate the program, they throw it right back at me and they say, "you do the evaluation."

Evaluation is like this camera. It lets you take a picture of what's going on, but it can only capture what you point it at, and only at a particular point in time. My concern about this evaluation is that it won't give the whole picture, that an awful lot may get left out.

Evaluation for me is like this empty envelope. You can use it to send a message to someone. I want to use evaluation to send a message to our funders about what we're doing in the program. They don't have any idea about what we actually do. I just hope they'll read the letter when they get it.

Evaluation for me is like this adjustable wrench. You can use this wrench to tighten nuts and bolts to help hold things together. If used properly and applied with the right amount of pressure, it holds things together very well. If you tighten the bolt too hard, however, you can break the bolt, and the whole thing will fall apart. I'm in favor of evaluation if it's done right. My concern is that you can overdo it and the program can't handle it.

The process of sharing is usually accompanied by laughter and spontaneous elaborations of favorite metaphors. It's a fun process that offers hope the evaluation process itself may not be quite as painful as people thought it would be. In addition, participants are often surprised to find that they have something to say. They are typically quite pleased with themselves. Most important, the exercise serves to express important thoughts and feelings that can be dealt with once they are made explicit.

Participants are typically not even aware that they have these feelings. By providing a vehicle for discovering and expressing their concerns, it is possible to surface major issues that may later affect evaluation use. Shared metaphors can help establish a common framework for the evaluation, capturing its purpose, its possibilities, and the safeguards that need to be built into the process. Robert Frost once observed, "All thought is a feat of association: Having what's in front of you bring up something in your mind that you almost didn't know you knew." This exercise helps participants bring to mind things about evaluation they almost didn't know they knew.

By the way, I've used this exercise with many different groups and in many different situations, including cross-cultural settings, and I've never yet encountered someone who couldn't find an object to use in saying something about evaluation. One way of guaranteeing this is to include in your box of items some things that have a pretty clear and simple message. For example, I'll always include a lock and key so that a very simple and fairly obvious analogy can be made: "Evaluation is like a lock and key, if you have the right key you can open up the lock and make it work. If you have the right information you can make the thing work." Or I'll include a lightbulb so that someone can say "evaluation is like this lightbulb, its purpose is to shed light on the situation."

The Cutting Edge of Metaphors

Metaphors can open up new understandings and enhance communications. They can also distort and offend. At the 1979 meeting of the Midwest Sociological Society, well-known sociologist Morris Janowitz was asked to participate in a panel on the question "What is the cutting edge of sociology?" Janowitz (1979), having written extensively on the sociology of the military, took offense at the "cutting edge" metaphor. He explained, " 'Cutting edge' is a military term. I am put off by the very term, cutting edge, like the parallel term breakthrough: slogans which intellectuals have inherited from the managers of violence" (p. 601).

Strategic planning is a label with military origins and connotations, as is rapid reconnaissance, a phrase sometimes used to describe certain quick, exploratory evaluation efforts. Some stakeholder groups will object to such associations; others will relate positively. Evaluators, therefore, must be sensitive in their selection of metaphors to avoid offensive comparisons and match analogies to stakeholders' interests.
particular importance, in this regard, is avoiding the use of metaphors with possible racist and sexist connotations, for example, "It's black and white" or "We want to get inside the Black Box of evaluation."

As Minnich (1990) has observed in her important book, Transforming Knowledge, our language and thinking can perpetuate "the old exclusions and devaluations of the majority of humankind that have pervaded our informal as well as formal schooling" (p. 1). She observed further that

even when we are all speaking the same languages, there are many "languages" at play behind and within what the speakers mean and what we in turn understand . . . , levels and levels of different meanings in even the most apparently simple and accessible utterance. (p. 9)

Minnich's point was nicely illustrated at a conference on educational evaluation where a Women's Caucus formed to express concerns about the analogies used in evaluation and to suggest some alternatives.

To deal with diversity is to look for new metaphors. We need no new weapons of assessment—the violence has already been done! How about brooms to sweep away the attic-y cobwebs of our male/female stereotypes? The tests and assessment techniques we frequently use are full of them. How about knives, forks, and spoons to sample the feast of human diversity in all its richness and color? Where are the techniques that assess the deliciousness of response variety, independence of thought, originality, uniqueness? (And lest you think those are female metaphors, let me do away with that myth—at our house everybody sweeps and everybody eats!) Our group talked about another metaphor—the cafeteria line versus the smorgasbord banquet styles of teaching/learning/assessing. Many new metaphors are needed as we seek clarity in our search for better ways of evaluating. To deal with diversity is to look for new metaphors. (Hurry 1976)

As we look for new metaphors in evaluation, we would do well to do so in the spirit of Thoreau, who observed, "All perception of truth is the detection of an analogy." The added point for utilization-focused evaluators is the admonition to be sensitive in selecting metaphors that are meaningful to specific intended users. The importance of such sensitivity stems from the centrality of "the personal factor" in evaluation use, the subject of the next chapter. First, however, a closing metaphor.

Navigating Evaluation's Rough Seas

A common error made by novice evaluators is believing that because someone has requested an evaluation or some group has been assembled to design an evaluation, the commitment to reality testing and use is already there. Quite the contrary, these commitments must be engendered (or revitalized if once they were present) and then reinforced throughout the evaluation process. Utilization-focused evaluation makes this a priority.

It's all too easy for those of us trained in research methods to forget that "evaluation is an unnatural act." (Buttons and bumper stickers with this slogan evoke interesting responses from intended users.) Evaluation is not natural to managers, funders, policymakers, program staff, or program participants. That's why they need
professional assistance, support, training, and facilitation.

Utilization-focused evaluation offers a philosophical harbor to sail toward when the often rough and stormy seas of evaluation threaten to blow the evaluator off course. With each new evaluation, the evaluator sets out, like an ancient explorer, on a quest for useful knowledge, not sure whether seas will be gentle, tempestuous, or becalmed. Along the way the evaluator will often encounter any number of challenges: political intrigues wrapped in mantles of virtue; devious and flattering antagonists trying to co-opt the evaluation in service of their own narrow interests and agendas; unrealistic deadlines and absurdly limited resources; gross misconceptions about what can actually be measured with precision and definitiveness; deep-seated fears about the evils-incarnate of evaluation, and therefore, evaluators; incredible exaggerations of evaluators' power; and insinuations about defects in the evaluator's genetic heritage. The observant evaluator is also likely to encounter tremendously dedicated staff working under difficult conditions for pitiable wages; program participants who have suffered grievous misfortunes and whose lives seem to hang by the most fragile of threads; administrators working feverishly to balance incredible needs against meager resources; funders and policymakers struggling to make sensible and rational decisions in a world that often seems void of sense and reason. The seas of evaluation offer encounters with discouraging corruption and inspiring virtue, great suffering and hopeful achievements, unmitigated programmatic disasters and solidly documentable successes, and an abundance of ambiguity between these poles of the human experience. The voyage is worth taking, despite the dangers and difficulties, because the potential rewards include making a meaningful difference in the effectiveness of important programs and thereby improving the quality of people's lives. That only happens, however, if the evaluation process and findings are used.

Note

1. I want to emphasize that I am using the term reality testing in its commonsense connotation of finding out what is happening. While philosophers of science will rightly point out that the whole notion of reality is an epistemological quagmire, I find that the people I work with in the "real world"—their phrase—resonate to the notion of reality testing. It is their own sense of reality I want to help them test, not some absolute, positivist construct of reality. The notion that reality is socially constructed doesn't mean it can't be tested and understood. At the 1995 International Evaluation Conference in Vancouver, Ernie House, Will Shadish, Michael Scriven, and I (evaluation theorists with quite different perspectives) participated in a session on theory in which we agreed on the following two propositions, among others: (1) Most theorists postulate a real physical world, although they differ greatly as to its knowability and complexity; and (2) logical positivism is an inadequate epistemology that few theorists advocate any more, either in evaluation or philosophy.
Fostering Intended Use by
Intended Users
The Personal Factor

There are five key variables that are absolutely critical in evaluation use. They are, in order of importance: people, people, people, people, and people.

—Halcolm

On a damp summer morning at Snow Mountain Ranch near Rocky Mountain National
Park, some 40 human service and education professionals have gathered from all over the
country in a small, dome-shaped chapel to participate in an evaluation workshop. The
session begins like this:

Instead of beginning by my haranguing you about what you should do in program evaluation, we're going to begin with an evaluation exercise to immerse us immediately in the process. I'm going to ask you to play the dual roles of participants and evaluators since that's the situation most of you find yourselves in anyway in your own agencies and programs, where you have both program and evaluation responsibilities. We're going to share an experience to loosen things up a bit . . . perhaps warm you up, wake you up, and allow you to get more comfortable. The exercise will also allow us to test your participant observer skills and provide us with a common experience as evaluators. We'll also generate some personal data about the process of evaluation that we can use for discussion later.


So what I want you to do for the next five minutes is move around this space in any
way you want to. Explore this environment. Touch and move things. Experience
different parts of this lovely setting. And while you're observing the physical environ-
ment, watch what others do. Then, find a place where you feel comfortable to write
down what you observed, and also to evaluate the exercise. Experience, explore, observe,
and evaluate. That's the exercise.

At the end of the writing time, participants shared, on a voluntary basis, what they had
written.

First Observer: People slowly got up. Everybody looked kind of nervous 'cause they
weren't sure what to do. People moved out toward the walls, which
are made of rough wood. The lighting is kind of dim. People sort of
moved counterclockwise. Every so often there would be a nervous
smile exchanged between people. The chairs are fastened down in
rows so it's hard for people to move in the center of the room. A
few people went to the stage area, but most stayed toward the back
and outer part. The chairs aren't too comfortable, but it's a quiet,
mellow room. The exercise showed that people are nervous when
they don't know what to do.
Second Observer: The room is hexagon-shaped with a dome-shaped ceiling. Fastened-
down chairs are arranged in a semicircle with a stage in front that
is about a foot high. A podium is at the left of the small stage. Green
drapes hang at the side. Windows are small and triangular. The floor
is wood. There's a coffee table in back. Most people went to get
coffee. A couple people broke the talking rule for a minute. Every-
one returned to about the same place they had been before after
walking around. It's not a great room for a workshop, but it's OK.
Third Observer: People were really nervous about what to do because the goals of
the exercise weren't clear. You can't evaluate without clear goals so
people just wandered around. The exercise shows you can't evaluate
without clear goals.
Fourth Observer: I said to myself at the start, this is a human relations thing to get us
started. I was kind of mad about doing this because we've been here
a half hour already, and we haven't done anything that has to do
with evaluation. I came to learn about evaluation, not to do touchy-
feely group stuff. So I just went to get coffee. I didn't like wasting
so much time on this.
Fifth Observer: I felt uneasy, but I told myself that it's natural to feel uneasy when
you aren't sure what to do. But I liked walking around, looking at
the chapel, and feeling the space. I think some people got into it,
but we were stiff and uneasy. People avoided looking at each other.
Sometimes there was a nervous smile when people passed each
other, but by kind of moving in a circle, most people went the same
direction and avoided looking at each other. I think I learned
something about myself and how I react to a strange, nervous
situation.

These observations were followed by a discussion of the different perspectives reported on the same experience and speculation on what it would take to produce a more focused set of observations and evaluations. Suggestions included establishing clear goals; specifying evaluation criteria; figuring out what was supposed to be observed in advance so everyone could observe it; giving clearer directions of what to do; stating the purpose of evaluation; and training the evaluation observers so that they all recorded the same thing.

Further discussion revealed that before any of these evaluation tasks could be completed, a prior step would be necessary: determining who the primary intended users of the evaluation are. This task constitutes the first step in utilization-focused evaluation. Taking this first step is the focus of this chapter.

The First Step in Utilization-Focused Evaluation

Many decisions must be made in any evaluation. The purpose of the evaluation must be determined. Concrete evaluative criteria for judging program success will usually have to be established. Methods will have to be selected and time lines agreed on. All of these are important issues in any evaluation. The question is: Who will decide these issues? The utilization-focused answer is: primary intended users of the evaluation.

Clearly and explicitly identifying people who can benefit from an evaluation is so important that evaluators have adopted a special term for potential evaluation users: stakeholders. This term has been borrowed from management consulting, where it was coined in 1963 at the Stanford Research Institute as a way of describing people who were not directly stockholders in a company but "without whose support the firm would cease to exist" (Mendelow 1987:177).

Stakeholder management is aimed at proactive action—action aimed, on the one hand, at forestalling stakeholder activities that could adversely affect the organization and on the other hand, at enabling the organization to take advantage of stakeholder opportunities. . . . This can be achieved only through a conscious decision to adopt the stakeholder perspective as part of a strategy formulation process. (Mendelow 1987:177-78)

Evaluation stakeholders are people who have a stake—a vested interest—in evaluation findings. For any evaluation, there are multiple possible stakeholders: program funders, staff, administrators, and clients or program participants. Others with a direct, or even indirect, interest in program effectiveness may be
considered stakeholders, including journalists and members of the general public, or more specifically, taxpayers, in the case of public programs. Stakeholders include anyone who makes decisions or desires information about a program. However, stakeholders typically have diverse and often competing interests. No evaluation can answer all potential questions equally well. This means that some process is necessary for narrowing the range of possible questions to focus the evaluation. In utilization-focused evaluation, this process begins by narrowing the list of potential stakeholders to a much shorter, more specific group of primary intended users. Their information needs, that is, their intended uses, focus the evaluation.

[Cartoon: "BEHOLD the 'STAKE-HOLDER'!"]

The workshop exercise that opened this chapter illustrates the importance of clearly identifying primary intended users. The participants in that exercise observed different things in part because they were interested in different things. They "evaluated" the exercise in different ways, and many had trouble "evaluating" the exercise at all, in part because they didn't know for whom they were writing. There were several potential users of an evaluation of the "explore the environment" exercise:

1. As a workshop leader, I might want to evaluate the extent to which the exercise accomplished my objectives.
2. Each individual participant might conduct a personal evaluation according to his or her own criteria.
3. The group could establish consensus goals for the exercise, which would then serve as focus for the evaluation.
4. The bosses, agency directors, and/or funding boards who paid for participants to attend might want an assessment of the return on the resources they have invested for training.
5. The Snow Mountain Ranch director might want an evaluation of the appropriateness of the chapel for such a workshop.
6. The building architects might want an evaluation of how participants responded to the space they designed.
7. Professional workshop facilitators might want to evaluate the exercise's effectiveness for opening a workshop.
8. Psychologists or human relation trainers might want to assess the effects of the exercise on participants.
9. Experiential learning educators might want an assessment of the exercise as an experiential learning tool.
10. The janitors of the chapel might want an evaluation of the work engendered for them by an exercise that permits moving things around (which sometimes occurs to destructive proportions when I've used the exercise in settings with moveable furniture).

This list of people potentially interested in the evaluation (stakeholders) could be expanded. The evaluation question in each case would likely be different. I would have different evaluation information needs as workshop leader than would the camp director; the architects' information needs would differ from the janitors' "evaluation" questions; the evaluation criteria of individual participants would differ from those reached by the total group through a consensus-formation process.

Beyond Audience

The preceding discourse is not aimed at simply making the point that different people see things differently and have varying interests and needs. I take that to be on the order of a truism. The point is that this truism is regularly and consistently ignored in the design of evaluation studies. To target an evaluation at the information needs of a specific person or a group of identifiable and interacting persons is quite different from what has been traditionally recommended as "identifying the audience" for an evaluation. Audiences are amorphous, anonymous entities. Nor is it sufficient to identify an agency or organization as a recipient of the evaluation report. Organizations are an impersonal collection of hierarchical positions. People, not organizations, use evaluation information. I shall elaborate these points later in this chapter. First, I want to present data from a study of how federal health evaluations were used. Those findings provide a research foundation for this first step in utilization-focused evaluation. In the course of presenting these data, it will also become clearer how one identifies primary intended users and why they are the key to specifying and achieving intended uses.

Studying Use: Identification of the Personal Factor

In the mid-1970s, as evaluation was emerging as a distinct field of professional practice, I undertook a study with colleagues and students of 20 federal health evaluations to assess how their findings had been used and to identify the factors that affected varying degrees of use. We interviewed the evaluators and those for whom the evaluations were conducted.1 That study marked the beginning of the formulation of utilization-focused evaluation presented in this book.

We asked respondents to comment on how, if at all, each of 11 factors extracted from the literature on utilization had affected use of their study. These factors were
methodological quality, methodological appropriateness, timeliness, lateness of report, positive or negative findings, surprise of findings, central or peripheral program objectives evaluated, presence or absence of related studies, political factors, decision maker/evaluator interactions, and resources available for the study. Finally, we asked respondents to "pick out the single factor you feel had the greatest effect on how this study was used."

From this long list of questions, only two factors emerged as consistently important in explaining utilization: (a) political considerations, to be discussed in Chapter 14, and (b) a factor we called the personal factor. This latter factor was unexpected, and its clear importance to our respondents had, we believed, substantial implications for the use of program evaluation. None of the other specific literature factors about which we asked questions emerged as important with any consistency. Moreover, when these specific factors were important in explaining the use or nonuse of a particular study, it was virtually always in the context of a larger set of circumstances and conditions related to either political considerations or the personal factor.

The personal factor is the presence of an identifiable individual or group of people who personally care about the evaluation and the findings it generates. Where such a person or group was present, evaluations were used; where the personal factor was absent, there was a correspondingly marked absence of evaluation impact.

The personal factor represents the leadership, interest, enthusiasm, determination, commitment, assertiveness, and caring of specific, individual people. These are people who actively seek information to make judgments and reduce decision uncertainties. They want to increase their ability to predict the outcomes of programmatic activity and thereby enhance their own discretion as decision makers, policymakers, consumers, program participants, and funders, or whatever role they play. These are the primary users of evaluation.

Data on the Importance of the Personal Factor

The personal factor emerged most dramatically in our interviews when, having asked respondents to comment on the importance of each of our 11 utilization factors, we asked them to identify the single factor that was most important in explaining the impact or lack of impact of that particular study. Time after time, the factor they identified was not on our list. Rather, they responded in terms of the importance of individual people:

Item: I would rank as the most important factor this division director's interest, [his] interest in evaluation. Not all managers are that motivated toward evaluation. [DM353:17].2

Item: [The single most important factor that had the greatest effect on how the study got used was] the principal investigator. . . . If I have to pick a single factor, I'll pick people anytime. [DM328:20]

Item: That it came from the Office of the Director—that's the most important factor. . . . The proposal came from the Office of the Director. It had his attention and he was interested in it, and he implemented many of the things. [DM312:21]

Item: [The single most important factor was that] the people at the same level of decision making in [the new office] were not interested in making decisions of the kind that the people [in the old office] were, I think that
probably had the greatest impact. The fact that there was no one at [the new office] after the transfer who was making programmatic decisions. [EV361:27]

Item: Well, I think the answer there is in the qualities of the people for whom it was made. That's sort of a trite answer, but it's true. That's the single most important factor in any study now that's utilized. [EV232:22]

Item: Probably the single factor that had the greatest effect on how it was used was the insistence of the person responsible for initiating the study that the Director of _____ become familiar with its findings and arrive at judgment on it. [DM369:25]

Item: [The most important factor was] the real involvement of the top decision makers in the conceptualization and design of the study, and their commitment to the study. [DM268:9]

While these comments concern the importance of interested and committed individuals in studies that were actually used, studies that were not used stand out in that there was often a clear absence of the personal factor. One evaluator, who was not sure how his study was used, but suspected it had not been, remarked,

I think that since the client wasn't terribly interested . . . and the whole issue had shifted to other topics, and since we weren't interested in doing it from a research point of view . . . nobody was interested. [EV264:14]

Another highly experienced evaluator was particularly adamant and articulate on the theory that the major factor affecting use is the personal energy, interests, abilities, and contacts of specific individuals. When asked to identify the one factor that is most important in whether a study gets used, he summarized his viewpoint as follows:

The most important factor is desire on the part of the managers, both the central federal managers and the site managers. I don't think there's [any doubt], you know, that evaluation should be responsive to their needs, and if they have a real desire to get on with whatever it is they're supposed to do, they'll apply it. And if the evaluations don't meet their needs, they won't. About as simple as you can get. I think the whole process is far more dependent on the skills of the people who use it than it is on the sort of peripheral issues of politics, resources. . . . Institutions are tough as hell to change. You can't change an institution by coming and doing an evaluation with a halo. Institutions are changed by people, in time, with a constant plugging away at the purpose you want to accomplish. And if you don't watch out, it slides back. [EV346:15-16]

His view had emerged early in the interview when he described how evaluations were used in the U.S. Office of Economic Opportunity (OEO):

In OEO, it depended on who the program officer was, on the program review officials, on program monitors for each of these grant programs. . . . Where there were aggressive program people, they used these evaluations whether they understood them or not. They used them to effect improvements, direct allocations of funds within the program, explain why the records were kept this way, why the reports weren't complete or whatever. Where the program officials were unaggressive, passive—nothing!

Same thing's true at the project level. Where you had a director who was
aggressive and understood what the hell the structure was internally, he used evaluation as leverage to change what went on within his program. Those who weren't—nothing! [EV346:5]

At another point he observed, "The basic thing is how the administrators of the program view themselves and their responsibilities. That's the controlling factor" [EV346:8].

The same theme emerged in his comments about each possible factor. Asked about the effects on use of methodological quality, positive or negative findings, and the degree to which the findings were expected, he always returned eventually to the importance of managerial interest, competence, and confidence. The person makes the difference.

Our sample included another rather adamant articulation of this premise. An evaluation of a pilot program involving four major projects was undertaken at the instigation of the program administrator. He made a special effort to make sure that his question (i.e., Were the pilot projects capable of being extended and generalized?) was answered. He guaranteed this by personally taking an active interest in all parts of the study. The administrator had been favorable to the program in principle, was uncertain what the results would be, but was hoping that the program would prove effective. The evaluation findings were, in fact, negative. The program was subsequently ended, with the evaluation carrying "considerable weight" in that decision [DM367:8]. Why was this study used in such a dramatic way? His answer was emphatic:

Look, we designed the project with an evaluation component in it, so we were committed to use it and we did. . . . It's not just the fact that [evaluation] was built in, but the fact that we built it in on purpose. That is, the agency head and myself had broad responsibilities for this, wanted the evaluation study results, and we expected to use them. Therefore, they were used. That's my point. If someone else had built it in because they thought it was needed, and we didn't care, I'm sure the use of the study results would have been different. [DM367:12]

The evaluator (an external agent selected through an open request-for-proposal process) independently corroborated the decision maker's explanation:

The principal reason [for use] was that the decision maker was the guy who requested the evaluation and used its results. That is, the organizational distance between the policymaker and the evaluator was almost zero in this instance. That's the most important reason it had an impact. . . . It was the fact that the guy who was asking the question was the guy who was going to make use of the answer. [EV367:12]

Here, then, is a case in which a decision maker commissioned an evaluation knowing what information he needed; the evaluator was committed to answering the decision maker's questions; and the decision maker was committed to using the findings. The result was a high level of use in making a decision contrary to the director's initial personal hopes. In the words of the evaluator, the major factor explaining use was that "the guy who was going to be making the decision was aware of and interested in the findings of the study and had some hand in framing the questions to be answered; that's very important" [EV367:20].

The program director's overall conclusion gets to the heart of the personal factor:
Factors that made a positive contribution to use? One would be that the decision makers themselves want the evaluation study results. I've said that several times. If that's not present, it's not surprising that the results aren't used. [DM367:17]

This point was made often in the interviews. One highly placed and widely experienced administrator offered the following advice at the end of a four-hour interview:

Win over the program people. Make sure you're hooked into the people who're going to make the decision in six months from the time you're doing the study, and make sure that they feel it's their study, that these are their ideas, and that it's focused on their values. [DM283:40]

Presence of the personal factor increases the likelihood of long-term follow-through, that is, persistence in getting evaluation findings used. One study in particular stood out in this regard. It was initiated by a new office director with no support internally and considerable opposition from other affected agencies. The director found an interested and committed evaluator. The two worked closely together. The findings were initially ignored because it wasn't a hot political issue at the time, but over the ensuing four years, the director and evaluator personally worked to get the attention of key members of Congress. The evaluation eventually contributed to passing significant legislation in a new area of federal programming. From beginning to end, the story was one of personal human commitment to getting evaluation results used.

Although the specifics vary from case to case, the pattern is markedly clear: Where the personal factor emerges, where some individuals take direct, personal responsibility for getting findings to the right people, evaluations have an impact. Where the personal factor is absent, there is a marked absence of impact. Use is not simply determined by some configuration of abstract factors; it is determined in large part by real, live, caring human beings.

Supporting Research on the Personal Factor

James Burry (1984) of the UCLA Center for the Study of Evaluation conducted a thorough review of the voluminous literature on evaluation utilization. That review was the basis for a synthesis of factors that affect evaluation use (Alkin et al. 1985). The synthesis grew out of empirical research on evaluation utilization (Alkin, Daillak, and White 1979) and organizes the various factors in three major categories: human factors, context factors, and evaluation factors.

Human factors reflect evaluator and user characteristics with a strong influence on use. Included here are such factors as people's attitudes toward and interest in the program and its evaluation, their backgrounds and organizational positions, and their professional experience levels.

Context factors consist of the requirements and fiscal constraints facing the evaluation, and relationships between the program being evaluated and other segments of its broader organization and the surrounding community.

Evaluation factors refer to the actual conduct of the evaluation, the procedures used in the conduct of the evaluation, and the quality of the information it provides. (Burry 1984:1)
The primary weakness of this framework is that the factors are undifferentiated in terms of importance. Burry ended up with a checklist of factors that may influence evaluation, but no overall hierarchy was presented in his synthesis; that is, a hierarchy that places more importance on certain factors as necessary and/or sufficient conditions for evaluation use. At a 1985 conference on evaluation use sponsored by the UCLA Center for the Study of Evaluation, I asked Jim Burry if his extensive review of the literature suggested any factors as particularly important in explaining use. He answered without hesitation:

There's no question about it. The personal factor is far and away the most important. You're absolutely right in saying that the personal factor is the most important explanatory variable in evaluation use. The research of the last five years confirms the primacy of the personal factor. (personal conversation 1985)

Lester and Wilds (1990) conducted a comprehensive review of the literature on use of public policy analysis. Based on that review, they developed a conceptual framework to predict use. Among the hypotheses they found supported were these:

• The greater the interest in the subject by the decision maker, the greater the likelihood of utilization.
• The greater the decision maker's participation in the subject and scope of the policy analysis, the greater the likelihood of utilization. (p. 317)

These hypotheses were further confirmed in the evaluation literature in special issues of New Directions for Program Evaluation devoted to "Stakeholder-Based Evaluation" (Bryk 1983), "The Client Perspective in Evaluation" (Nowakowski 1987), and "Evaluation Utilization" (McLaughlin et al. 1988). Marvin Alkin (1985), founder and former director of the Center for the Study of Evaluation at the University of California, Los Angeles, made the personal factor the basis for his Guide for Evaluation Decision-Makers. Jean King concluded from her research review (1988) and case studies (1995) that involving the right people is critical to evaluation use. In a major analysis of "the Feasibility and Likely Usefulness of Evaluation," Joseph Wholey (1994) has shown that involving intended users early is critical so that "the intended users of the evaluation results have agreed on how they will use the information" (p. 16) before the evaluation is conducted. Cousins, Donohue, and Bloom (1995) reviewed a great volume of research on evaluation and found that "a growing body of data provide support" for the proposition that "increased participation in research by stakeholders will heighten the probability that research data will have the intended impact" (p. 5). Johnson (1995) used conjoint measurement and analysis to estimate evaluation use and found that evaluators attribute increased use to increased participation in the evaluation process by practitioners. And Carol Weiss (1990), one of the leading scholars of knowledge use, concluded in her keynote address to the American Evaluation Association:

First of all, it seems that there are certain participants in policy making who tend to be 'users' of evaluation. The personal factor—a person's interest, commitment, enthusiasm—plays a part in determining how much
influence a piece of research will have. (p. 177)

The need for interactive dialogue at a personal level applies to large-scale national evaluations as well as smaller-scale, local evaluations (Dickey 1981). Wargo (1995) analyzed three unusually successful federal evaluations in a search for "characteristics of successful program evaluations"; he found that active involvement of key stakeholders was critical at every stage: during planning, while conducting the evaluation, and in dissemination of findings (p. 77). In 1995, the U.S. General Accounting Office (GAO) studied the flow of evaluative information to Congress by following up three major federal programs: the Comprehensive Child Development Program, the Community Health Centers program, and the Chapter 1 Elementary and Secondary Education Act, aimed at providing compensatory education services to low-income students. Analysts concluded that underutilization of evaluative information was a direct function of poor communications between intended users (members of the Senate Committee on Labor and Human Resources) and responsible staff in the three programs:

Finally, we observed that communication between the Committee and agency staff knowledgeable about program information was limited and comprised a series of one-way communications (from the Committee to the agency or the reverse) rather than joint discussion. This pattern of communication, which was reinforced by departmental arrangements for congressional liaison, affords little opportunity to build a shared understanding about the Committee's needs and how to meet them. (GAO 1995:40)

The GAO (1995) report recommended that Senate Committee members have "increased communication with agency program and evaluation staff to help ensure that information needs are understood and that requests and reports are suitably framed and are adapted as needs evolve" (p. 41). This recommendation affirms the importance of personal interactions as a basis for mutual understanding to increase the relevance and, thereby, the utility of evaluation reports.

Another framework that supports the importance of the personal factor is the "Decision-Oriented Educational Research" approach of Cooley and Bickel (1985). Although the label for this approach implies a focus on decisions rather than people, in fact the approach is built on a strong "client orientation." This client orientation means that the primary intended users of decision-oriented educational research are clearly identified and then involved in all stages of the work through ongoing dialogue between the researcher and the client. Cooley and Bickel presented case evidence to document the importance of being client-oriented.

In a major review of evaluation use in nonprofit organizations, the Independent Sector concluded that attending to "the human side of evaluation" makes all the difference. "Independent Sector learned that evaluation means task, process, and people. It is the people side—the human resources of the organization—who make the 'formal' task and process work and will make the results work as well" (Moe 1993:19).

The evaluation literature contains substantial additional evidence that working with intended users can increase use (e.g., Bedell et al. 1985; Dawson and D'Amico 1985; King 1985; Lawler et al. 1985; Siegel and Tuckel 1985; Cole 1984; Evans
and Blunden 1984; Hevey 1984; Rafter 1984; Bryk 1983; Campbell 1983; Glaser, Abelson, and Garrison 1983; Lewy and Alkin 1983; Stalford 1983; Barkdoll 1982; Beyer and Trice 1982; Canadian Evaluation Society 1982; King and Pechman 1982; Saxe and Koretz 1982; Dickey and Hampton 1981; Leviton and Hughes 1981; Alkin and Law 1980; Braskamp and Brown 1980; Studer 1978).

Support for the importance of the personal factor also emerged from the work of the Stanford Evaluation Consortium, one of the leading places of ferment and reform in evaluation during the late 1970s and early 1980s. Cronbach and associates in the Consortium identified major reforms needed in evaluation by publishing a provocative set of 95 theses, following the precedent of Martin Luther. Among their theses was this observation on the personal factor: "Nothing makes a larger difference in the use of evaluations than the personal factor—the interest of officials in learning from the evaluation and the desire of the evaluator to get attention for what he knows" (Cronbach et al. 1980:6; emphasis added).

Evaluation's Premier Lesson

The importance of the personal factor in explaining and predicting evaluation use leads directly to the emphasis in utilization-focused evaluation on working with intended users to specify intended uses. The personal factor directs us to attend to specific people who understand, value, and care about evaluation and further directs us to attend to their interests. This is the primary lesson the profession has learned about enhancing use, and it is wisdom now widely acknowledged by practicing evaluators, as evidenced by research on evaluators' beliefs and practices conducted by Cousins et al. (1995).

Cousins and his colleagues (1995) surveyed a sample of 564 evaluators and 68 practitioners drawn from the membership lists of professional evaluation associations in the United States and Canada. The survey included a list of possible beliefs that respondents could agree or disagree with. Greatest consensus centered on the statement "Evaluators should formulate recommendations from the study." (I'll discuss recommendations in a later chapter.) The item eliciting the next highest agreement was "The evaluator's primary function is to maximize intended uses by intended users of evaluation data" (p. 19). Given widespread agreement about the desired outcome of evaluation, namely, intended uses by intended users, let's now examine some of the practical implications of this perspective.

Practical Implications of the Personal Factor

First, in order to work with primary intended users to achieve intended uses, the evaluation process must surface people who want to know something. This means locating people who are able and willing to use information. The number may vary from one prime user to a fairly large group representing several constituencies, for example, a task force of program staff, clients, funders, administrators, board members, community representatives, and officials or policymakers (see Exhibit 3.1). Cousins et al. (1995) surveyed evaluators and found that they reported six stakeholders as the median number typically involved in a project. While stakeholders' points of view may vary on any number of issues, what they should share is a genuine interest in
EXHIBIT 3.1
A Statewide Evaluation Task Force

The Personal Factor means getting key influentials together, face-to-face, to negotiate the design.
Here's an example.
In 1993, the Minnesota Department of Transportation created eight Area Transportation
Partnerships to make decisions about roads and other transportation investments in a cooperative
fashion between state and local interests. To design and oversee the study of how the partnerships
were working, a technical panel was created to represent the diverse interests involved. Members of
the technical panel included:

The District Engineer from District 1 (Northeast)


The Planning Director from District 6 (Southeast)
The District Planner from District 7 (South central)
Planner for a Regional Development Council (Northwest)
Department of Transportation Director of Economic Analysis and Special Studies, State Office of
Investment Management
An influential county commissioner
Director of a regional transit operation
Director of a regional metropolitan Council of Governments (Western part of the state)
Member of the Metropolitan Council Transportation Advisory Committee (Greater Minneapolis/
Saint Paul)
A county engineer
A private transportation consultant
A city engineer from a small town
A metropolitan planning and research engineer
The State Department of Transportation Interagency Liaison
A University of Minnesota researcher from the University's Center for Transportation Studies
An independent evaluation consultant (not the project evaluator)
Five senior officials from various offices of the State Department of Transportation
The evaluator and two assistants

This group met quarterly throughout the study. The group made substantive improvements in the
original design, gave the study credibility with different stakeholder groups, participated in interpreting
findings, and laid the groundwork for use.

using evaluation, an interest manifest in a willingness to take the time and effort to work through their information needs and interests. Thus, the first challenge in evaluation is to answer seriously and searchingly the question posed by Marvin Alkin (1975a): "Evaluation: Who Needs It? Who Cares?" Answering this question, as we
shall see, is not always easy, but it is always critical.

Second, formal position and authority are only partial guides in identifying primary users. Evaluators must find strategically located people who are enthusiastic, committed, competent, interested, and assertive. Our data suggest that more may be accomplished by working with a lower-level person displaying these characteristics than by working with a passive, uninterested person in a higher position.

Third, quantity, quality, and timing of interactions with intended users are all important. A large amount of interaction between evaluators and users with little substance may backfire and actually reduce stakeholder interest. Evaluators must be strategic and sensitive in asking for time and involvement from busy people, and they must be sure they're interacting with the right people around relevant issues. Increased contact by itself is likely to accomplish little. Nor will interaction with the wrong people (i.e., those who are not oriented toward use) help much. It is the nature and quality of interactions between evaluators and decision makers that is at issue. My own experience suggests that where the right people are involved, the amount of direct contact can sometimes be reduced because the interactions that do occur are of such high quality. Later, when we review the decisions that must be made in the evaluation process, we'll return to the issues of quantity, quality, and timing of interactions with intended users.

Fourth, evaluators will typically have to work to build and sustain interest in evaluation use. Identifying intended users is part selection and part nurturance. Potential users with low opinions of or little interest in evaluation may have had bad prior experiences or just not have given much thought to the benefits of evaluation. The second chapter discussed ways of cultivating interest in evaluation and building commitment to use. Even people initially inclined to value evaluation will still often need training and support to become effective information users.

Fifth, evaluators need skills in building relationships, facilitating groups, managing conflict, walking political tightropes, and effective interpersonal communications to capitalize on the importance of the personal factor. Technical skills and social science knowledge aren't sufficient to get evaluations used. People skills are critical. Ideals of rational decision making in modern organizations notwithstanding, personal and political dynamics affect what really happens. Evaluators without the savvy and skills to deal with people and politics will find their work largely ignored, or, worse yet, used inappropriately.

Sixth, a particular evaluation may have multiple levels of stakeholders and therefore need multiple levels of stakeholder involvement. For example, funders, chief executives, and senior officials may constitute the primary users for overall effectiveness results, while lower-level staff and participant stakeholder groups may be involved in using implementation and monitoring data for program improvement. Exhibit 3.2 provides an example of such a multiple level structure for different levels of stakeholder involvement and evaluation use.

Menu 3.1 summarizes these practical implications of the personal factor for use.

Diversions Away From Intended Users

To appreciate some of the subtleties of the admonition to focus on intended use by intended users, let's consider a few of the
EXHIBIT 3.2
A Multilevel Stakeholder Structure and Process

The Saint Paul Foundation formed a Donor Review Board of several philanthropic foundations in
Minnesota to fund a project, Supporting Diversity in Schools (SDS). The project established local
school-community partnerships with communities of color: African Americans, Hispanics, Native
Americans, and Southeast Asians. The evaluation had several layers based on different levels of
stakeholder involvement and responsibility.

Stakeholder Group: Donor Review Board (Executives and Program Officers from contributing Foundations and School Superintendent)
Evaluation Focus: Overall effectiveness; policy implications; sustainability
Nature of Involvement: Twice-a-year meetings to review the design and interim evaluation results; final report directed to this group

Stakeholder Group: District Level Evaluation Group (Representatives from participating schools, social service agencies, community organizations, and project staff)
Evaluation Focus: Implementation monitoring in early years; district-level outcomes in later years
Nature of Involvement: An initial full-day retreat with 40 people from diverse groups; annual retreat sessions to update, refocus, and interpret interim findings

Stakeholder Group: Partnership Level Evaluation Teams (Teachers, community representatives, and evaluation staff liaisons)
Evaluation Focus: Documenting activities and outcomes at the local partnership level: one school, one community of color
Nature of Involvement: Annual evaluation plan; completing evaluation documents for every activity; quarterly review of progress to use findings for improvement

temptations evaluators face that lure them away from the practice of utilization-focused evaluation.

First, and most common, evaluators are tempted to make themselves the major decision makers for the evaluation. This can happen by default (no one else is willing to do it), by intimidation (clearly, the evaluator is the expert), or simply by failing to think about or seek primary users (why make life difficult?). The tip-off that evaluators have become the primary intended users (either by intention or default) is that the evaluators are answering their own questions according to their own interests, needs, and priorities. Others may have occasional input here and there, but what emerges is an evaluation by the evaluators, for the evaluators, and of the evaluators. Such studies are seldom of use to other stakeholders, whose reactions are likely to be, "Great study. Really well done. Shows lots of work, but doesn't tell us anything we want to know."

A less innocent version of this scenario occurs when academics pursue their basic research agendas under the guise of evaluation research. The tip-off here is that the
MENU 3.1

Implications of the Personal Factor for Planning Use

• Find and cultivate people who want to learn.


• Formal position and authority are only partial guides in identifying primary
users. Find strategically located people who are enthusiastic, committed,
competent, and interested.
• Quantity, quality, and timing of interactions with intended users are all
important.
• Evaluators will typically have to work to build and sustain interest in
evaluation use. Building effective relationships with intended users is part
selection, part nurturance, and part training.
• Evaluators need people skills in how to build relationships, facilitate groups,
manage conflict, walk political tightropes, and communicate effectively.
• A particular evaluation may have multiple levels of stakeholders and there-
fore need multiple levels of stakeholder involvement. (See Exhibit 3.2.)

evaluators insist on designing the study in such a way as to test some theory they think is particularly important, whether or not people involved in the program see any relevance to such a test.

A second temptation that diverts evaluators from focusing on specific intended users is to fall prey to the seemingly stakeholder-oriented "identification of audience" approach. Audiences turn out to be relatively passive groups of largely anonymous faces: the "feds," state officials, the legislature, funders, clients, the program staff, the public, and so forth. If specific individuals are not identified from these audiences and organized in a manner that permits meaningful involvement in the evaluation process, then, by default, the evaluator becomes the real decision maker and stakeholder ownership suffers, with a corresponding threat to utility. This is my critique of "responsive evaluation" as advocated by Stake (1975) and Guba and Lincoln (1981). Responsive evaluation "takes as its organizer the concerns and issues of stakeholding audiences" (Guba and Lincoln 1981:23; emphasis in the original). The evaluator interviews and observes stakeholders, then designs an evaluation that is responsive to stakeholders' issues. The stakeholders, however, are no more than sources of data and an audience for the evaluation, not real partners in the evaluation process.

The 1994 revision of the Joint Committee Standards for Evaluation moved to language about "intended users" and "stakeholders" in place of earlier references to "audiences." Thus, in the new version, "the Utility Standards are intended to ensure that an evaluation will serve the information needs of intended users" as opposed to "given audiences" in the original 1981 version (Joint Committee 1994, 1981; emphasis added). The first standard was changed to "Stakeholder Identification"
rather than the original "Audience Identification." Such changes in language are far from trivial. They indicate how the knowledge base of the profession has evolved. The language we use shapes how we think. The nuances and connotations reflected in these language changes are fundamental to the philosophy of utilization-focused evaluation.

A third diversion from intended users occurs when evaluators target organizations rather than specific individuals. This appears to be more specific than targeting general audiences, but really isn't. Organizations as targets can be strangely devoid of real people. Instead, the focus shifts to positions and the roles and authority that attach to positions. Since Max Weber's (1947) seminal essay on bureaucracy gave birth to the study of organizations, sociologists have viewed the interchangeability of people in organizations as the hallmark of institutional rationality in modern society. Under ideal norms of bureaucratic rationality, it doesn't matter who's in a position, only that the position be filled using universalistic criteria. Weber argued that bureaucracy makes for maximum efficiency precisely because the organization of role-specific positions in an unambiguous hierarchy of authority and status renders action calculable and rational without regard to personal considerations or particularistic criteria. Such a view ignores the personal factor. Yet, it is just such a view of the world that has permeated the minds of evaluators when they say that their evaluation is for the federal government, the state, the agency, or any other organizational entity. But organizations do not consume information; people do—individual, idiosyncratic, caring, uncertain, searching people. Who is in a position makes all the difference in the world to evaluation use. To ignore the personal factor is to diminish utilization potential from the outset. To target evaluations at organizations is to target them at nobody in particular—and, in effect, not to really target them at all.

A fourth diversion away from intended users is to focus on decisions instead of on decision makers. This approach is epitomized by Mark Thompson (1975), who defined evaluation as "marshalling of information for the purposes of improving decisions" (p. 26) and made the first step in an evaluation "identification of the decision or decisions for which information is required" (p. 38). The question of who will make the decision remains implicit. The decision-oriented approach stems from a rational social scientific model of how decision making occurs:

1. A clear-cut decision is expected to be made.
2. Information will inform the decision.
3. A study supplies the needed information.
4. The decision is then made in accordance with the study's findings.

The focus in this sequence is on data and decisions rather than people. But people make decisions and, it turns out, most "decisions" accrete gradually and incrementally over time rather than getting made at some concrete, decisive moment (Weiss 1990, 1977; Allison 1971; Lindblom 1965, 1959). It can be helpful, even crucial, to orient evaluations toward future decisions, but identification of such decisions, and the implications of those decisions for the evaluation, are best made in conjunction with intended users who come together to decide what data will be needed for what purposes, including, but not limited to, decisions.

Utilization-focused evaluation is often confused with or associated with decision-oriented approaches to evaluation, in part, I presume, because both approaches are

Ernest House (1980) wrote an important book categorizing various approaches to evaluation in which he included utilization-focused evaluation among the "decision-making models" he reviewed. The primary characteristic of a decision-making model is that "the evaluation be structured by the actual decisions to be made" (p. 28). I believe he incorrectly categorized utilization-focused evaluation because he failed to appreciate the distinct and critical nature of the personal factor. While utilization-focused evaluation includes the option of focusing on decisions, it can also serve a variety of other purposes, depending on the information needs of primary intended users. That is, possible intended uses include a large menu of options, which we'll examine in Chapters 4 and 5. For example, the evaluation process can be important in directing and focusing how people think about the basic policies involved in a program, what has come to be called conceptual use; evaluations can help in fine-tuning program implementation; the process of designing an evaluation may lead to clearer, more specific, and more meaningful program goals; and evaluations can provide information on client needs and assets that will help inform general public discussions about public policy. These and other outcomes of evaluation are entirely compatible with utilization-focused evaluation but do not make a formal decision the driving force behind the evaluation.

Nor does utilization-focused evaluation really fit within any of House's other seven categories, though any of them could be an option in a utilization-focused evaluation if that's the way intended users decided to orient the evaluation: (1) systems analysis, which quantitatively measures program inputs and outcomes to look at effectiveness and efficiency; (2) the behavioral objectives approach, which measures attainment of clear, specific goals; (3) goal-free evaluation, which examines the extent to which actual client needs are being met by the program; (4) the art criticism approach, which makes the evaluator's own expertise-derived standards of excellence a criterion against which programs are judged; (5) the accreditation model, where a team of external accreditors determines the extent to which a program meets professional standards for a given type of program; (6) the adversary approach, in which two teams do battle over the summative question of whether a program should be continued; and (7) the transaction model, which concentrates on program processes.

What is omitted from the House classification scheme is an approach to evaluation that focuses on and is driven by the information needs of specific people who will use the evaluation processes and findings. The point is that the evaluation is user-focused. Utilization-focused evaluation, then, in my judgment, falls within a category of evaluations that I would call, following Marvin Alkin (1995), user-oriented. This is a distinct alternative to the other models identified by House. In the other models, the content of the evaluation is determined by the evaluator's presuppositions about what constitutes an evaluation: a look at the relationship between inputs and outcomes; the measurement of goal attainment; advice about a specific programmatic decision; description of program processes; a decision about future or continued funding; or judgment according to some set of expert or professional standards. In contrast to these models, user-focused evaluation describes an evaluation process for making decisions about the content of an evaluation—but the content itself is not specified or implied in advance.

Thus, any of the eight House models, or adaptations and combinations of those models, might emerge as the guiding direction in user-focused evaluation, depending on the information needs of the people for whom the evaluation information was being collected. Let's continue, now, examining three other temptations that divert evaluators from being user focused.

A fifth temptation is to assume that the funders of the evaluation are the primary intended users, that is, those who pay the fiddler call the tune. In some cases, this is accurate. Funders are hopefully among those most interested in using evaluation. But there may be additional important users. Moreover, evaluations are funded for reasons other than their perceived utility, for example, wanting to give the appearance of supporting evaluation; because legislation or licensing requires evaluation; or because someone thought it had to be written into the budget. Those who control evaluation purse strings may not have any specific evaluation questions. Often, they simply believe that evaluation is a good thing that keeps people on their toes. They may not care about the content of a specific evaluation; they may care only that evaluation—any evaluation—takes place. They mandate the process, but not the substance. Under such conditions (which are not unusual), there is considerable opportunity for identifying and working with additional interested stakeholders to formulate relevant evaluation questions and a correspondingly appropriate design.

A sixth temptation is to put off attending to and planning for use from the beginning. It's tempting to wait until findings are in to worry about use, essentially not planning for use by waiting to see what happens. In contrast, planned use occurs when the intended use by intended users is identified at the beginning. Unplanned use can occur in any evaluation, but relying on the hope that something useful will turn up is a risky strategy. Eleanor Chelimsky (1983) has argued that the most important kind of accountability in evaluation is use that comes from "designed tracking and follow-up of a predetermined use to predetermined user" (p. 160). She calls this a "closed-looped feedback process" in which "the policymaker wants information, asks for it, and is interested in and informed by the response" (p. 160). This perspective solves the problem of defining use, addresses the question of whom the evaluation is for, and builds in evaluation accountability since the predetermined use becomes the criterion against which the success of the evaluation can be judged. Such a process has to be planned.

A seventh and final temptation (seven use-deadly sins seem sufficient, though certainly not exhaustive of the possibilities) is to convince oneself that it is unseemly to enter the fray and thereby run the risks that come with being engaged. I've heard academic evaluators insist that their responsibility is to ensure data quality and design rigor in the belief that the scientific validity of the findings will carry the day. The evidence suggests this seldom happens. An academic stance that justifies the evaluator standing above the messy fray of people and politics is more likely to yield scholarly publications than improvements in programs. Fostering use requires becoming engaged in building relationships and sorting through the politics that enmesh any program. In so doing, the evaluator runs the risks of getting entangled in changing power dynamics, having the rug pulled out by the departure of a key intended user, having relationships go bad, and/or being accused of bias. Later we'll discuss strategies for dealing with these and other risks, but the only way I know to avoid them altogether is to stand aloof; that may provide safety, but at the high cost of utility and relevance.

MENU 3.2

Temptations Away From Being User-Focused: Seven Use-Deadly Sins

1. Evaluators make themselves the primary decision makers and, therefore, the
primary users.
2. Identifying vague, passive audiences as users instead of real people.
3. Targeting organizations as users (e.g., "the feds") instead of specific persons.
4. Focusing on decisions instead of decision makers.
5. Assuming the evaluation's funder is automatically the primary stakeholder.
6. Waiting until the findings are in to identify intended users and intended uses.
7. Taking a stance of standing above the fray of people and politics.

Menu 3.2 summarizes these seven use-deadly temptations that divert evaluators from clearly specifying and working with intended users.

User-Focused Evaluation in Practice

Lawrence Lynn Jr., Professor of Public Policy at the Kennedy School of Government, Harvard University, has provided excellent evidence for the importance of a user-focused way of thinking in policy analysis and evaluation. Lynn was interviewed by Michael Kirst for Educational Evaluation and Policy Analysis. He was asked, "What would be a test of a 'good policy analysis'?"

One of the conditions of a good policy analysis is that it is helpful to a decision maker. A decision maker looks at it and finds he or she understands the problem better, understands the choices better, or understands the implications of choice better. The decision maker can say that this analysis helped me. (Lynn 1980a:85)

Notice here that the emphasis is on informing the decision maker, not the decision. Lynn argues in his casebook on policy analysis (Lynn 1980b) that a major craft skill needed by policy and evaluation analysts is the ability to understand and make accommodations for a specific decision maker's cognitive style and other personal characteristics. His examples are exemplars of the user-focused approach.

Let me take the example of Elliot Richardson, for whom I worked, or Robert MacNamara, for that matter. These two individuals were perfectly capable of understanding the most complex issues and absorbing details—absorbing the complexity, fully considering it in their own minds. Their intellects were not limited in terms of what they could handle. . . . On the other hand, you will probably find more typical the decision makers who do not really like to approach problems intellectually.


They may be visceral, they may approach issues with a wide variety of preconceptions, they may not like to read, they may not like data, they may not like the appearance of rationality, they may like to see things couched in more political terms, or overt value terms. And an analyst has got to take that into account. There is no point in presenting some highly rational, comprehensive piece of work to a Secretary or an Assistant Secretary of State who simply cannot or will not think that way. But that does not mean the analyst has no role; that means the analyst has to figure out how he can usefully educate someone whose method of being educated is quite different.

We did a lengthy case on the Carter administration's handling of the welfare reform issue, and, in particular, the role of Joe Califano and his analysts. Califano was very different in the way he could be reached than an Elliot Richardson, or even Casper Weinberger. Califano is a political animal and has a relatively short attention span—highly intelligent, but an action-oriented person. And one of the problems his analysts had is that they attempted to educate him in the classical, rational way without reference to any political priorities, or without attempting to couch issues and alternatives in terms that would appeal to a political, action-oriented individual. And so there was a terrible communications problem between the analysts and Califano. I think a large part of that had nothing to do with Califano's intellect or his interest in the issues; it had a great deal to do with the fact that his cognitive style and the analyst's approach just did not match.

Lynn also used the example of Jerry Brown, former Governor of California. Brown liked policy analyses framed as a debate—thesis, antithesis—because he had been trained in the Jesuitical style of argument. The challenge for a policy analyst or evaluator, then, becomes grasping the decision maker's cognitive style and logic. President Ronald Reagan, for example, liked Reader's Digest style stories and anecdotes. From Lynn's perspective, an analyst presenting to Reagan would have to figure out how to communicate policy issues through stories. He admonished analysts and evaluators to "discover those art forms by which one can present the result of one's intellectual effort" in a way that can be heard, appreciated and understood:

In my judgment, it is not as hard as it sounds. I think it is not that difficult to discover how a Jerry Brown or a Joe Califano or a George Bush or a Ted Kennedy thinks, how he reacts. All you have got to do is talk to people who deal with them continuously, or read what they say and write. And you start to discover the kinds of things that preoccupy them, the kinds of ways they approach problems. And you use that information in your policy analyses. I think the hang-up most analysts or many analysts have is that they want to be faithful to their discipline. They want to be faithful to economics or faithful to political science and are uncomfortable straying beyond what their discipline tells them they are competent at dealing with. The analyst is tempted to stay in that framework with which he or she feels most comfortable.

And so they have the hang-up, they cannot get out of it. They are prone to say that my tools, my training do not prepare me to deal with things that are on Jerry Brown's mind, therefore, I cannot help him. That is wrong. They can help, but they have got to be willing to use the information they have about how these individuals think and then begin to craft their work, to take that into account. (Lynn 1980a:86-87)

Lynn's examples document the importance of the personal factor at the highest levels of government. Alkin et al. (1979) have shown how the personal factor operates in evaluation use at state and local levels. Focusing on the personal factor provides direction about what to look for and how to proceed in planning for use.

Beyond Just Beginning

In this chapter, we've discussed the personal factor as a critical consideration in enhancing evaluation use. The importance of the personal factor explains why utilization-focused evaluators begin by identifying and organizing primary intended evaluation users. They then interact with these primary users throughout the evaluation to nurture and sustain the commitment to use. For there is an eighth deadly-use sin: identifying primary intended users at the outset of the study, then ignoring them until the final report is ready.

Attending to primary intended users is not just an academic exercise performed for its own sake. Involving specific people who can and will use information enables them to establish direction for, commitment to, and ownership of the evaluation every step along the way, from initiation of the study through the design and data collection stages right through to the final report and dissemination process. If decision makers have shown little interest in the study in its earlier stages, our data suggest that they are not likely to show a sudden interest in using the findings at the end. They won't be sufficiently prepared for use.

The remainder of this book examines the implications of focusing on intended use by intended users. We'll look at the implications for how an evaluation is conceptualized and designed (Chapters 4 through 10), methods decisions (Chapters 11 and 12), and analysis approaches (Chapter 13). We'll also look at the political and ethical implications of utilization-focused evaluation (Chapter 14).

Throughout, we'll be guided by attention to the essence of utilization-focused evaluation: focusing on intended use for specific intended users. Focus and specificity are ways of coming to grips with the fact that no evaluation can serve all potential stakeholders' interests equally well. Utilization-focused evaluation makes explicit whose interests are served. For, as Baltasar Gracian observed in 1647 in The Art of Worldly Wisdom:

It is a great misfortune to be of use to nobody; scarcely less to be of use to everybody.

Notes

1. At the time of the study in 1976, I was Director of the Evaluation Methodology Program in the Humphrey Institute of Public Affairs, University of Minnesota. The study was conducted through the Minnesota Center for Social Research, University of Minnesota. Results of the study were first published under the title "In Search of Impact: An Analysis of the Utilization of Federal Health Evaluation Research" (Patton, Grimes, et al. 1977). For details on the study's design and methods, see Patton 1986:30-39. The 20 cases in the study included 4 mental health evaluations, 4 health training programs, 2 national assessments of laboratory proficiency, 2 evaluations of neighborhood health center programs, studies of 2 health services delivery systems programs, a training program on alcoholism, a health regulatory program, a federal loan-forgiveness program, a training workshop evaluation, and 2 evaluations of specialized health facilities. The types of evaluations ranged from a three-week program review carried out by a single internal evaluator to a four-year evaluation that cost $1.5 million.

Six of the cases were internal evaluations and 14 were external.

Because of very limited resources, it was possible to select only three key informants to be contacted and intensively interviewed about the utilization of each of the 20 cases in the final sample. These key informants were (a) the government's internal project officer (PO) for the study, (b) the person identified by the project officer as being either the decision maker for the program evaluated or the person most knowledgeable about the study's impact, and (c) the evaluator who had major responsibility for the study. Most of the federal decision makers interviewed had been or now are office directors (and deputy directors), division heads, or bureau chiefs. Overall, these decision makers represented over 250 years of experience in the federal government.

The evaluators in our sample were a rather heterogeneous group. Six of the 20 cases were internal evaluations, so the evaluators were federal administrators or researchers. In one case, the evaluation was contracted from one unit of the federal government to another, so the evaluators were also federal researchers. The remaining 13 evaluations were conducted by private organizations or nongovernment employees, although several persons in this group either had formerly worked for the federal government or have since come to do so. Evaluators in our sample represented over 225 years of experience in conducting evaluative research.

2. Citations for quotes taken from the interview transcripts use the following format: [DM367:13] refers to the transcript of an interview with a decision maker about evaluation study number 367; this quote was taken from page 13 of the transcript. The study numbers and page numbers have been systematically altered to protect the confidentiality of the interviewees. EV201:10 and PO201:6 refer to interviews about the same study, the former being an interview with the evaluator, the latter an interview with the project officer.

Intended Uses of Findings

If you don't know where you're going, you'll end up somewhere else.

—Yogi Berra

When Alice encounters the Cheshire Cat in Wonderland, she asks, "Would you tell me, please, which way I ought to walk from here?"
"That depends a good deal on where you want to get to," said the Cat.
"I don't much care where—" said Alice.
"Then it doesn't matter which way you walk," said the Cat.
"—so long as I get somewhere," Alice added as an explanation.
"Oh, you're sure to do that," said the Cat, "if you only walk long enough."

—Lewis Carroll

This story carries a classic evaluation message: To evaluate how well you're doing, you must have some place you're trying to get to. For programs, this has meant having goals and evaluating goal attainment. For evaluators, this means clarifying the intended uses of a particular evaluation.

In utilization-focused evaluation, the primary criterion by which an evaluation is judged is intended use by intended users. The previous chapter discussed identifying intended users. This chapter will offer a menu of intended uses.

Identifying Intended Uses From the Beginning

The last chapter described a follow-up study of 20 federal health evaluations that assessed use and identified factors related to varying degrees of use. A major finding from that study was that none of our interviewees had carefully considered intended use prior to getting the evaluation's findings.

We found that decision makers, program officers, and evaluators typically devoted little or no attention to intended uses prior to data collection. The goal of those evaluations was to produce findings, then they'd worry about how to use whatever was found. Findings would determine use, so until findings were generated, no real attention was paid to use.

Utilization-focused evaluators, in contrast, work with intended users to determine priority uses early in the evaluation process. The agreed-on, intended uses then become the basis for subsequent design decisions. This increases the likelihood that an evaluation will have the desired impact. Specifying intended uses is evaluation's equivalent of program goal setting.

Intended uses vary from evaluation to evaluation. There can be no generic or absolute definition of evaluation use because "use" depends in part on the values and goals of primary users. As Eleanor Chelimsky (1983) has observed, "The concept of usefulness . . . depends upon the perspective and values of the observer. This means that one person's usefulness may be another person's waste" (p. 155). To help intended users deliberate on and commit to intended uses, evaluators need a menu of potential uses to offer. Utilization-focused evaluation is a menu-oriented approach. It's a process for matching intended uses and intended users. Here, then, is a menu of three different evaluation purposes based on varying uses for evaluation findings. In the next chapter, we'll add to this menu a variety of uses of evaluation processes.

Three Uses of Findings

The purpose of an evaluation conditions the use that can be expected of it.


—Eleanor Chelimsky (1997)

You don't get very far in studying evaluation before realizing that the field is characterized by enormous diversity. From large-scale, long-term, international comparative designs costing millions of dollars to small, short evaluations of a single component in a local agency, the variety is vast. Contrasts include internal versus external evaluation; outcomes versus process evaluation; experimental designs versus case studies; mandated accountability systems versus voluntary management efforts; academic studies versus informal action research by program staff; and published, polished evaluation reports versus oral briefings and discussions where no written report is ever generated. Then there are combinations and permutations of these contrasting approaches. The annual meetings of the American Evaluation Association, the Canadian Evaluation Society, and the Australasian Evaluation Society offer an awesome cornucopia of variations in evaluation practice (and ongoing debate about which approaches are really evaluation). In the midst of such splendid diversity, any effort to reduce the complexity of evaluation options to a few major categories will inevitably oversimplify.

Yet, some degree of simplification is needed to make the evaluation design process manageable. So let us attempt to heed Thoreau's advice:

Simplicity, simplicity, simplicity! I say, let your affairs be as two or three, and not a hundred or a thousand. (Walden, 1854)

A Menu for Using Findings: Making Overall Judgments, Facilitating Improvements, and Generating Knowledge

Evaluation findings can serve three primary purposes: rendering judgments, facilitating improvements, and/or generating knowledge. Chelimsky (1997) distinguishes these three purposes by the perspective that undergirds them: judgments are undergirded by the accountability perspective; improvements are informed by a developmental perspective; and generation of knowledge operates from the knowledge perspective of academic values. These are by no means inherently conflicting purposes, and some evaluations strive to incorporate all three approaches, but, in my experience, one is likely to become the dominant motif and prevail as the primary purpose informing design decisions and priority uses; or else, different aspects of an evaluation are designed, compartmentalized, and sequenced to address these contrasting purposes. I also find that confusion among these quite different purposes, or failure to prioritize them, is often the source of problems and misunderstandings along the way and can become disastrous at the end when it turns out that different intended users had different expectations and priorities. I shall discuss each, offering variations and examples.

Judgment-Oriented Evaluation

Evaluations aimed at determining the overall merit, worth, or value of something are judgment-oriented. Merit refers to the intrinsic value of a program, for example, how effective it is in meeting the needs of those it is intended to help. Worth refers to extrinsic value to those outside the program, for example, to the larger community or society. A welfare program that gets jobs for recipients has merit for those who move out of poverty and worth to society by reducing welfare costs. Judgment-oriented evaluation approaches include performance measurement for public accountability; program audits; summative evaluations aimed at deciding if a program is sufficiently effective to be continued or replicated; quality control and compliance reports; and comparative ratings or rankings of programs à la Consumer Reports.

The first clue that intended users are seeking an overall judgment is when you hear the following kinds of questions: Did the program work? Did it attain its goals? Should the program be continued or ended? Was implementation in compliance with funding mandates? Were funds used appropriately for the intended purposes? Were desired client outcomes achieved? Answering these kinds of evaluative questions requires a data-based judgment that some need has been met, some goal attained, or some standard achieved.

Another clue that rendering judgment will be a priority is lots of talk about "accountability." Funders and politicians like to issue calls for accountability (notably for others, not for themselves), and managing for accountability has become a rallying cry in both private and public sectors (Kearns 1996). Program and financial audits are aimed at ensuring compliance with intended purposes and mandated procedures.

The program evaluation units of legislative audit offices, offices of comptrollers and inspectors, and federal agencies like the General Accounting Office (GAO) and the Office of Management and Budget (OMB) have government oversight responsibilities to make sure programs are properly implemented and effective. The U.S. Government Performance and Results Act of 1993 required annual performance measurement to "justify" program decisions and budgets. Political leaders in Canada, the United Kingdom, and Australia have been active and vocal in attempting to link performance measurement to budgeting for purposes of accountability (Auditor General of Canada 1993), and these efforts greatly influenced the U.S. federal approach to accountability (Breul 1994).

Rhetoric about accountability can become particularly strident in the heat of political campaigns. Everyone campaigns against ineffectiveness, waste, and fraud. Yet, one person's waste is another's jewel. For years, U.S. Senator William Proxmire of Wisconsin periodically held press conferences in which he announced Golden Fleece Awards for government programs he considered especially wasteful. I had the dubious honor of being the evaluator for one such project ridiculed by Proxmire, a project to take higher education administrators into the wilderness to experience, firsthand, experiential education. The program was easy to make fun of: Why should taxpayer dollars be spent for college deans to hike in the woods? Outrageous! What was left out of Proxmire's press release was that the project, supported by the Fund for the Improvement of Postsecondary Education, had been selected in a competitive process and funded because of its innovative approach to rejuvenating burned-out and discouraged administrators, and that many of those administrators returned to their colleges to spearhead curriculum reform. There was lots of room for debate about the merit or worth of the program depending on one's values and priorities, but our evaluation found that the funds were spent in accordance with the agency's innovative mandate and many, though not all, participants followed through on the project's goal of providing leadership for educational change. The funding agency found sufficient merit and worth that the project was awarded a year-long dissemination grant.

In judgment-oriented evaluations, specifying the criteria for judgment is central and critical. Different stakeholders will bring different criteria to the table (or apply them without coming to the table, as in Proxmire's case). The funding agency's criterion was whether participants developed personally and professionally in ways that led them to subsequently exercise innovative leadership in higher education; Proxmire's criterion was whether, on the surface, the project would sound wasteful to the ordinary taxpayer.

During design discussions and negotiations, evaluators may offer additional criteria for judgment beyond those initially thought of by intended users. As purpose and design negotiations conclude, the standard to be met by the evaluation has been articulated in the Joint Committee Program Evaluation Standards:

Values Identification: The perspectives, procedures, and rationale used to interpret the findings should be carefully described, so that the bases for value judgments are clear. (Joint Committee 1994:U4; emphasis added)

Some criteria, such as fraud and gross incompetence, are sufficiently general and agreed-on that they may remain implicit as bases for value judgments when the explicit focus is on goal attainment.

Yet, finding criminal, highly unethical, or grossly incompetent actions will quickly overwhelm other effectiveness criteria. One of my favorite examples comes from an audit of a weatherization program in Kansas as reported in the newsletter of Legislative Program Evaluators.

Kansas auditors visited several homes that had been weatherized. At one home, workers had installed 14 storm windows to cut down on air filtration in the house. However, one could literally see through the house because some of the siding had rotted and either pulled away from or fallen off the house. The auditors also found that the agency had nearly 200 extra storm windows in stock. Part of the problem was that the supervisor responsible for measuring storm windows was afraid of heights; he would "eyeball" the size of second-story windows from the ground. . . . If these storm windows did not fit, he ordered new ones. (Hinton 1988:3)

The auditors also found fraud. The program bought windows at inflated prices from a company secretly owned by a program employee. A kickback scheme was uncovered. "The workmanship on most homes was shoddy, bordering on criminal. . . . [For example], workers installing a roof vent used an ax to chop a hole in the roof." Some 20% of beneficiaries didn't meet eligibility criteria. Findings like these are thankfully rare, but they grab headlines when they become public, and they illustrate why accountability will remain a central purpose of many evaluations.

The extent to which concerns about accountability dominate a specific study varies by the role of the evaluator. For auditors, accountability is always primary. Public reports on performance indicators for government programs are accountability-driven. As we shall see, however, many evaluations of private sector programs aimed at internal program improvement have no public accountability purpose. First, however, let's review summative evaluation as a major form of judgment-oriented evaluation.

Summative evaluation constitutes an important purpose distinction in any menu of intended uses. Summative evaluations judge the overall effectiveness of a program and are particularly important in making decisions about continuing or terminating an experimental program or demonstration project. As such, summative evaluations are often requested by funders. Summative evaluation contrasts with formative evaluation, which focuses on ways of improving and enhancing programs rather than rendering definitive judgment about effectiveness. Michael Scriven (1967:40-43) introduced the summative-formative distinction in discussing evaluation of educational curriculum. The distinction has since become a fundamental evaluation typology.

With widespread use of the summative-formative distinction has come misuse, so it is worth examining Scriven's (1991a) own definition:

Summative evaluation of a program (or other evaluand) is conducted after completion of the program (for ongoing programs that means after stabilization) and for the benefit of some external audience or decision-maker (for example, funding agency, oversight office, historian, or future possible users). . . . The decisions it services are most often decisions between these options: export (generalize), increase site support, continue site support, continue with conditions (probationary status), continue with modifications, discontinue. . . . The aim is to report on it [the program], not to report to it. (p. 340)

Summative evaluation provides data to support a judgment about the program's worth so that a decision can be made about the merit of continuing the program. While Scriven's definition focuses on a single program, summative evaluations of multiple programs occur when, like the products in a Consumer Reports test, programs are ranked on a set of criteria such as effectiveness, cost, sustainability, quality characteristics, and so on. Such data support judgments about the comparative merit or worth of different programs.

In judgment-oriented evaluations, what Scriven (1980) has called "the logic of valuing" rules. Four steps are necessary: (1) select criteria of merit; (2) set standards of performance; (3) measure performance; and (4) synthesize results into a judgment of value (Shadish, Cook, and Leviton 1991:73, 83-94). This is clearly a deductive approach. In contrast, improvement-oriented evaluations often use an inductive approach in which criteria are less formal as one searches openly for whatever areas of strengths or weaknesses may emerge from looking at what's happening in the program.

Improvement-Oriented Evaluation

Using evaluation results to improve a program turns out, in practice, to be fundamentally different from rendering judgment about overall effectiveness, merit, or worth. Improvement-oriented forms of evaluation include formative evaluation, quality enhancement, responsive evaluation, learning organization approaches, humanistic evaluation, and Total Quality Management (TQM), among others. What these approaches share is a focus on improvement—making things better—rather than rendering summative judgment. Judgment-oriented evaluation requires preordinate, explicit criteria and values that form the basis for judgment. Improvement-oriented approaches tend to be more open ended, gathering varieties of data about strengths and weaknesses with the expectation that both will be found and each can be used to inform an ongoing cycle of reflection and innovation. Program management, staff, and sometimes participants tend to be the primary users of improvement-oriented findings, while funders and external decision makers tend to use judgmental evaluation, though I hasten to add that these associations of particular categories of users with specific types of evaluations represent utilization tendencies, not definitional distinctions; any category of user may be involved in any kind of use.

Improvement-oriented evaluations ask the following kinds of questions: What are the program's strengths and weaknesses? To what extent are participants progressing toward the desired outcomes? Which types of participants are making good progress and which types aren't doing so well? What kinds of implementation problems have emerged and how are they being addressed? What's happening that wasn't expected? How are staff and clients interacting? What are staff and participant perceptions of the program? What do they like? dislike? want to change? What are perceptions of the program's culture and climate? How are funds being used compared to initial expectations? How is the program's external environment affecting internal operations? Where can efficiencies be realized? What new ideas are emerging that can be tried out and tested?

The flavor of these questions—their nuances, intonation, feel—communicate improvement rather than judgment. Bob Stake's metaphor explaining the difference between summative and formative evaluation can be adapted more generally to the distinction between judgmental evaluation and improvement-oriented evaluation: "When the cook tastes the soup, that's formative; when the guests taste the soup, that's summative" (quoted in Scriven 1991a:169). More generally, anything done to the soup during preparation in the kitchen is improvement oriented; when the soup is served, judgment is rendered, including judgment rendered by the cook that the soup was ready for serving (or at least that preparation time had run out).

The metaphor also helps illustrate that one must be careful to stay focused on intent rather than activities when differentiating purposes. Suppose that those to whom the soup is served are also cooks, and the purpose of their tasting the soup is to offer additional recipe ideas and consider potential variations in seasoning. Then the fact that the soup has moved from kitchen to table does not mean a change in purpose. Improvement remains the primary agenda. Final judgment awaits another day, a different serving—unless, of course, the collection of cooks suddenly decides that the soup as served to them is already perfect and no further changes should be made. Then what was supposed to be formative would suddenly have turned out to be summative. And thus are purposes and uses often confounded in real-world evaluation practice.

Formative evaluation typically connotes collecting data for a specific period of time, usually during the start-up or pilot phase of a project, to improve implementation, solve unanticipated problems, and make sure that participants are progressing toward desired outcomes. Improvement-oriented evaluation more generally, however, includes using information systems to monitor program efforts and outcomes regularly over time to provide feedback for fine-tuning a well-established program. That's how data are meant to be used as part of a Total Quality Management (TQM) approach. It also includes the "decision-oriented educational research" of Cooley and Bickel (1985), who built a classroom-based information system aimed at "monitoring and tailoring." For example, by systematically tracking daily attendance patterns for individuals, classrooms, and schools, educational administrators could quickly identify attendance problems and intervene before the problems became chronic or overwhelming. Attendance could also be treated as an early warning indicator of other potential problems.

Again, I want to reiterate that we are focusing on distinctions about intended and actual use of findings. A management information system that routinely collects data can be used for monitoring progress and reallocating resources for increased effectiveness in a changing environment; that's improvement-oriented use. However, if that same system is used for public accountability reporting, that's judgment-oriented use. These contrasting purposes often come into conflict because the information needed for management is different from the data needed for accountability; or knowing that the data will be used for accountability purposes, the system is set up and managed for that purpose and becomes useless for ongoing improvement-oriented decision making. Exhibit 4.1 provides an example of how formative evaluation can prepare a program for summative evaluation by connecting these separate and distinct evaluation purposes to separate and distinct stages in the program's development.

EXHIBIT 4.1
Formative and Summative Evaluation of the Saint Paul Technology
for Literacy Center (TLC): A Utilization-Focused Model

TLC was established as a three-year demonstration project to pilot-test the effectiveness of an


innovative, computer-based approach to adult literacy. The pilot project was funded by six Minnesota
Foundations and the Saint Paul Schools at a cost of $1.3 million. The primary intended users of the
evaluation were the school superintendent, senior school officials, and School Board Directors, who
would determine whether to continue and integrate the project into the district's ongoing community
education program. School officials and foundation donors participated actively in designing the
evaluation. The evaluation cost $70,300.
After 16 months of formative evaluation, the summative evaluation began. The formative
evaluation, conducted by an evaluator hired to be part of the TLC staff, used extensive learner
feedback, careful documentation of participation and progress, and staff development activities to
specify the TLC model and bring implementation to a point of stability and clarity where it could be
summatively evaluated. The summative evaluation, conducted by two independent University of
Minnesota social scientists, was planned as the formative evaluation was being conducted.
The summative evaluation began by validating that the specified model was, in fact, being
implemented as specified. This involved interviews with staff and students and observations of the
program in operation. Outcomes were measured using the Test of Adult Basic Education administered
on a pre-post basis to participant and control groups. The test scores were analyzed for all students
who participated in the program for a three month period. Results were compared to data available
on other adult literacy programs. An extensive cost analysis was also conducted by a university
educational economist. The report was completed six months prior to the end of the demonstration,
in time for decision makers to use the results to determine the future of the program. Retention and
attrition data were also analyzed and compared with programs nationally.

Knowledge-Oriented Evaluation

Both judgment-oriented and improvement-oriented evaluations involve the instrumental use of results (Leviton and Hughes 1981). Instrumental use occurs when a decision or action follows, at least in part, from the evaluation. Evaluations are seldom the sole basis for subsequent summative decisions or program improvements, but they contribute, often substantially, if a utilization-focused approach is used at the design stage.

Conceptual use of findings, on the other hand, contrasts with instrumental use in that no decision or action is expected; rather, it "is the use of evaluations to influence thinking about issues in a general way" (Rossi and Freeman 1985:388). The evaluation findings contribute by increasing knowledge. This knowledge can be as specific as clarifying a program's model, testing theory, distinguishing types of interventions, figuring out how to measure outcomes, generating lessons learned, and/or elaborating policy options.

Comparisons showed significant gains in reading comprehension and math for the participant
group versus no gains for the control group. Adult learners in the program advanced an average of
one grade level on the test for every 52.5 hours spent in TLC computer instruction. However, the report
cautioned that the results showed great variation: high standard deviations, significant differences
between means and medians, ranges of data that included bizarre extremes, and very little correlation
between hours spent and progress made. The report concluded: "Each case is relatively unique. TLC
has created a highly individualized program where learners can proceed at their own pace based on
their own needs and interests. The students come in at very different levels and make very different
gains during their TLC work . . . , thus the tremendous variation in progress" (Council on Foundations
1993:142).
Several years after the evaluation, the Council on Foundations commissioned a follow-up study
on the evaluation's utility. The Saint Paul Public Schools moved the project from pilot to permanent
status. The Superintendent of Schools reported that "the findings of the evaluation and the qualities
of the services it had displayed had irrevocably changed the manner in which adult literacy will be
addressed throughout the Saint Paul Public Schools" (Council on Foundations, 1993:148). TLC also
became the basis for the District's new Five-Year Plan for Adult Literacy. The evaluation was so
well-received by its original philanthropic donors that it led the Saint Paul Foundation to begin and
support an Evaluation Fellows program with the University of Minnesota. The independent Council on
Foundations follow-up study concluded: "Everyone involved in the evaluation—TLC, funding sources,
and evaluators—regards it as a 'utilization-focused evaluation.' The organization and its founders
and funders decided what they wanted to learn and instructed the evaluators accordingly" (Council
on Foundations, 1993:154-55). The formative evaluation was used extensively to develop the program
and get it ready for the summative evaluation. The summative evaluation was then used by primary
intended users to inform a major decision about the future of computer-based adult literacy. Ten
years later, Saint Paul's adult literacy effort continues to be led by TLC's original developer and
director.

SOURCES: Turner and Stockdill 1987; Council on Foundations (1993:129-55).

In other cases, conceptual use is more vague, with users seeking to understand the program better; the findings, then, may reduce uncertainty, offer illumination, enlighten funders and staff about what participants really experience, enhance communications, and facilitate sharing of perceptions. In early studies of utilization, such uses were overlooked or denigrated. In recent years, they have come to be more appreciated and valued (Weiss 1990:177).

We found conceptual use to be widespread in our follow-up study of federal health evaluations. As one project manager reported:

The evaluation led us to redefine some target populations and rethink the ways we connected various services. This rethinking happened over a period of months as we got a better perspective on what the findings meant.

But we didn't so much change what we were doing as we changed how we thought about what we were doing. That has had big pay-offs over time. We're just a lot clearer now. [DM248:19]

This represents an example of conceptual use that is sometimes described as enlightenment. Carol Weiss (1990) has used this term to describe the effects of evaluation findings being disseminated to the larger policy community "where they have a chance to affect the terms of debate, the language in which it is conducted, and the ideas that are considered relevant in its resolution" (p. 176). She continued,

Generalizations from evaluation can percolate into the stock of knowledge that participants draw on. Empirical research has confirmed this. . . . Decision makers indicate a strong belief that they are influenced by the ideas and arguments that have their origins in research and evaluation. Case studies of evaluations and decisions tend to show that generalizations and ideas that come from research and evaluation, help shape the development of policy. The phenomenon has come to be known as "enlightenment" . . . , an engaging idea. The image of evaluation as increasing the wattage of light in the policy arena brings joy to the hearts of evaluators. (pp. 176-77)

While Weiss has emphasized the informal ways in which evaluation findings provide, over time, a knowledge base for policy, Chen (1990, 1989; Chen and Rossi 1987) has focused on a more formal knowledge-oriented approach in what he has called theory-driven evaluation. While theory-driven evaluations can provide program models for summative judgment or ongoing improvement, the connection to social science theory tends to focus on increasing knowledge about how effective programs work in general. For example, Shadish (1987) has argued that the understandings gleaned from evaluations ought to contribute to macrotheories about "how to produce important social change" (p. 94). Scheirer (1987) has contended that evaluators ought to draw on and contribute to implementation theory to better understand the "what and why of program delivery" (p. 59). Such knowledge-generating efforts focus beyond the effectiveness of a particular program to future program designs and policy formulation in general.

As the field of evaluation has matured and a vast number of evaluations has accumulated, the opportunity has arisen to look across findings about specific programs to formulate generalizations about effectiveness. This involves synthesizing findings from different studies. (It is important to distinguish this form of synthesis evaluation, that is, synthesizing across different studies, from what Scriven [1994] calls "the final synthesis," which refers to sorting out and weighing the findings in a single study to reach a summative judgment.) Cross-study syntheses have become an important contribution of the GAO (1992c) in providing accumulated wisdom to Congress about how to formulate effective policies and programs. An example is GAO's (1992b) report on Adolescent Drug Use Prevention: Common Features of Promising Community Programs. (See Exhibit 4.2.)

An excellent and important example of synthesis evaluation is Lisbeth Schorr's (1988) Within Our Reach, a study of programs aimed at breaking the cycle of poverty. She identified "the lessons of successful programs" as follows (pp. 256-83):

EXHIBIT 4.2
An Example of a Knowledge-Oriented Evaluation

The U.S. General Accounting Office (GAO 1992a) identified "Common Features of Promising
Community Programs" engaged in adolescent drug use prevention. The evaluation was aimed
at enlightening policymakers, in contrast to other possible uses of findings, namely, judging the
effectiveness of or improving specific programs.

Six features associated with high levels of participant enthusiasm and attachment:

1. a comprehensive strategy
2. an indirect approach to drug abuse prevention
3. the goal of empowering youth
4. a participatory approach
5. a culturally sensitive orientation
6. highly structured activities

Six common program problems:

1. maintaining continuity with their participants


2. coordinating and integrating their service components
3. providing accessible services
4. obtaining funds
5. attracting necessary leadership and staff
6. conducting evaluation

• offering a broad spectrum of services;
• regularly crossing traditional professional and bureaucratic boundaries, that is, organizational flexibility;
• seeing the child in the context of family and the family in the context of its surroundings, that is, holistic approaches;
• coherent and easy-to-use services;
• committed, caring, results-oriented staff;
• finding ways to adapt or circumvent traditional professional and bureaucratic limitations to meet client needs;
• professionals redefining their roles to respond to severe needs; and
• overall, intensive, comprehensive, responsive and flexible programming.

These kinds of "lessons" constitute accumulated wisdom—principles of effectiveness or "best practices"—that can be adapted, indeed, must be adapted, to specific programs, or even entire organizations (Wray and Hauer 1996). For example, the Ford Foundation commissioned an evaluation of its Leadership Program for Community Foundations.

74 • TOWARD MORE USEFUL EVALUATIONS

an evaluation of its Leadership Program for Community Foundations. This study of 27 community foundations over five years led to a guide for Building Community Capacity (Mayer 1996, 1994, n.d.) that incorporates lessons learned and generalizable development strategies for community foundations—a distinguished and useful example of a knowledge-generating evaluation. Other examples include a special evaluation issue of Marriage and Family Review devoted to "Exemplary Social Intervention Programs" (Guttman and Sussman 1995) and a special issue of The Future of Children (CFC 1995) devoted to "Long-Term Outcomes of Early Childhood Programs."

In the philanthropic world, a related approach has come to be called cluster evaluation (Millett 1996; Council on Foundations 1993:232-51). A cluster evaluation team visits a number of different grantee projects with a similar focus (e.g., grassroots leadership development) and draws on individual grant evaluations to identify patterns across and lessons from the whole cluster (Campbell 1994; Sanders 1994; Worthen 1994; Barley and Jenness 1993; Kellogg Foundation n.d.). The McKnight Foundation commissioned a cluster evaluation of 34 separate grants aimed at aiding families in poverty. One lesson learned was that "effective programs have developed processes and strategies for learning about the strengths as well as the needs of families in poverty" (Patton et al. 1993:10). This lesson takes on added meaning when connected with the finding of Independent Sector's review of "Common Barriers to Effectiveness in the Independent Sector":

The deficits model holds that distressed people and communities are "needy"; they're a collection of problems, pathologies, and handicaps; they need doctoring, rehabilitation, and fixing of the kind that professionalized services are intended to provide. The assets model holds that even the most distressed person or community has strengths, abilities, and capacities; with investment, their strengths, abilities, and capacities can increase. This view is only barely allowed to exist in the independent sector, where organizations are made to compete for funds on the basis of "needs" rather than on the basis of "can-do."

The deficit model—seeing the glass half empty—is a barrier to effectiveness in the independent sector. (Mayer 1993:7-8)

The McKnight Foundation cluster evaluation and the Independent Sector study reached similar conclusions concurrently and independently. Such generalizable evaluation findings about principles of effective programming have become the knowledge base of our profession. Being knowledgeable about patterns of program effectiveness allows evaluators to provide guidance about development of new initiatives, policies, and strategies for implementation. Such contributions constitute the conceptual use of evaluation findings. Efforts of this kind may be considered research rather than evaluation, but such research is ultimately evaluative in nature and important to the profession.

Synthesis evaluations also help us generate knowledge about conducting useful evaluations. The premises of utilization-focused evaluation featured in this book originally emerged from studying 20 federal evaluations (Patton, Grimes, et al. 1977). Those premises have been affirmed by Alkin et al. (1979) in the model of evaluation use they developed by analyzing evaluations from different education districts in California and by Wargo (1989) in
his "characteristics of successful program evaluations" identified by studying three "unusually successful evaluations of national food and nutrition programs" (p. 71). The Council on Foundations commissioned a synthesis evaluation based on nine case studies of major foundation evaluations to learn lessons about "effective evaluating." (A summary of one of those case studies is presented as Exhibit 4.1 in this chapter.) Among the Council's 35 key lessons learned is this utilization-focused evaluation premise: "Key 6. Make sure the people who can make the most use of the evaluation are involved as stakeholders in planning and carrying out the evaluation" (Council on Foundations 1993:255).

Applying Purpose and Use Distinctions

By definition, the three kinds of uses we've examined—making overall judgments, offering formative improvements, or generating generic knowledge—can be distinguished clearly. Menu 4.1 presents these three uses with examples of each. Although conceptually distinct, in practice these uses can become entangled. Let me illustrate with an evaluation of an innovative educational program.

MENU 4.1
Three Primary Uses of Evaluation Findings

Uses and examples:

Judge merit or worth
    Summative evaluation
    Accountability
    Audits
    Quality control
    Cost-benefit decisions
    Decide a program's future
    Accreditation/licensing

Improve programs
    Formative evaluation
    Identify strengths and weaknesses
    Continuous improvement
    Quality enhancement
    Being a learning organization
    Manage more effectively
    Adapt a model locally

Generate knowledge
    Generalizations about effectiveness
    Extrapolate principles about what works
    Theory building
    Synthesize patterns across programs
    Scholarly publishing
    Policy making

NOTE: Menu 5.1 (Chapter 5) presents a corresponding menu, "Four Primary Uses of Evaluation Logic and Processes," which includes Enhancing shared understandings, Reinforcing program interventions, Engagement (participatory and empowerment evaluation), and Developmental evaluation. Menu 5.1 presents uses where the impact on the program comes primarily from application of evaluation thinking and engaging in an evaluation process, in contrast to impacts that come from using the content of evaluation findings, the focus of this menu.

Some years ago, the Northwest Regional Educational Laboratory (NWREL) contracted with the Hawaii State Department of Education to evaluate Hawaii's experimental "3-on-2 Program," a team teaching approach in which three teachers worked with two regular classrooms of primary-age children, often in multi-age groupings. Walls between classrooms were removed so that three teachers and 40 to 60 children shared one large space. The program was aimed at creating greater individualization, increasing cooperation among teachers, and making more diverse resources available to students.

The laboratory (NWREL 1977) proposed an advocacy-adversary model for summative evaluation. Two teams were created; by coin toss, one was designated the advocacy, the other the adversary team. The task of the advocacy team was to gather and present data supporting the proposition that Hawaii's 3-on-2 Program was effective and ought to be continued. The adversaries were charged with marshalling all possible evidence demonstrating that the program ought to be terminated.

The advocacy-adversary model was a combination debate/courtroom approach to evaluation (Wolf 1975; Kourilsky 1974; Owens 1973). I became involved as a resource consultant on fieldwork as the two teams were about to begin site visits to observe classrooms. When I arrived on the scene, I immediately felt the exhilaration of the competition. I wrote in my journal,

No longer staid academic scholars, these are athletes in a contest that will reveal who is best; these are lawyers prepared to use whatever means necessary to win their case. The teams have become openly secretive about their respective strategies. These are experienced evaluators engaged in a battle not only of data, but also of wits. The prospects are intriguing.

As the two teams prepared their final reports, a concern emerged among some about the narrow focus of the evaluation. The summative question concerned whether the Hawaii 3-on-2 program should be continued or terminated. Some team members also wanted to offer findings about how to change the program or how to make it better without terminating it. Was it possible that a great amount of time, effort, and money was directed at answering the wrong question? Two participating evaluators summarized the dilemma in their published post mortem of the project:

As we became more and more conversant with the intricacies, both educational and political, of the Hawaii 3-on-2 Program, we realized that Hawaii's decision makers should not be forced to deal with a simple save-it-or-scrap-it choice. Middle ground positions were more sensible. Half-way measures, in this instance, probably made more sense. But there we were, obliged to do battle with our adversary colleagues on the
unembellished question of whether to maintain or terminate the 3-on-2 Program. (Popham and Carlson 1977:5)

In the course of doing fieldwork, the evaluators had encountered many stakeholders who favored a formative evaluation approach. These potential users wanted an assessment of strengths and weaknesses with ideas for improvement. Many doubted that the program, given its popularity, could be terminated. They recognized that changes were needed, especially cost reductions, but that fell in the realm of formative not summative evaluation. I had a conversation with one educational policymaker that highlighted the dilemma about appropriate focus. He emphasized that, with a high rate of inflation, a declining school-age population, and reduced federal aid, the program was too expensive to maintain. "That makes it sound like you've already made the decision to terminate the program before the evaluation is completed," I suggested.

"Oh, no!" he protested. "All we've decided is that the program has to be changed. In some schools the program has been very successful and effective. Teachers like it; parents like it; principals like it. How could we terminate such a program? But in other schools it hasn't worked very well. The two-classroom space has been redivided into what is essentially three self-contained classrooms. We know that. It's the kind of program that has some strong political opposition and some strong political support. So there's no question of terminating the program and no question of keeping it the same."

I felt compelled to point out that the evaluation was focused entirely on whether the program should be continued or terminated. "And that will be very interesting," he agreed. "But afterwards we trust you will give us answers to our practical questions, like how to reduce the size of the program, make it more cost effective, and increase its overall quality."

Despite such formative concerns from some stakeholders, the evaluation proceeded as originally planned with the focus on the summative evaluation question. But was that the right focus? The evaluation proposal clearly identified the primary intended users as state legislators, members of the State Board of Education, and the superintendent. In a follow-up survey of those education officials (Wright and Sachse 1977), most reported that they got the information they wanted. But the most important evidence that the evaluation focused on the right question came from actions taken following the evaluation when the decision makers decided to eliminate the program.

After it was all over, I had occasion to ask Dean Nafziger, who had directed the evaluation as director of evaluation, research, and assessment for NWREL, whether a shift to a formative focus would have been appropriate. He replied,

We maintained attention to the information needs of the true decision makers, and adhered to those needs in the face of occasional counter positions by other evaluation audiences. . . . If a lesson is to be learned it is this: an evaluator must determine who is making the decisions and keep the information needed by the decision makers as the highest priority. In the case of the Hawaii "3 on 2" evaluation, the presentation of program improvement information would have served to muddle the decision-making process. (Personal correspondence 1979)

Choosing Among Alternatives

As the Hawaii case illustrates, the formative-summative distinction can be critical. Formative and summative evaluations involve significantly different research foci. The same data seldom serve both purposes well. Nor will either a specific formative or summative evaluation necessarily yield generic knowledge (lessons learned) that can be applied to effective programming more generally. It is thus important to identify the primary purpose of the evaluation at the outset: overall judgment of merit or worth, ongoing improvement, or knowledge generation? Other decisions about what to do in the evaluation can then be made in accordance with how best to support that primary purpose. One frequent reaction to posing evaluation alternatives is: "We want to do it all." A comprehensive evaluation, conducted over time and at different levels, may include all three uses, but for any given evaluation activity, or any particular stage of evaluation, it's critical to have clarity about the priority use of findings.

Consider the evaluation of a leadership program run by a private philanthropic foundation. The original evaluation contract called for three years of formative evaluation followed by two years of summative evaluation. The program staff and evaluators agreed that the formative evaluation would be for staff and participant use; however, the summative evaluation would be addressed to the foundation's board of directors. The formative evaluation helped shape the curriculum, brought focus to intended outcomes, and became the basis for the redesign of follow-up activities and workshops. As time came to make the transition from formative to summative evaluation, the foundation's president got cold feet about having the evaluators meet directly with the board of directors. The evaluators insisted on interacting directly with these primary users to lay the groundwork for genuinely summative decision making. Senior staff decided that no summative decision was imminent, so the evaluation continued in a formative mode, and the design was changed accordingly. As a matter of ethics, the evaluators made sure that the chair of the board was involved in these negotiations and that the board agreed to the change in focus. There really was no summative decision on the horizon because the foundation had a long-term commitment to the leadership program.

Now, consider a different case, the evaluation of an innovative school, the Saturn School, in Saint Paul, Minnesota. Again, the original evaluation design called for three years of formative evaluation followed by two final years with a summative focus. The formative evaluation revealed some developmental problems, including lower than desired scores on district-mandated standardized tests. The formative evaluation report, meant only for internal discussion aimed at program improvement, got into the newspapers with glaring headlines about problems and low test scores. The evaluation's visibility and public reporting put pressure on senior district officials to make summative decisions about the program, despite earlier assurances that the program would have a full five years before such decisions were made. The formative evaluation essentially became summative when it hit the newspapers, much to the chagrin of staff.

Sometimes, however, program staff like such a reversal of intended use when, for example, evaluators produce a formative report that is largely positive and staff want to disseminate the results as if they were summative, even though the methods of the formative evaluation were aimed only
at capturing initial perceptions of program progress, not at rendering an overall judgment of merit or worth. Keeping formative evaluations formative, and summative evaluations summative, is an ongoing challenge, not a one-time decision. When contextual conditions merit or mandate a shift in focus, evaluators need to work with intended users to fully understand the consequences of such a change. We'll discuss these issues again in the chapter on situational responsiveness and evaluator roles. Let me close this section with one final example.

A national foundation funded a cluster evaluation in which a team of evaluators would assemble data from some 30 different projects and identify lessons for effective community-based health programming—essentially a knowledge-generating evaluation. The cluster evaluation team had no responsibility to gather data to improve specific programs nor make summative judgments. Each separate project had its own evaluation for those purposes. The cluster evaluation was intended to look for patterns of effectiveness (and barriers to same) across projects. Yet, during site visits, individual projects provided cluster evaluators with a great deal of formative feedback that they wanted communicated to the foundation, and individual grantees were hungry for feedback and comparative insights about how well they were doing and ways they might improve. As the evaluation approached time for a final report, senior foundation officials and trustees asked for summative conclusions about the overall effectiveness of the entire program area as part of rethinking funding priorities and strategies. Thus, a knowledge-generating evaluation got caught up in pressures to adapt to meet demands for both formative and summative uses.

Evaluation Use and Decision Making: Being Realistic About Impact

All three uses of evaluation findings—to render judgment, to improve programs, and to generate knowledge—support decision making. The three kinds of decisions, however, are quite different.

1. Rendering judgment about overall merit or worth for summative purposes supports decisions about whether a program should be continued, enlarged, disseminated, or terminated—all major decisions.
2. Decisions about how to improve a program tend to be made in small, incremental steps based on specific evaluation findings aimed purposefully at instrumental use.
3. Policy decisions informed by cumulative knowledge from evaluations imply a weak and diffuse connection between specific evaluation findings and the eventual decision made—thus the term enlightenment use.

Trying to sort out the influence of evaluations on decisions has been a major focus of researchers studying use. Much of the early literature on program evaluation defined use as immediate, concrete, and observable influence on specific decisions and program activities resulting directly from evaluation findings. For example, Carol Weiss (1972c), one of the pioneers in studying use, stated, "Evaluation research is meant for immediate and direct use in improving the quality of social programming" (p. 10). It was with reference to immediate and direct use that Weiss was speaking when she concluded that "a review of evaluation experience suggests that evaluation results have generally not exerted significant influence on
program decisions" (p. 11). Weiss (1988, 1990) reaffirmed this conclusion in her 1987 keynote address at the American Evaluation Association: "The influence of evaluation on program decisions has not noticeably increased" (p. 7). The evaluation literature reviewed in the first chapter was likewise overwhelming in concluding that evaluation studies exert little influence in decision making.

It was in this gloomy context that I set out with a group of students in search of evaluations that had actually been used to help us identify factors that might enhance use in the future. (Details about this follow-up study of the use of federal health evaluations were presented in Chapter 3 and in Patton, Grimes, et al. 1977.) Given the pessimistic picture of most writings on use, we began our study fully expecting our major problem would be finding even one evaluation that had had a significant impact on program decisions. What we found was considerably more complex and less dismal than our original impressions had led us to expect. Our results provide guidance in how to work with intended users to set realistic expectations about how much influence an evaluation will have.

Views From the Field on Evaluation Impact

Our major question on use was as follows:

We'd like to focus on the actual impact of this evaluation study . . . , to get at any ways in which the study may have had an impact—an impact on program operations, on planning, on funding, on policy, on decisions, on thinking about the program, and so forth. From your point of view, what was the impact of this evaluation study on the program we've been discussing?

After coding responses for the nature and degree of impact (Patton 1986:33), we found that 78% of responding decision makers and 90% of responding evaluators felt that the evaluation had an impact on the program. We asked a follow-up question about the nonprogram impacts of the evaluations:

We've been focusing mainly on the study's impact on the program itself. Sometimes studies have a broader impact on things beyond an immediate program, things like general thinking on issues that arise from a study, or position papers, or legislation. To what extent and in what ways did this evaluation have an impact on any of these kinds of things?

We found that 80% of responding decision makers and 70% of responding evaluators felt these specific evaluation studies had identifiable nonprogram impacts.

The positive responses to the questions on impact are quite striking considering the predominance of the impression of nonuse in the evaluation literature. The main difference here, however, was that the actual participants in each specific evaluation process were asked to define impact in terms that were meaningful to them and their situations. None of the evaluations we studied led directly and immediately to the making of a major, concrete program decision. The more typical impact was one in which the evaluation provided additional pieces of information in the difficult puzzle of program action, permitting some reduction in the uncertainty within which any decision maker inevitably operates. In most such cases, though the use was modest,
those involved considered the evaluation worthwhile.

The most dramatic example of use reported in our sample was evaluation of a pilot program. The program administrator had been favorable to the program in principle, was uncertain what the evaluation results would be, but was "hoping the results would be positive." The evaluation proved to be negative. The administrator was "surprised, but not alarmingly so. . . . We had expected a more positive finding or we would not have engaged in the pilot studies" [DM367:13]. The program was subsequently ended, with the evaluation carrying "about a third of the weight of the total decision" [DM367:8]. Thus, the evaluation served a summative purpose but was one of only several factors (politics, impressions already held, competing priorities and commitments) that influenced the decision.

Contrast such summative use with the experiences of a different decision maker we interviewed, one who had 29 years of experience in the federal government, much of that time directing research. He reported the impact of the evaluation about which he was interviewed as follows:

It served two purposes. One is that it resolved a lot of doubts and confusions and misunderstandings that the advisory committee had . . . and the second was that it gave me additional knowledge to support facts that I already knew, and, as I say, broadened the scope more than I realized. In other words, the perceptions of where the organization was going and what it was accomplishing were a lot worse than I had anticipated . . . but I was somewhat startled to find out that they were worse, yet it wasn't very hard because it partly confirmed things that I was observing. [DM232:17]

He went on to say that, following the evaluation:

We changed our whole functional approach to looking at the identification of what we should be working on. But again I have a hard time because these things, none of these things occurred overnight, and in an evolutionary process it's hard to say, you know, at what point it made a significant difference or did it merely verify and strengthen the resolve that you already had. [DM232:17]

As in this example of conceptual use, respondents frequently had difficulty assessing the degree to which an evaluation actually affected decisions made after completion of the evaluation. This was true, for example, in the case of a large-scale evaluation conducted over several years at considerable cost. The findings revealed some deficiencies in the program but overall were quite positive. Changes corresponding to those recommended in the study occurred when the report was published, but those changes could not be directly and simply attributed to the evaluation:

A lot of studies like this confirmed what close-by people knew and they were already taking actions before the findings. So you can't link the finding to the action, that's just confirmation. . . . The direct link between the finding and the program decision is very diffuse. [DM361:12, 13]

In essence, we found that evaluations provided some additional information that was judged and used in the context of other available information to help reduce the unknowns in the making of incremental program changes. The impact ranged from "it sort of confirmed our impressions . . . , confirming some other
anecdotal information or impression that we had" [DM209:7, 1] to providing a new awareness that carried over to other programs.

This kind of conceptual use to stimulate thinking about what's going on and reduce uncertainty emerged as highly important to decision makers. In some cases, it simply made them more confident and determined. On the other hand, where a need for change was indicated, an evaluation study could help speed up the process of change or provide a new impetus for finally getting things rolling. Reducing uncertainty, speeding things up, and getting things finally started are real impacts—not revolutionary—but real, important impacts in the opinion of the people we interviewed. We found few major, direction-changing decisions in most programs—few really summative decisions. Rather, evaluation findings were used as one piece of information that fed into a slow, evolutionary process of program development. Program development is typically a process of "muddling through" (Allison 1971; Lindblom 1965, 1959), and program evaluation is part of that muddling. Or, as Weiss (1980) has observed, even major decisions typically accrete gradually over time through small steps and minor adjustments rather than getting decided all at once at some single moment at the end of a careful, deliberative, and rational process.

The impacts of evaluation have most often been felt as ripples, not waves. The question is whether such limited impact is sufficient to justify the costs of evaluation. The decision makers and evaluators we interviewed 20 years ago were largely satisfied with the type and degree of use they experienced. But times have changed. The stakes are higher. There's more sophistication about evaluation and, I think, higher expectations for accountability. However, the point of a utilization-focused approach is not to assume either high or low expectations. The point is to find out what the expectations of intended users are and negotiate a shared understanding of realistic, intended use—a mutual commitment that can be met. In negotiating the nature and degree of evaluation use, that is, setting goals for the evaluation, it is important to challenge intended users to be both optimistic and realistic—the twin tensions in any goal-setting exercise. Whether the expected type and degree of use hoped for actually occurs can then be followed up as a way of evaluating the evaluation.

In part, we need to distinguish a goals-oriented, up-front definition of use from an after-the-fact, follow-up definition of use. King and Pechman (1984, 1982) defined use as "intentional and serious consideration of evaluation information by an individual with the potential to act on it" (1984:244). This definition recognizes that evaluation is only one influence among many in the taking of an action or making of a decision; therefore, it is reasonable to consider an evaluation used if it has been seriously considered and the findings genuinely taken into account. Such a definition makes sense when evaluators are trying to study use after the fact and sort out relative influences. But the question utilization-focused evaluation asks is: What are the expected uses by intended users before and during the evaluation? Maybe serious consideration is what they expect as well; but maybe they expect more, or less.

Evaluators need to push intended users to be clear about what, if any, decisions are expected to be influenced by an evaluation. It is worth repeating that none of the federal health decision makers we interviewed about evaluation use had been involved in a utilization-focused process. That is, none of them had carefully considered how the
evaluation would be used in advance of data collection. My experiences in pushing decision makers and intended users to be more intentional and prescient about evaluation use during the design phase have taught me that it is possible to significantly increase the degree of influence evaluations have. Doing so, however, requires persistence in asking the following kinds of questions: What decisions, if any, is the evaluation expected to influence? What is at stake? When will decisions be made? By whom? What other factors (values, politics, personalities, promises already made) will affect the decision making? How much influence do you expect the evaluation to have? What needs to be done to achieve that level of influence? How will we know afterward if the evaluation was used as intended? (In effect, how can use be measured?) Exhibit 4.3 offers a number of questions to ask of intended users to establish an evaluation's intended influence on forthcoming decisions.

EXHIBIT 4.3
Questions to Ask of Intended Users to Establish an Evaluation's Intended Influence on Forthcoming Decisions

What decisions, if any, are the evaluation findings expected to influence?

(There may not be any, in which case the evaluation's purpose may be simply to generate knowledge for conceptual use and future enlightenment. If, however, the evaluation is expected to influence decisions, clearly distinguish summative decisions about program funding, continuation, or expansion from formative decisions about program improvement and ongoing development.)

When will decisions be made? By whom? When, then, must the evaluation findings be presented to be timely and influential?

What is at stake in the decisions? For whom? What controversies or issues surround the decisions?

What's the history and context of the decision-making process?

What other factors (values, politics, personalities, promises already made) will affect the decision making? What might happen to make the decision irrelevant or keep it from being made? In other words, how volatile is the decision-making environment?

How much influence do you expect the evaluation to have—realistically?

To what extent has the outcome of the decision already been determined?

What data and findings are needed to support decision making?

What needs to be done to achieve that level of influence?

(Include special attention to which stakeholders to involve for the evaluation to have the expected degree of influence.)

How will we know afterward if the evaluation was used as intended?

(In effect, how can use be measured?)

Connecting Decisions to Uses

Where the answers to the evaluator's questions indicate a major decision about program merit, worth, continuation, expansion, dissemination, and/or funding is at stake, then the evaluation should be designed to render overall judgment—summative judgment. The design should be sufficiently rigorous and the data collected should be sufficiently credible that a summative decision can be made. The findings must be available in time to influence this kind of major decision.

Where the dialogue with primary intended users indicates an interest in identifying strengths and weaknesses, clarifying the program's model, and generally working at increased effectiveness, the evaluation should be framed to support improvement-oriented decision making. Skills in offering formative feedback and creating an environment of mutual respect and trust between the evaluator and staff will be as important as actual findings.

Where the intended users are more concerned about generating knowledge for formulating future programs than with making decisions about current programs, then some form of synthesis or cluster evaluation will be most appropriate to discover generic principles of effectiveness.

In helping intended users select from the evaluation menu, and thereby focus the evaluation, evaluators may encounter some reluctance to make a commitment. I worked with one director who proudly displayed this sign on his desk: "My decision is maybe—and that's final." Unfortunately, the sign was all too accurate. He wanted me to decide what kind of evaluation should be done. After several
frustrating attempts to narrow the evaluation's focus, I presented what I titled a "MAYBE DESIGN." I laid out cost estimates for an all-encompassing evaluation that included formative, summative, and knowledge-generating components looking at all aspects of the program. Putting dollars and time lines to the choices expedited the decision making considerably. He decided not to undertake any evaluation "at this time."

I was relieved. I had become skeptical about the potential for doing anything useful. Had I succumbed to the temptation to become the decision maker, an evaluation would have been done, but it would have been my evaluation, not his. I'm convinced he would have waffled over using the findings as he waffled over deciding what kind of evaluation to do.

Thus, in utilization-focused evaluation, the choice of not dining at all is always on the menu. It's better to find out before preparing the meal that those invited to the banquet are not really hungry. Take your feast elsewhere, where it will be savored.
Intended Process Uses
Impacts of Evaluation Thinking and Experiences

Utility is in the eye of the user.
—Halcolm

In the past, the search for use has often been conducted like the search for contraband
in the famous Sufi story about Nasrudin, the smuggler.

Nasrudin used to take his donkey across a frontier every day with the panniers loaded
with straw. Since he admitted to being a smuggler, when he trudged home every night,
the frontier guards searched him carefully. They searched his person, sifted the straw,
steeped it in water, even burned it from time to time. Meanwhile, he was becoming
visibly more and more prosperous.
Eventually, he retired to another country, very wealthy. Years later one of the customs
officials encountered him there. "You can tell me now, Nasrudin," he said. "Whatever
was it that you were smuggling, that we could never catch you at?"
"Donkeys," replied Nasrudin grinning.
—Adapted from Shah 1964:59


Process as Outcome

In this chapter, we'll consider ways in which being engaged in the processes of evaluation can be useful quite apart from the findings that may emerge from those processes. Reasoning processes are evaluation's donkeys; they carry the load. Reasoning like an evaluator and operating according to evaluation's values have impacts. When I refer to process use, then, I mean using the logic, employing the reasoning, and being guided by the values that undergird the profession (Fournier 1995; Whitmore 1990; House 1980). Exhibit 5.1 provides examples of evaluation logic and values.

Those of us trained in the methods of research and evaluation can easily take for granted the logic that undergirds those methods. Like people living daily inside any culture, the way of thinking of our culture—the research culture—seems natural and easy. However, to practitioners, decision makers, and policymakers, our logic can be hard to grasp and quite unnatural. I'm talking about what appear to be very simple, even simplistic, notions that have profound effects on how one views the world. Thinking in terms of what's clear, specific, concrete, and observable does not come easily to people who thrive on, even depend on, vagueness, generalities, and untested beliefs as the basis for action. They're in the majority. Practitioners of evaluation logic are a small minority. The good news is that our way of thinking, once experienced, is often greatly valued. That's what creates demand for our services. Learning to see the world as an evaluator sees it often has a lasting impact on those who participate in an evaluation—an impact that can be greater and last longer than the findings that result from that same evaluation.

How do I know this? Because that's often what intended users tell me when I follow up the evaluations I conduct to evaluate use. Months after an evaluation, I'll talk with clients (intended users) to get their assessments of whether the evaluation achieved its intended uses and to find out what other impacts may have resulted. They often say some version of the following, a response from an experienced and wise program director:

We used the findings to make some changes in our intake process and improvements in the treatment program. We reorganized parts of the program and connected them together better. But you know, the big change is in our staff's attitude. They're paying more attention to participant reactions on a daily basis. Our staff meetings are more outcomes oriented and reflective. Staff exchanges about results are more specific and data based. We're more focused. And the fear of evaluation is gone. Doing the evaluation had a profound impact on our program culture. It really did.

Any evaluation can, and often does, have these kinds of effects. What's different about utilization-focused evaluation is that the process of actively involving intended users increases these kinds of evaluation impacts. Furthermore, the possibility and desirability of learning from evaluation processes as well as findings can be made intentional and purposeful. In other words, instead of treating process use as an informal offshoot, explicit and up-front attention to the potential impacts of evaluation logic and processes can increase those impacts and make them a planned purpose for undertaking the evaluation. In that way the evaluation's overall utility is increased.

EXHIBIT 5.1
Examples of the Logic and Values of Evaluation That Have Impact on and Are Useful to Participants Who Experience Evaluation Processes

The logic and values of evaluation derive from research methods and communications. These admonitions constitute a "logic" in the sense that they represent a particular mode of reasoning viewed as valid within the culture of evaluation. They are values in the sense that they are what evaluators generally believe. The guidelines and principles below are meant to be illustrative rather than exhaustive of all possibilities.

Be clear: Be clear about goals and purposes; about what's being evaluated, what data will be collected, what judgments are to be made, how results will be used—indeed, be clear about everything.

Be specific: A favorite evaluation clarifying question: "What exactly do you mean by that?"

Focus and prioritize: You can't do or look at everything. Be intentional and purposeful in deciding what's worth doing and knowing.

Be systematic: Plan your work; work your plan. Carefully document what occurs at every stage of decision making and data collection.

Make assumptions explicit: Determine what can and cannot be subjected to empirical test.

Operationalize program concepts, ideas, and goals: The fundamental evaluation challenge is determining how to measure and observe what is important. Reality testing becomes real at this point.

Distinguish inputs and processes from outcomes: Confusing processes with outcomes is common.

Have data to provide empirical support for conclusions: This means a commitment to reality testing in which logic and evidence are valued over strength of belief and intensity of emotions.

Separate data-based statements of fact from interpretations and judgments: Interpretations go beyond the data and must be understood as what they are: interpretations. Judgments involve values, determining what is desirable or undesirable.

Make criteria and standards for judgments explicit: The logical mandates to be clear and specific apply to making criteria and standards explicit.

Limit generalizations and causal explanations to what data support: Overgeneralizations and overly definitive attributions of causality are epidemic outside the culture of research and evaluation.

Distinguish deductive from inductive processes: Both are valued but involve different reasoning sequences.

Process Use Defined

Process use refers to and is indicated by individual changes in thinking and behavior, and program or organizational changes in procedures and culture, that occur among those involved in evaluation as a result of the learning that occurs during the evaluation process. Evidence of process use is represented by the following kind of statement after an evaluation: "The impact on our program came not just from the findings but from going through the thinking process that the evaluation required."

An Analogy

Before looking in detail at how evaluation processes can affect users, let me suggest an analogy to clarify the distinction between process use versus findings use. I hike the Grand Canyon annually. During the days there, my body hardens and my thoughts soften. I emerge more mellow, peaceful, and centered. It doesn't matter which part of the Canyon I hike: the South Rim or North; whether I descend all the way to the Colorado River or stay on the Tonto to explore a side canyon; whether I push strenuously to cover as much territory as possible or plan a leisurely journey; whether I ascend some interior monument like Mount Huethawali or traverse the Supai platform that runs the length of the Canyon—I return different from when I entered. Not always different in the same way. But different.

Let me suggest that the specifics of place are like the findings of an evaluation report. The different places provide different content. From the rim, one can view magnificent vistas. Deep within a side canyon, one can see little and feel completely alone. Much of the Canyon is desert, but rare streams and even rarer waterfalls offer a stark contrast to the ancient, parched rock. Each place offers different content for reflection. The substantive insights one receives may well vary by place, time, and circumstance. But quite beyond those variations is the impact that comes from the very act of reflection—regardless of content and place. The impacts of reflection and meditation on one's inner sense of self are, for me, analogous to the impacts of engaging in the processes of evaluation, quite apart from the content of the evaluation's findings. In this same sense, for certain developmental purposes—staff development, program development, organization development—it doesn't matter so much what the focus of an evaluation is, or what its findings, some impact will come from engaging thoughtfully and seriously in the process.

A Menu: Uses of Evaluation Logic and Processes

In working with intended users, it's important to help them think about the potential and desired impacts of how the evaluation will be conducted. Questions about who will be involved take on a different degree of importance when considering that those most directly involved will not only play a critical role in determining the content of the evaluation, and therefore the focus of findings, but they also will be the people most affected by exposure to evaluation logic and processes. The degree of internal involvement, engagement, and ownership will affect the nature and degree of impact on the program's culture. How funders and users of evaluation think about and calculate the costs and benefits of evaluation also are affected. The cost-benefit ratio changes on both sides of the
equation when the evaluation produces not only findings but also serves immediate programmatic needs such as staff development or participant empowerment.

I differentiate four primary uses of evaluation logic and processes: (1) enhancing shared understandings, especially about results; (2) supporting and reinforcing the program through intervention-oriented evaluation; (3) increasing participants' engagement, sense of ownership, and self-determination (participatory and empowerment evaluation); and (4) program or organizational development. I'll discuss each of these, with examples, then consider the controversies engendered by using evaluation in these ways.

Using Evaluation to Enhance Shared Understandings

Evaluation both depends on and facilitates clear communications. Shared understandings emerge as evaluation logic pushes the senders of messages to be as specific as possible and challenges listeners to reflect on and feed back to senders what they think they've heard. Shared understandings are especially important with regard to expected results. For example, board members and program staff often have different notions of what an agency or program is supposed to accomplish. The processes of clarifying desired ends and focusing staff efforts on accomplishing those ends by evaluating actual accomplishments ought to be primary board functions, but few boards fulfill these functions effectively (Carver 1990).

I'm often asked to facilitate board or staff retreats to help them learn and apply the logic and discipline of evaluation to formulating the organization's mission and goals. The feedback I get is that the questions I pose as an evaluator (e.g., What specific results are you committed to achieving and how would you know if you accomplished them?) are different from what they are asked by non-evaluators. It's not so much that other facilitators don't ask these questions, but they don't ask them with the same seriousness and pursue the answers with the same rigor and intensity. The very process of formulating a mission and goals so they can be evaluated will usually have an impact, long before data are actually collected to measure effectiveness.

A parallel use of evaluation is to increase shared understandings between program managers and line staff. Following the admonition that "what gets measured gets done," managers can work with staff under the guidance of an evaluator to establish a monitoring system to help everyone involved stay focused on desired outcomes. While the data from such a system may ultimately support decision making, in the short run, the impact is to focus staff attention and energy on priority outcomes. The process needs to be facilitated in such a way that staff can speak openly about whether board and administrative expectations are meaningful, realistic, and attainable. In other words, done properly, evaluation facilitates shared commitments to results from top to bottom and bottom to top for "improved communication between staff at different levels of program implementation" (Aubel 1993:13).

You may have experienced both the presence and absence of evaluation logic in your education. When a teacher announces a test and says, "Here's what will be on the test and here's what I'll be looking for," that teacher is manifesting the evaluation principle that what gets measured gets done. Making criteria explicit and communicating them to all concerned is equitable and fair. In contrast, I've observed teachers
refuse to tell their class what will be on a test, then later, in individual, informal conversations, they reveal the test's focus to persistent and inquiring students. Telling everyone would have been more fair.

The logic and principles of evaluation also can be useful in negotiations between parties with different perspectives. For example, a major foundation was interested in funding an effort to make schools more racially equitable. The school district expressed great interest in such funding but resisted committing to explicit school changes that might undermine building-level autonomy or intrude into personnel evaluations of principals. Over a period of several months, the funder and school officials negotiated the project. The negotiations centered on expected evaluation outcomes. The funder and school district eventually agreed to focus the project and evaluation on community-based, school-specific action plans, activities, and changes rather than a standardized and prescribed set of district-determined mandates. Case studies were chosen as the most appropriate evaluation method, rather than standardized instruments for measuring school climate. The design of the entire project was changed and made more focused as a result of these negotiations. Applying the logic of evaluation had a major impact on the project's design without any data collection, findings, or a report. Everyone came out of the negotiations clear about what was to happen in the project and how it would be evaluated.

Inadequate specification of desired results reduces the likelihood of attaining those results. Consider how adding a results orientation changed the Request for Proposals announcement of a major environment-oriented philanthropic foundation. In the initial announcement, the foundation wanted to cast the net wide, so it issued a general invitation:

We seek grant proposals that will enhance the health of specific ecosystems.

The responses varied greatly with many completely missing the mark in the opinion of the foundation staff. But what was the mark? A great deal of time and effort was wasted by hopeful proposal writers who didn't know what criteria to address, and staff spent a lot of time sifting through proposals that had no hope of being funded. The process created frustration on both sides. After a planning session focused on specifying desired results and explicit evaluation criteria, the second announcement was quite a bit more focused:

We seek grant proposals that will enhance the health of specific ecosystems. Proposals will be judged on the following criteria:

• clarity and meaningfulness of ecosystem definition
• private-public sector cooperation
• action orientation and likelihood of demonstrable impact
• incorporation of a prevention orientation
• regional coordination

This set of criteria eliminates basic research proposals, of which a large number were received from universities in the first round, and makes it clear that those seeking grants must submit as cooperative groups rather than as single individuals or entities, also characteristic of a large number of initial proposals. Subsequent announcements became even more specific when focused on specific action priorities, such as pollution prevention. The staff, with training and facilitation, learned to use evaluation logic to articulate desired
results, enhance communications, and increase responsiveness.

A different use of evaluation to enhance mutual understanding involves designing the evaluation to "give voice" to the disenfranchised, underprivileged, poor, and others outside the mainstream (Weiss and Greene 1992:145). In the evaluation of a diversity project in the Saint Paul Schools, a major part of the design included capturing and reporting the experiences of people of color. Providing a way for African American, Native American, Chicano-Latino, and Hmong parents to tell their stories to mostly white, corporate funders was an intentional part of the design, one approved by those same white corporate funders. Rather than reaching singular conclusions, the final report was a multivocal, multicultural presentation of different experiences with and perceptions of the program's impacts. The medium of the report carried the message that multiple voices needed to be heard and valued as a manifestation of diversity (Stockdill et al. 1992). The findings were used for both formative and summative purposes, but the parents and many of the staff were most interested in using the evaluation processes to make themselves heard by those in power. Being heard was an end in itself, quite separate from use of findings.

Wadsworth (1995) has reported that evaluation processes can facilitate interactions between service providers and service users in a way that leads to "connectedness" and "dialogue across difference" (p. 9). Each learns to see the service through the others' eyes. In the process, what began as opposing groups with opposing truths is transformed into "an affinity-based community of inquiry" with shared truths.

Using evaluation to enhance shared understandings is a relatively traditional use of evaluation logic. Let's turn now to a different and more controversial use of evaluation processes: intervention-oriented evaluation.

Evaluation as an Integral Programmatic Intervention

Textbooks on measurement warn that measuring the effects of a treatment (e.g., a social program) should be independent of and separate from the treatment itself. For example, participants who take a pretest may perform better in the program than those who do not take the pretest because the pretest increases awareness, stimulates learning, and/or enhances preparation for program activities. To account for such test effects, evaluation researchers in the past have been advised to use experimental designs that permit analysis of differences in performance for those who took the pretest compared to a group that did not take the pretest. Integrating data collection into program implementation would be considered a problem—a form of treatment contamination—under traditional rules of research.

Departing from defining evaluation as rigorous application of social science methods opens a different direction in evaluation (Patton 1988), one that supports integration of evaluation into program processes. Making data collection integral rather than separate can reinforce and strengthen the program intervention. Such an approach also can be cost-effective and efficient since, when evaluation becomes integral to the program, its costs aren't an add-on. This enhances the sustainability of evaluation because, when it's built in rather than added on, it's not viewed as a temporary effort or luxury that can be easily dispensed with when cuts are necessary.

To illustrate this approach, consider the case of a one-day workshop. A traditional evaluation design, based on standard social science standards of rigor, would typically include a pretest and posttest to assess changes in participants' knowledge, skills, and attitudes. As the workshop opens, participants are told,

Before we begin the actual training, we want you to take a pretest. This will provide a baseline for our evaluation so we can find out how much you already know and then measure how much you've learned when you take the posttest.

At the end of the day, participants are administered the same instrument as a posttest. They are told,

Now the workshop is over, but before you leave, we need to have you take the posttest to complete the evaluation and find out how much you have benefited from the training.

The desired design for high internal validity would include, in addition to the pre-post treatment group, (1) a control group that takes the pre- and posttests without experiencing the workshop, (2) a control group that gets the posttest only, and (3) a treatment group that gets the posttest only. All groups, of course, should be randomly selected and assigned, and the administration of the test should be standardized and take place at the same time. Such a design would permit measurement of and control for instrumentation effects.

Let me now pose a contrary example of how the evaluation might be handled, a design that fully integrates the evaluation data collection into the program delivery, that is, a design that makes the data collection part of the workshop rather than separate from and independent of the workshop. In this scenario, the workshop begins as follows:

The first part of the workshop involves your completing a self-assessment of your knowledge, skills, and attitudes. This will help you prepare for and get into thinking about the things we will be covering today in your training.

The workshop then proceeds. At the end of the day, the workshop presenter closes as follows:

Now the final workshop activity is for you to assess what you have learned today. To that end, we are going to have you retake the self-assessment you took this morning. This will serve as a review of today and let you see how much you've learned.

In this second scenario, the word evaluation is never mentioned. The pre- and post-assessments are explicitly and intentionally part of the workshop in accordance with adult learning principles (Brookfield 1990; Knox 1987; Schon 1987; Knowles et al. 1985). We know, for example, that when participants are told what they will learn, they become prepared for the learning; learning is further enhanced when it is reinforced both immediately and over the long term. In the second scenario, the self-assessment instrument serves both the function of preparing people for learning and as baseline data. The posttest serves the dual functions of learning reinforcement and evaluation. Likewise, a six-month follow-up to assess retention can serve the dual
The methodological specialist will note that the second scenario is fraught with threats to validity. However, the purpose of data collection in this second scenario is not only assessment of the extent to which change has occurred, but increasing the likelihood that change will occur. It does not matter to these particular intended users (the workshop instructors) how much of the measured change is due to pretest sensitization versus actual learning activities, or both, as long as the instrument items are valid indicators of desired outcomes. Moreover, in the second scenario, the data collection is so well integrated into the program that there are no separate evaluation costs except for the data analysis itself. Under the second scenario, the administration of the pretest and posttest is a part of the program such that even if the data were not analyzed for evaluation purposes, the data collection would still take place, making evaluation data collection highly cost-effective.

Principles of Intervention-Oriented Evaluation

I have called this process intervention-oriented evaluation to make explicit the direct and integral connection between data collection and program results. A program is an intervention in the sense that it is aimed at changing something. The evaluation becomes part of the programmatic intervention to the extent that the way it is conducted supports and reinforces accomplishing desired program goals.

The primary principle of intervention-oriented evaluation is to build a program delivery model that logically and meaningfully interjects data collection in ways that enhance achievement of program outcomes, while also meeting evaluation information needs. We followed this principle in evaluating a wilderness program that aimed to turn college administrators into leaders in experiential education. Participants hiked 10 days in the Gila Wilderness of New Mexico in the fall, climbed the Kofa Mountains of Arizona in the winter, and rafted the San Juan River in Utah in the spring. During these trips, participants kept journals for reflection. The program's philosophy was, "One doesn't just learn from experience; one learns from reflection on experience." The process of journaling was part of the program intervention, but also a prime source of qualitative evaluation data capturing how participants reacted to and were changed by project participation. In addition, participants were paired together to interview each other before, during, and after each wilderness experience. These interviews were part of the project's reflection process, but also a source of case data for evaluation. The evaluation process thus became part of the intervention in providing participants with experiences in reflective practice (Schon 1987, 1983). Indeed, it was on this project that I first learned how profoundly in-depth interviews can affect people. Such personal, intensive, and reflective data collection is an intervention. In intervention-oriented evaluation, such data collection is designed to reinforce and strengthen the program's impact.

Another, quite different, example comes from an intervention-designed evaluation of an international development effort called the Caribbean Agricultural Extension Project, funded by the U.S. Agency for International Development (U.S. AID). The project aimed to improve national agricultural extension services in eight Caribbean countries.
The project began with a rapid reconnaissance survey to identify the farming systems in each participating island. This involved an interdisciplinary team of agricultural researchers, social scientists, and extension staff doing fieldwork and interviewing farmers for a period of 10 days to identify extension priorities for a specific agro-ecological zone. This process served as the basis for needs assessment and program development. It was also, quite explicitly and intentionally, an intervention in and of itself in that the process garnered attention from both farmers and agricultural officials, thereby beginning the extension mobilization process. In addition, the rapid reconnaissance survey served the critical evaluation function of establishing baseline data. Subsequent data on the effects of extension and agricultural development in the zone were compared against this baseline for evaluation purposes. Yet, it would have been much too expensive to undertake this kind of intensive team fieldwork simply for purposes of evaluation. Such data collection was practical and cost-effective because it was fully integrated into other critical program processes.

Once the various farming systems were identified and the needs of farmers had been specified within those systems, the extension staff began working with individual farmers to assess their specific production goals. This process included gathering data about the farmer's agricultural enterprises and household income flows. With these data in hand, extension agents worked with farmers to set realistic goals for change and to help farmers monitor the effects of recommended interventions. The program purpose of using this approach, called a farm management approach, was to individualize the work of extension agents with farmers so that the agent's recommendations were solidly grounded in knowledge of the farm and household situation, including labor availability, land availability, income goals, and past agricultural experiences. These data were necessary for the extension agent to do a good job of advising farm families about increasing their productivity.

These same data were the baseline for measuring the program's impact on individual farmers for evaluation purposes. The collection of such data for farm management purposes required training of agents, and a great deal of time and effort. It would have been enormously expensive to collect such data independently, solely for purposes of evaluation. However, by establishing a record-keeping system for individual farmers that served a primary extension purpose, the project also established a record-keeping system for evaluation purposes. By aggregating the data from individual households, it was possible to analyze system-level impact over time. The data aggregation and comparative analysis were above and beyond the main program purpose of collecting the data. However, without that program purpose, the data would have been much too expensive to collect solely for evaluation of the system.

The program staff also used the evaluation design formulated by the external evaluators as the framework for their plan of work, which set the agenda for monthly staff meetings and quarterly staff reports (an example of using evaluation to enhance and focus communications). As such, the evaluation priorities were kept before the staff at all times. As a result, the evaluation process improved program implementation from the very beginning by focusing staff implementation efforts.

Still another powerful example of intervention-oriented evaluation comes from the Hazelden Foundation, a chemical dependency treatment program in Minnesota.
Part of the program intervention includes helping clients and their significant others identify their chemical abuse patterns. A self-assessment instrument serves this purpose while also providing baseline data on chemical use. After residency treatment, all clients and significant others receive follow-up surveys at six months, one year, and two years. The follow-up surveys provide outcomes data on program effectiveness, but they also remind clients and their significant others to assess their current chemical use behaviors. Clients who have relapsed into abusive behaviors are invited to contact Hazelden for support, assessment, and possible reentry into treatment. Thus, the follow-up survey is a mechanism for reinforcing treatment and extending an offer of new help. Many clients respond to this contact and seek additional help. For that reason, the survey is sent to all former clients, not just the small random sample that would be sufficient if the survey provided only evaluation data.

In my experience, program funders, managers, and staff can become very excited about the creative possibilities for integrating evaluation into a program in such a way that it supports and reinforces the program intervention. Not only does this make the evaluation process more useful, it often makes the evaluation findings more relevant, meaningful, accessible, and useful. Yet, this approach can be controversial because the evaluation's credibility may be undercut by concerns about whether the data are sufficiently independent of the treatment to be meaningful and trustworthy; the evaluator's independence may be suspect when the relations with staff and/or participants become quite close; and the capacity to render an independent, summative judgment may be diminished. These are considerations to discuss with intended users and evaluation funders in deciding the relative priority of different potential uses of evaluation and in reviewing the principles of intervention-oriented evaluation (Exhibit 5.2). Now, let's examine the use of evaluation processes to engage participants more fully.

Supporting Engagement, Self-Determination, and Ownership: Participatory, Collaborative, and Empowerment Evaluation

Early in my career, I was commissioned by a Provincial Deputy Minister in Canada to undertake an evaluation in a school division he considered mediocre. I asked what he wanted the evaluation to focus on.

"I don't care what the focus is," he replied. "I just want to get people engaged in some way. Education has no life there. Parents aren't involved. Teachers are just putting in time. Administrators aren't leading. Kids are bored. I'm hoping evaluation can stir things up and get people involved again." That's how the evaluation of the Frontier School Division, described in Chapter 2, began.

The processes of participation and collaboration have an impact on participants and collaborators quite beyond whatever they may accomplish by working together. In the process of participating in an evaluation, participants are exposed to and have the opportunity to learn the logic of evaluation and the discipline of evaluation reasoning. Skills are acquired in problem identification, criteria specification, and data collection, analysis, and interpretation. Acquisition of evaluation skills and ways of thinking can have a longer-term impact than the use of findings from a particular evaluation study.
EXHIBIT 5.2
Principles of Intervention-Oriented Evaluation

• The evaluation is designed to support, reinforce, and enhance attainment of desired program outcomes.
• Evaluation data collection and use are integrated into program delivery and management. Rather than being separate from and independent of program processes, the evaluation is an integral part of those processes.
• Program staff and participants know what is being evaluated and know the criteria for judging success.
• Feedback of evaluation findings is used to increase individual participant goal attainment as well as overall program goal attainment.
• There are no or only incidental add-on costs for data collection because data collection is part of program design, delivery, and implementation.
• Evaluation data collection, feedback, and use are part of the program model, that is, evaluation is a component of the intervention.

Moreover, people who participate in creating something tend to feel more ownership of what they have created, make more use of it, and take better care of it. Active participants in evaluation, therefore, are more likely to feel ownership not only of their evaluation findings, but also of the evaluation process itself. Properly, sensitively, and authentically done, it becomes their process.

Participants and collaborators can be staff and/or program participants (e.g., clients, students, community members, etc.). Sometimes administrators, funders, and others also participate, but the usual connotation is that the primary participants are "lower down" in the hierarchy. Participatory evaluation is bottom up.

In 1995, evaluators interested in "Collaborative, Participatory, and Empowerment Evaluation" formed a Topical Interest Group within the American Evaluation Association. What these approaches have in common is a style of evaluation in which the evaluator becomes a facilitator, collaborator, and teacher in support of program participants and staff engaging in their own evaluation. While the findings from such a participatory process may be useful, the more immediate impact is to use the evaluation process to increase participants' sense of being in control of, deliberative about, and reflective on their own lives and situations.

The labels participatory evaluation and collaborative evaluation mean different things to different evaluators. Some use these phrases interchangeably or as mutually reinforcing concepts (e.g., Dugan 1996; Powell, Jeffries, and Selby 1989; Whitmore and Kerans 1988). Wadsworth (1993b) distinguishes "research on people, for people, or with people" (p. 1). Whitmore (1988) has defined the participatory approach as combining "social investigation, education, and action with the ultimate purpose of engendering broad community and social change" (p. 3).
Whitmore worked with a community-based team and contended that, through the evaluation process, participants not only gained new knowledge and skills but also created a support network among themselves and gained a greater sense of self-efficacy.

In the mid-1980s, several international grassroots development organizations advocated participatory evaluation as a tool for community and local leadership development, not only as a management tool (PACT 1986). In advocating for participatory evaluation, the Evaluation Sourcebook of the American Council of Voluntary Agencies for Foreign Service (ACVAFS 1983) asserted, "Participation is what development is about: gaining skills for self-reliance" (p. 12). Thus, in developing countries, participatory evaluation has been linked to community development and empowerment; industrialized countries, where notions of "value-free" social science have long been dominant, have come to this idea of linking evaluation participation with empowerment more slowly, and, as we shall see later, the notion remains controversial.

Norman Uphoff (1991) has published A Field Guide for Participatory Self-Evaluation, aimed at grassroots community development projects. After reviewing a number of such efforts, he concluded,

If the process of self-evaluation is carried out regularly and openly, with all group members participating, the answers they arrive at are in themselves not so important as what is learned from the discussion and from the process of reaching consensus on what questions should be used to evaluate group performance and capacity, and on what answers best describe their group's present status. (p. 272)

Here is clear support for the central premise of this chapter: The process of engaging in evaluation can have as much or more impact than the findings generated. It was not a group's specific questions or answers that Uphoff found most affected the groups he observed. It was the process of reaching consensus about questions and engaging with each other in the meaning of the answers turned up. The process of participatory self-evaluation, in and of itself, provided useful learning experiences for participants.

Since no definitive definitions exist for participatory and collaborative evaluation, these phrases must be defined and given meaning in each setting where they're used. Exhibit 5.3 presents what I consider the primary principles of participatory evaluation. This list can be a starting point for working with intended participants to decide what principles they want to adopt for their own process.

Cousins and Earl (1995, 1992) have advocated participatory and collaborative approaches primarily to increase use of findings: "Unlike emancipatory forms of action research, the rationale for participatory evaluation resides not in its ability to ensure social justice or to somehow even the societal playing field but in the utilization of systematically collected and socially constructed knowledge" (p. 10). Yet, the authors go beyond increased use of findings when they discuss how participation helps create a learning organization. Viewing participatory evaluation as a means of creating an organizational culture committed to ongoing learning has become an important theme in recent literature linking evaluation to learning organizations (e.g., King 1995; Aubel 1993; Leeuw, Rist, and Sonnichsen 1993; Sonnichsen 1993). "The goal of a participatory evaluator is eventually to put him or herself out of work when the research capacity of the organization is self-sustaining" (King 1995:89).
EXHIBIT 5.3
Principles of Participatory Evaluation

• The evaluation process involves participants in learning evaluation logic and skills, for example, goal setting, establishing priorities, focusing questions, interpreting data, data-based decision making, and connecting processes to outcomes.
• Participants in the process own the evaluation. They make the major focus and design decisions. They draw and apply conclusions. Participation is real, not token.
• Participants focus the evaluation on process and outcomes they consider important and to which they are committed.
• Participants work together as a group and the evaluation facilitator supports group cohesion and collective inquiry.
• All aspects of the evaluation, including the data, are understandable and meaningful to participants.
• Internal, self-accountability is highly valued. The evaluation, therefore, supports participants' accountability to themselves and their community first, and external accountability secondarily, if at all.
• The evaluator is a facilitator, collaborator, and learning resource; participants are decision makers and evaluators.
• The evaluation facilitator recognizes and values participants' perspectives and expertise and works to help participants recognize and value their own and each other's expertise.
• Status differences between the evaluation facilitator and participants are minimized.

Indeed, the self-evaluating organization (Wildavsky 1985) constitutes an important direction in the institutionalization of evaluation logic and processes.

Utilization-focused evaluation is inherently participatory and collaborative in actively involving primary intended users in all aspects of the evaluation. Evidence presented in earlier chapters has demonstrated the effectiveness of this strategy for increasing use of findings. The added emphasis of this chapter is how participation and collaboration can lead to an ongoing, longer-term commitment to using evaluation logic and building a culture of learning in a program or organization. Making this kind of process use explicit enlarges the menu of potential evaluation uses. How important this use of evaluation should be in any given evaluation is a matter for negotiation with intended users. The practical implication of an explicit emphasis on creating a learning culture as part of the process will mean building into the evaluation attention to and training in evaluation logic and skills.

Not all references to participatory or collaborative evaluation make the link to participant learning. Levin (1993) distinguished three purposes for collaborative research: (1) the pragmatic purpose of increasing use, (2) the philosophical or methodological purpose of grounding data in practitioner's perspectives, and (3) the political purpose of mobilizing for social action.
A fourth purpose, identified here, is teaching evaluation logic and skills. In the next section, we'll examine in greater depth the political uses of evaluation to mobilize for social action and support social justice.

Empowerment Evaluation

The theme of the 1993 American Evaluation Association national conference was "Empowerment Evaluation." David Fetterman (1993), AEA President that year, defined empowerment evaluation as "the use of evaluation concepts and techniques to foster self-determination. The focus is on helping people help themselves" (p. 115).

Self-determination, defined as the ability to chart one's own course in life, forms the theoretical foundation of empowerment evaluation. It consists of numerous interconnected capabilities that logically follow each other . . . : the ability to identify and express needs, establish goals or expectations and a plan of action to achieve them, identify resources, make rational choices from various alternative courses of action, take appropriate steps to pursue objectives, evaluate short- and long-term results (including reassessing plans and expectations and taking necessary detours), and persist in pursuit of those goals. (Fetterman 1994a:2)

These skills are used to realize the group's own political goals; through self-assessment and a group's knowledge of itself, it achieves accountability unto itself as well as to others (Fetterman, Kaftarian, and Wandersman 1996). In so doing, community capacity can also be enhanced as a group realizes and builds on its assets (Mayer 1996, n.d.).

Empowerment evaluation is most appropriate where the goals of the program include helping participants become more self-sufficient and personally effective. In such instances, empowerment evaluation is also intervention oriented in that the evaluation is designed and implemented to support and enhance the program's desired outcomes. Weiss and Greene (1992) have shown how empowerment partnerships between evaluators and program staff were particularly appropriate in the family support movement, because that movement emphasized participant and community empowerment.

I facilitated a cluster team evaluation of 34 programs serving families in poverty (Patton et al. 1993). A common and important outcome of those programs was increased intentionality—having participants end up with a plan, a sense of direction, an assumption of responsibility for their lives, and a commitment to making progress. Increased intentionality began with small first steps. Families in poverty often feel stuck where they are or are experiencing a downward spiral of worsening conditions and ever greater hopelessness. These programs commonly reported that it was a major achievement to give people a sense of hope manifest in a concrete plan that participants had developed, understood, and believed they could accomplish. Increased intentionality is a commitment to change for the better and a belief that such a change is possible. Thus, the programs collectively placed a great deal of emphasis on developing such skills as goal setting, learning to map out strategies for attaining goals, and monitoring progress in attaining personal goals. The programs' evaluations were built around these family plans and supported them.
Developing family plans was not an end in itself, but the ability and willingness to work on a plan emerged as a leading indicator of the likelihood of success in achieving longer-term outcomes. Creating and taking ownership of a plan became milestones of progress. The next milestone was putting the plan into action.

Another empowering outcome of participatory evaluation is forming effective groups for collective action and reflection. For example, social isolation is a common characteristic of families in poverty. Isolation breeds a host of other problems, including family violence, despair, and alienation. Bringing participants together to establish mutual goals of support and identifying ways of evaluating (reality-testing) goal attainment is a process of community development. The very process of working together on an evaluation has an impact on the group's collective identity and skills in collaborating and supporting each other. Participants also learn to use expert resources, in this case, the facilitating evaluator, but inquiry is democratized (IQREC 1997). One poverty program director explained to me the impact of such a process as she observed it:

It's hard to explain how important it is to get people connected. It doesn't sound like a lot to busy middle-class people who feel their problem is too many connections to too many things. But it's really critical for the people we work with. They're isolated. They don't know how the system works. They're discouraged. They're intimidated by the system's jargon. They don't know where to begin. It's just so critical that they get connected, take action, and start to feel effective. I don't know how else to say it. I wish I could communicate what a difference it makes for a group of poor people who haven't had many years of formal education to share the responsibility to evaluate their own program experiences, learn the language of evaluation, deal with data, and report results. It's very empowering.

Empowerment and Social Justice

The phrase "empowerment evaluation" can bridle. It comes across to some like a trendy buzzword. Others experience it as oxymoronic or disingenuous. Still others find the phrase offensive and condescending. Few people, in my experience, react neutrally. Like the strategic planning term proactive, the word empowerment can create hostile reactions and may fall on hard times.

Empowerment carries an activist, social change connotation, as does a related idea, using evaluation for social justice. Vera, the main character in Nadine Gordimer's (1994) novel, None to Accompany Me, exclaims, after a lengthy exchange about empowerment of South African Blacks, "Empowerment, what is this new thing? What happened to what we used to call justice?" (p. 285). Perhaps Vera would have been pleased by the theme chosen by President Karen Kirkhart for the American Evaluation Association national conference in 1994 (the year after Empowerment Evaluation was the theme): "Evaluation and Social Justice."

The first prominent evaluation theorist to advocate valuing based on principles of social justice was Ernest House (1990b, 1980). He has consistently voiced concern for democratizing decision making. In that context, he has analyzed the ways in which evaluation inevitably becomes a political tool in that it affects "who gets what" (distributive justice). Evaluation can enhance fair and just distribution of benefits and responsibilities, or it can distort such distributions and contribute to inequality.
In rendering judgments on programs, the social justice evaluator is guided by such principles as equality, fairness, and concern for the common welfare (Sirotnik 1990).

Both social justice and empowerment evaluation change the role of the evaluator from the traditional judge of merit or worth to a social change agent. Many evaluators surveyed by Cousins et al. (1995) were hostile to or at least ambivalent about whether participatory evaluation can or should help bring about social justice. Certainly, evaluators undertaking such an approach need to be comfortable with and committed to it, and such an activist agenda must be explicitly recognized by, negotiated with, and formally approved by primary intended users.

From a utilization-focused perspective, the important point is this: Using evaluation to mobilize for social action, empower participants, and support social justice are options on the menu of evaluation process uses. Since how these options are labeled will affect how they are viewed, when discussing these possibilities with primary intended users, evaluation facilitators will need to be sensitive to the language preferences of those involved.

Now, we turn to a conceptually different use of evaluation processes, what I'll call here developmental evaluation.

Program and Organization Development: Developmental Evaluation

The profession of program evaluation has developed parallel to the professions of management consulting and organization development (OD). OD consultants advise on and facilitate a variety of change processes (O'Toole 1995; Kanter, Stein, and Jick 1992; Fossum 1989; McLean 1982), including solving communications problems (D'Aprix 1996); conflict resolution (Kottler 1996); strategic planning (Bryson 1995); leadership development (Kouzes and Posner 1995; Terry 1993; Bryson and Crosby 1992; Schein 1985; Argyris 1976); teamwork (Parker 1996); human resources (Argyris 1974); diversity training (Morrison 1995); shaping organizational culture (Hampden-Turner 1990; Schein 1989); organizational learning (Aubrey and Cohen 1995; Watkins and Marsick 1993; Senge 1990; Morgan 1989; Argyris 1982); and defining mission, to name but a few OD arenas of action (Handy 1993; Massarik 1990; Morgan 1986; Azumi and Hage 1972). Sometimes their methods include organizational surveys and field observations, and they may facilitate action research as a basis for problem solving (Whyte 1991; Schon 1987; Argyris, Putnam, and Smith 1985; Wadsworth 1984) or even evaluation (King 1995; Prideaux 1995; Wadsworth 1993a, 1993b; Patton 1990:157-62). Program evaluation can be viewed as one approach on the extensive menu of organization and program development approaches. Evaluation's niche is defined by its emphasis on reality testing based on systematic data collection for improvement, judging merit and worth, or generating knowledge about effectiveness. The processes of evaluation support change in organizations by getting people engaged in reality testing, that is, helping them think empirically, with attention to specificity and clarity, and teaching them the methods and utility of data-based decision making. Bickman (1994), in an article entitled "An Optimistic View of Evaluation," predicted that evaluators in the future would become more involved in program development, especially "front end" assistance as part of a development team.
For example, evaluability assessment (Wholey 1994; Smith 1989) has emerged as a process for evaluators to work with program managers to help them get ready for evaluation. It involves clarifying goals, finding out various stakeholders' views of important issues, and specifying the model or intervention to be assessed. From my perspective, this is really a fancy term that gives evaluators a credible niche for doing program and organizational development. Time and time again, evaluators are asked to undertake an evaluation only to find that goals are muddled, key stakeholders have vastly different expectations of the program, and the model that the program supposedly represents, that is, its intervention, is vague at best. In other words, the program has been poorly designed, conceptualized, or developed. In order to do an evaluation, the evaluator has to make up for these deficiencies. Thus, by default, the evaluator becomes a program or organizational developer. Rog (1985) studied the use of evaluability assessments and found that many of them precipitated substantial program change but did not lead to a formal evaluation. The programs realized through the process of evaluability assessment that they had a lot more development to do before they could or should undertake a formal evaluation, especially a summative evaluation. In such cases, the processes and logic of evaluation have impact on program staff quite beyond the use of findings from the assessment.

Mission-oriented evaluation is an organizational development approach that involves assessing the extent to which the various units and activities of the organization are consistent with its mission. For example, I evaluated the extent to which 550 grants made by the Northwest Area Foundation over five years were congruent with its mission. The board used that assessment at a retreat to review and then revise the organization's mission. The process of clarifying the foundation's mission with staff and board directors had at least as much impact as the findings (Hall 1992).

Action research (King and Lonnquist 1994a, 1994b), evaluability assessment, and mission-oriented evaluation facilitate organizational change through the processes staff experience as much as through any findings generated. That is also the case for a type of evaluation partnership aimed explicitly at development: developmental evaluation.

Developmental Evaluation

I introduced the term developmental evaluation (Patton 1994a) to describe certain long-term, partnering relationships with clients who are themselves engaged in ongoing program or organizational development. (See Exhibit 5.4 for a formal definition of developmental evaluation.) These clients incorporate me into their decision-making process as part of their design teams because they value the logic and conceptual rigor of evaluation thought, as well as the knowledge I've picked up about effective programming based on accumulated evaluation wisdom. My role is to ask evaluative questions and hold their feet to the fire of reality testing. Evaluation data are collected and used as part of this process, to be sure, but quite above and beyond the use of findings, these development-oriented decision makers want to have their ideas examined in the glaring light of evaluation logic.
EXHIBIT 5.4
Developmental Evaluation Defined

Developmental evaluation refers to evaluation processes undertaken for the purpose of supporting program, project, staff and/or organizational development, including asking evaluative questions and applying evaluation logic for developmental purposes. The evaluator is part of a team whose members collaborate to conceptualize, design, and test new approaches in a long-term, ongoing process of continuous improvement, adaptation, and intentional change. The evaluator's primary function in the team is to elucidate team discussions with evaluative questions, data, and logic and to facilitate data-based decision making in the developmental process.

Developmentally oriented programs have as their purpose the sometimes vague, general notion of ongoing development. The process is the outcome. They eschew clear, specific, and measurable goals up front because clarity, specificity, and measurability are limiting. They've identified an issue or problem and want to explore some potential solutions or interventions, but they realize that where they end up will be different for different participants—and that participants themselves should play a major role in goal setting. The process often includes elements of participatory evaluation, for example, engaging staff and participants in setting personal goals and monitoring goal attainment, but those goals aren't fixed—they're milestones for assessing progress, subject to change as learning occurs—so the primary purpose is program and organizational development rather than individual or group empowerment. As the evaluation unfolds, program designers observe where they end up and make adjustments based on dialogue about what's possible and what's desirable, though the criteria for what's "desirable" may be quite situational and always subject to change.

Developmentally oriented leaders in organizations and programs don't expect (or even want) to reach the state of "stabilization" required for summative evaluation. Staff don't aim for a steady state of programming because they're constantly tinkering as participants, conditions, learning, and context change. They don't aspire to arrive at a fixed model that can be generalized and disseminated. At most, they may discover and articulate principles of intervention and development, but not a replicable model that says "do X and you'll get Y." Rather, they aspire to continuous progress, ongoing adaptation, and rapid responsiveness. No sooner do they articulate and clarify some aspect of the process than that very awareness becomes an intervention and acts to change what they do. They don't value traditional characteristics of summative excellence, such as standardization of inputs, consistency of treatment, uniformity of outcomes, and clarity of causal linkages. They assume a world of multiple causes, diversity of outcomes, inconsistency of interventions, interactive effects at every level—and they find such a world exciting and desirable. They never expect to conduct a summative evaluation because they don't expect the program—or world—to hold still long enough for summative review.
They expect to be forever developing and changing—and they want an evaluation approach that supports development and change.

Moreover, they don't conceive of development and change as necessarily improvements. In addition to the connotation that formative evaluation is ultimately meant to lead to summative evaluation (Scriven 1991a), formative evaluation carries a bias about making something better rather than just making it different. From a developmental perspective, you do something different because something has changed—your understanding, the characteristics of participants, technology, or the world. Those changes are dictated by your current perceptions, but the commitment to change doesn't carry a judgment that what was done before was inadequate or less effective. Change is not necessarily progress. Change is adaptation. As one design team member said,

We did the best we knew how with what we knew and the resources we had. Now we're at a different place in our development—doing and thinking different things. That's development. That's change. But it's not necessarily improvement.

Developmental programming calls for developmental evaluation in which the evaluator becomes part of the design team helping to shape what's happening, both processes and outcomes, in an evolving, rapidly changing environment of constant interaction, feedback, and change. The developmental perspective, as I experience it, feels quite different from the traditional logic of programming in which goals are predetermined and plans are carefully made for achieving those goals. Development-focused relationships can go on for years and, in many cases, never involve formal, written reports.

The evaluator becomes part of the program design team or an organization's management team, not apart from the team or just reporting to the team, but fully participating in decisions and facilitating discussion about how to evaluate whatever happens. All team members, together, interpret evaluation findings, analyze implications, and apply results to the next stage of development. The purpose of the evaluation is to help develop the intervention; the evaluator is committed to improving the intervention and uses evaluative approaches to facilitate ongoing development.

Five Examples of Developmental Evaluation

1. A community leadership program. With two evaluation colleagues, I became part of the design team for a community leadership program in rural Minnesota. The design team included a sociologist, a couple of psychologists, a communications specialist, some adult educators, a funder, and program staff. All design team members had a range of expertise and experiences. What we shared was an interest in leadership and community development.

The relationship lasted over six years and involved different evaluation approaches each year. During that time, we engaged in participant observation, several different surveys, field observations, telephone interviews, case studies of individuals and communities, cost analyses, theory of action conceptualizations, futuring exercises, and training of participants to do their own evaluations. Each year, the program changed in significant ways and new evaluation questions emerged.
Program goals and strategies evolved. The evaluation evolved. No final report was ever written. The program continues to evolve—and continues to rely on developmental evaluation.

2. Supporting diversity in schools. A group of foundations agreed to support multicultural education in the Saint Paul Public Schools for 10 or more years. Community members identified the problem as low levels of success for children of color on virtually every indicator they examined, for example, attendance, test scores, and graduation. The "solution" called for a high degree of community engagement, especially by people of color, in partnering with schools. The nature of the partnering and interim outcomes were to emerge from the process. Indeed, it would have been "disempowering" to local communities to predetermine the desired strategies and outcomes prior to their involvement. Moreover, different communities of color—African Americans, Native Americans, Hispanics, and Southeast Asians—could be expected to have varying needs, set differing goals, and work with the schools in different ways. All of these things had to be developed.

The evaluation documented developments, provided feedback at various levels from local communities to the overall district, and facilitated the process of community people and school people coming together to develop evaluative criteria and outcome claims. Both the program design and evaluation changed at least annually, sometimes more often. In the design process, lines between participation, programming, and evaluation were ignored as everyone worked together to develop the program. As noted earlier in this chapter, the evaluation reports took the form of multiple voices presenting multiple perspectives. These voices and perspectives were facilitated and organized by the evaluation team, but the evaluator's voice was simply one among many. The developmental evaluation and process are still ongoing as this is being written. No summative evaluation is planned or deemed appropriate, though a great deal of effort is going into publicly communicating the developmental processes and outcomes.

3. Children's and families' community initiative. A local foundation made a 20-year commitment to work with two inner-city neighborhoods to support a healthier environment for children and families. The communities are poor and populated by people of diverse ethnic and racial backgrounds. The heart of the commitment was to provide funds for people in the community to set their own goals and fund projects they deemed worthwhile. A community-based steering committee became, in effect, a decision-making group for small community grants. Grant-making criteria, desired outcomes, and evaluation criteria all had to be developed by the local community. The purpose of the developmental process was to support internal, community-based accountability (as opposed to external judgment by the affluent and distant board of the sponsoring foundation). My role, then, was facilitating sessions with local community leaders to support their developing their own evaluation process and sense of shared accountability. The evaluation process had to be highly flexible and responsive. Aspects of participatory and empowerment evaluation also were incorporated. Taking a 20-year developmental perspective, where the locus of accountability is community-based rather than funder-based, changes all the usual parameters of evaluation.
4. A reflective practice process in adult education. I've been working for several years with a suburban adult and community education program in facilitating a reflective practice process for staff development and organizational change. We meet monthly to get reports from staff about their action research observations for the last month. The focus of these observations is whatever issue the group has chosen the previous month. The reflective practice process involves: (1) identifying an issue, interest, or concern; (2) agreeing to try something; (3) agreeing to observe some things about what is tried; (4) reporting back to the group individually; (5) identifying patterns of experience or themes across the separate reports; (6) deciding what to try next, that is, determining the action implications of the findings; and (7) repeating the process with the new commitment to action. Over several years, this process has supported major curricular and organizational change. Evaluation is ongoing and feedback is immediate. The process combines staff and organizational development and evaluation. My role as facilitator is to keep them focused on data-based observations and help them interpret and apply findings. There are no formal reports and no formative or summative judgments in the usual evaluation sense. There is only an ongoing developmental process of incremental change, informed by data and judgment, which has led to significant cumulative evolution of the entire program. This has become a learning organization.

5. Wilderness education for college administrators. Earlier in this chapter, I described briefly the use of journals and interviews in a wilderness education program as an example of intervention-oriented evaluation. That same project provides an example of developmental evaluation. As evaluation participant observers, my evaluation partner and I provided daily feedback to program staff about issues surfacing in our interviews and observations. Staff used that feedback to shape the program, not just in the formative sense of improvement, but in a developmental way, actually designing the program as it unfolded. My evaluation partner and I became part of the decision-making staff that conceptualized the program. Our evaluative questions, quite apart from the data we gathered and fed back, helped shape the program.

An example will illustrate our developmental role. Early in the first trip, we focused staff attention on our observation that participants were struggling with the transition from city to wilderness. After considerable discussion and input from participants, staff decided to have evening discussions on this issue. Out of those discussions, a group exercise evolved in which, each morning and evening, we threw our arms about, shook our legs, and tossed our heads in a symbolic act of casting off the toxins that had surfaced from hidden places deep inside. The fresh air, beauty, quiet, fellowship, periods of solitude, and physical activity combined to "squeeze out the urban poisons." Participants left the wilderness feeling cleaner and purer than they had felt in years. They called that being "detoxified." Like the drunk who is finally sober, they took their leave from the wilderness committed to staying clear of the toxins.

No one was prepared for the speed of retoxification. Follow-up interviews revealed that participants were struggling with reentry. As evaluators, we worked with staff to decide how to support participants in dealing with reentry problems. When participants came back together three months later, they carried the knowledge that detox faded quickly and enduring purification couldn't be expected.
Then the wilderness again salved them with its cleansing power. Most left the second trip more determined than ever to resist retoxification, but the higher expectations only made the subsequent falls more distressing. Many came to the third trip skeptical and resistant. It didn't matter. The San Juan River didn't care whether participants embraced or resisted it. After 10 days rowing and floating, participants, staff, and evaluators abandoned talking about detox as an absolute state. We came to understand it as a matter of degree and a process: an ongoing struggle to monitor the poisons around us, observe carefully their effects on our minds and bodies, and have the good sense to get to the wilderness when being poisoned started to feel normal. This understanding became part of the program model developed jointly by participants, staff, and evaluators—but as evaluators we led the discussions and pushed for conceptual clarity beyond what staff and participants would likely have been able to do without an evaluation perspective.

Commentary on Developmental Evaluation

It will be clear to the reader, I trust, that my evaluation role in each of the programs just reviewed involved a degree of engagement that went beyond the independent data collection and assessment that have traditionally defined evaluation functions. Lines between evaluation and development became blurred as we worked together collaboratively in teams. I have found these relationships to be substantially different from the more traditional evaluations I conducted earlier in my practice. My role has become more developmental.

But, once again, a note of caution about language. The term development carries negative connotations in some settings. Miller (1981), in The Book of Jargon, defines development as "a vague term used to euphemize large periods of time in which nothing happens" (p. 208). Evaluators are well advised to be attentive to what specific words mean in a particular context to specific intended users—and to choose their terms accordingly.

One reaction I've had from colleagues is that the examples I've shared above aren't "evaluations" at all but rather organizational development efforts. I won't quarrel with that. There are sound arguments for defining evaluation narrowly in order to distinguish genuinely evaluative efforts from other kinds of organizational mucking around. But, in each of the examples I've shared, and there are many others, my participation, identity, and role were considered evaluative by those with whom I was engaged (and by whom I was paid). There was no pretense of external independence. My role varied from being evaluation facilitator to full team member. In no case was my role external reporting and accountability.

Developmental evaluation certainly involves a role beyond being solely an evaluator, but I include it among the things we evaluators can do because organizational development is a legitimate use of evaluation processes. What we lose in conceptual clarity and purity with regard to a narrow definition of evaluation that focuses only on judging merit or worth, we gain in appreciation for evaluation expertise. When Scriven (1995) cautions against crossing the line from rendering judgments to offering advice, I think he underestimates the valuable role evaluators can play in design and program improvement based on cumulative knowledge. Part of my value to a design team is that I bring a reservoir of knowledge (based on many years of practice and having read a great many evaluation reports) about what kinds of things tend to work and where to anticipate problems.
Part of my value to a design team is that I bring a reservoir of knowledge (based on many years of practice and having read a great many evaluation reports) about what kinds of things tend to work and where to anticipate problems. Young and novice evaluators may be well advised to stick fairly close to the data. However, experienced evaluators have typically accumulated a great deal of knowledge and wisdom about what works and doesn't work. More generally, as a profession, we know a lot about patterns of effectiveness, I think—and will know more over time. That knowledge makes us valuable partners in the design process. Crossing that line, however, can reduce independence of judgment. The costs and benefits of such a role change must be openly acknowledged and carefully assessed.

Concerns, Controversies, and Caveats

Menu 5.1 summarizes the four primary uses of evaluation logic and processes discussed in this chapter. As I noted in opening this chapter, any evaluation can, and often does, have these kinds of effects unintentionally or as an offshoot of using findings. What's different about utilization-focused evaluation is that the possibility and desirability of learning from evaluation processes, as well as from findings, can be made intentional and purposeful—an option for intended users to consider building in from the beginning. In other words, instead of treating process use as an informal ripple effect, explicit and up-front attention to the potential impacts of evaluation logic and processes can increase those impacts and make them a planned purpose for undertaking the evaluation. In this way the evaluation's overall utility is increased.

The four kinds of process use identified and discussed here—(1) enhancing shared understandings, (2) reinforcing interventions, (3) supporting participant engagement, and (4) developing programs and organizations—have this in common: They all go beyond the traditional focus on findings and reports as the primary vehicles for evaluation impact. As such, these new directions have provoked controversy. Six kinds of objections—closely interrelated, but conceptually distinct—arise most consistently:

1. Definitional objection. Evaluation should be narrowly and consistently defined in accordance with the "common sense meaning of evaluation," namely, "the systematic investigation of the merit or worth of an object" (Stufflebeam 1994:323). Anything other than that isn't evaluation. Adding terms such as empowerment or developmental to evaluation changes focus and undermines the essential nature of evaluation as a phenomenon unto itself.

2. Goals confusion objection. The goal of evaluation is to render judgment. "While . . . 'helping people help themselves' is a worthy goal, it is not the fundamental goal of evaluation" (Stufflebeam 1994:323).

3. Role confusion objection. Evaluators as people may play various roles beyond being an evaluator, such as training clients or helping staff develop a program, but in taking on such roles, one moves beyond being an evaluator and should call the role what it is, for example, trainer or developer, not evaluator.

While one might appropriately assist clients in these ways, such services are not evaluation. . . . The evaluator must not confuse or substitute helping and advocacy roles with

MENU 5.1

Four Primary Uses of Evaluation Logic and Processes

For the uses below, the impact of the evaluation comes from application of evaluation thinking and engaging in evaluation processes (in contrast to impacts that come from using specific findings).

Enhancing shared understandings
    Specifying intended uses to provide focus and generate shared commitment
    Managing staff meetings around explicit outcomes
    Sharing criteria for equity/fairness
    Giving voice to different perspectives and valuing diverse experiences

Supporting and reinforcing the program intervention
    Building evaluation into program delivery processes
    Having participants monitor their own progress
    Specifying and monitoring outcomes as integral to working with program participants

Increasing engagement, self-determination, and ownership
    Participatory and collaborative evaluation
    Empowerment evaluation
    Reflective practice
    Self-evaluation

Program and organizational development
    Developmental evaluation
    Action research
    Mission-oriented, strategic evaluation
    Evaluability assessment
    Model specification

NOTE: Menu 4.1 (Chapter 4) presents a corresponding menu, "Three Primary Uses of Evaluation Findings," which addresses making judgments (e.g., summative evaluation), improving programs (formative evaluation), and generating knowledge (e.g., meta-analyses and syntheses).

rendering of assessments of the merit and/or worth of objects that he/she has agreed to evaluate. (Stufflebeam 1994:324)

Scriven (1991a) has been emphatic in arguing that being able to identify that something is or is not working (an evaluator's role) is quite different from knowing how to fix or improve it (a designer's role).

4. Threat to data validity objection. Quantitative measurement specialists teach
that data collection, in order for the results to be valid, reliable, and credible, should be separate from the program being evaluated. Integrating data collection in such a way that it becomes part of the intervention contaminates both the data and the program.

5. Loss of independence objection. Approaches that depend on close relationships between evaluators and other stakeholders undermine the evaluator's neutrality and independence. "It's quite common for younger evaluators to 'go native,' that is, psychologically join the staff of the program they are supposed to be evaluating and become advocates instead of evaluators" (Scriven 1991a:41). This can lead to overly favorable findings and an inability to give honest, negative feedback.

6. Corruption and misuse objection. Evaluators who identify with and support program goals, and develop close relationships with staff and/or participants, can be inadvertently co-opted into serving public relations functions or succumb to pressure to distort or manipulate data, hide negative findings, and exaggerate positive results. Even if they manage to avoid corruption, they may be suspected of it, thus undermining the credibility of the entire profession. Or these approaches may actually serve intentional misuse and foster corruption, as Stufflebeam (1994) worries:

What worries me most about . . . empowerment evaluation is that it could be used as a cloak of legitimacy to cover up highly corrupt or incompetent evaluation activity. Anyone who has been in the evaluation business for very long knows that many potential clients are willing to pay much money for a "good, empowering evaluation," one that conveys the particular message, positive or negative, that the client/interest group hopes to present, irrespective of the data, or one that promotes constructive, ongoing, and nonthreatening group process. . . . Many administrators caught in political conflicts would likely pay handsomely for such friendly, nonthreatening, empowering evaluation service. Unfortunately, there are many persons who call themselves evaluators who would be glad to sell such service. (p. 325)

These are serious concerns that have sparked vigorous debate (e.g., Fetterman 1995). In Chapter 14 on the politics and ethics of utilization-focused evaluation, I'll address these concerns with the seriousness they deserve. For the purpose of concluding this chapter, it is sufficient to note that the utilization-focused evaluator who presents to intended users options that go beyond narrow and traditional uses of findings has an obligation to disclose and discuss objections to such approaches. As evaluators explore new and innovative options, they must be clear that dishonesty, corruption, data distortion, and selling out are not on the menu. Where primary intended users want and need an independent, summative evaluation, that is what they should get. Where they want the evaluator to act independently in bringing forward improvement-oriented findings for formative evaluation, that is what they should get. But those are no longer the only options on the menu of evaluation uses. New participatory, collaborative, intervention-oriented, and developmental approaches are already being used. The utilization-focused issue is not whether such approaches should exist. They already do. The issues are understanding when such
approaches are appropriate and helping intended users make informed decisions about their appropriateness. That's what the next chapter addresses.

Note

1. Sufi stories, particularly those about the adventures and follies of the incomparable Mulla (Master) Nasrudin, are a means of communicating ancient wisdom:

Nasrudin is the classical figure devised by the dervishes partly for the purpose of halting for a moment situations in which certain states of mind are made clear. . . . Since Sufism is something which is lived as well as something which is perceived, a Nasrudin tale cannot in itself produce complete enlightenment. On the other hand, it bridges the gap between mundane life and a transmutation of consciousness in a manner which no other literary form yet produced has been able to attain. (Shah 1964:56)

Focusing Evaluations:
Choices, Options, and Decisions

Desiderata for the Indecisive and Complacent

Go placidly amid the noise and haste, and remember what peace there may be
in avoiding options. As far as possible, without surrender, be on good terms
with the indecisive. Avoid people who ask you to make up your mind; they are vexations
to the spirit. Enjoy your indecisiveness as well as your procrastinations. Exercise caution
in your affairs lest you be faced with choices, for the world is full of menus. Experience
the joys of avoidance.
You are a child of the universe, no less than the trees and the stars; you have a right to
do and think absolutely nothing. And if you want merely to believe that the universe is
unfolding as it should, avoid evaluation, for it tests reality. Evaluation threatens compla-
cency and undermines the oblivion of fatalistic inertia. In undisturbed oblivion may lie
happiness, but therein resides neither knowledge nor effectiveness.

—Halcolm's Indesiderata
Being Active-Reactive-Adaptive
Evaluator Roles, Situational Responsiveness,
and Strategic Contingency Thinking

Human propensities in the face of evaluation: feline curiosity; stultifying fear;
beguiling distortion of reality; ingratiating public acclamation; inscrutable
selective perception; profuse rationalization; and apocalyptic anticipation. In other words,
the usual run-of-the-mill human reactions to uncertainty.
Once past these necessary initial indulgences, it's possible to get on to the real evalu-
ation issues: What's worth knowing? How will we get it? How will it be used?
Meaningful evaluation answers begin with meaningful questions.
—Halcolm

A young hare, born to royal-rabbit parents in a luxury warren, showed unparalleled
speed. He won races far and wide, training under the world's best coach. He boasted
that he could beat anyone in the forest.
The only animal to accept the hare's challenge was an old tortoise. This first amused,
then angered the arrogant hare, who felt insulted. The hare agreed to the race, ridiculing
the tortoise to local sports columnists. The tortoise said simply, "Come what may, I
will do my best."
A course was created that stretched all the way through and back around the forest.
The day of the race arrived. At the signal to start, the hare sped away, kicking dust in
the tortoise's eyes. The tortoise slowly meandered down the track.


Halfway through the race, rain began falling in torrents. The rabbit hated the feel of
cold rain on his luxuriously groomed fur, so he stopped for cover under a tree. The
tortoise pulled his head into his shell and plodded along in the rain.
When the rain stopped, the hare, knowing he was well ahead of the meandering
tortoise and detesting mud, decided to nap until the course dried. The track, however,
was more than muddy; it had become a stream. The tortoise turned himself over, did
the backstroke, and kept up his progress.
By and by, the tortoise passed the napping hare and won the race. The hare blamed
his loss on "unfair and unexpected conditions," but observant sports columnists
reported that the tortoise had beaten the hare by adapting to conditions as he found
them. The hare, it turned out, was only a "good conditions" champion.

Evaluation Conditions

What are good evaluation conditions? Here's a wish list (generated by some colleagues over drinks late one night at the annual conference of the American Evaluation Association). The program's goals are clear, specific, and measurable. Program implementation is standardized and well managed. The project involves two years of formative evaluation working with open, sophisticated, and dedicated staff to improve the program; this is followed by a summative evaluation for the purpose of rendering independent judgment to an interested and knowledgeable funder. The evaluator has ready access to all necessary data and enthusiastic cooperation from all necessary people. The evaluator's role is clear and accepted. There are adequate resources and sufficient time to conduct a comprehensive and rigorous evaluation. The original evaluation proposal can be implemented as designed. No surprises turn up along the way, like departure of the program's senior executive or the report deadline moved up six months.

How often had this experienced group of colleagues conducted an evaluation under such ideal conditions? Never. (Bring on another round of drinks.)

The real world doesn't operate under textbook conditions. Effective evaluators learn to adapt to changed conditions. This requires situational responsiveness and strategic, contingency thinking—what I've come to call being active-reactive-adaptive in working with primary intended users.

By way of example, let me begin by describing how the focus of an evaluation can change over time. To do so, I'll draw on the menu of uses offered in the previous two chapters. In Chapter 4, we considered three uses for findings: rendering summative judgments; improving programs formatively; and generating knowledge about generic patterns of effectiveness. In Chapter 5, we considered four uses of evaluation processes and logic: enhancing communications and understanding; reinforcing a program intervention; supporting participant engagement, ownership, and empowerment; and program or organizational development. The following example will illustrate how these uses can be creatively combined in a single project to build on
and reinforce each other over time as this case to illustrate the menus of evalu-
conditions and needs change. Then we'll ation uses, both use of findings and use of
consider a menu of evaluator roles and evaluation logic and processes.
examine some of the situations and contin-
gencies that influence the choice of evalu- Formative evaluation. Parent feedback
ator roles and type of evaluation. surveys have been used from the begin-
ning to make the programs responsive to
parent needs and interests. More recently,
Changing Uses Over Time a large-scale, statewide evaluation involv-
ing 29 programs has used pre-post inter-
Since the 1970s, Minnesota has been at views and videotapes with parents to
the forefront in implementing Early Child- share information and results across pro-
hood Family Education programs through grams. Staff have discussed program vari-
school districts statewide. These programs ations, identified populations with which
offer early screening for children's health they are more and less successful, and
and developmental problems; libraries of shared materials. The evaluation has be-
books, toys, and learning materials; and par- come a vehicle for staff throughout the
ent education classes that include parent- state to share ideas about everything from
only discussions as well as activities with recruitment to outcomes assessment. Every
their infants, toddlers, and preschoolers. program in the state can identify improve-
Parents learn about child development; ments made as a result of this evaluation-
ways of supporting their child's intellec- centered sharing and staff development.
tual, emotional, and physical growth; and
how to take care of themselves as parents. Summative evaluation. Periodically
Some programs include home visits. A hall- the program has produced formal evalu-
mark of the program has been universal ation reports for the state legislature. A
access and outreach; that is, the program is great deal is at stake. For example, in
not targeted selectively to low-income 1992, the program was funded by over
families or those deemed at risk. The pro- $26 million in state aid and local levies.
gram serves over 260,000 young children At a time of severe funding cuts for all
and their parents annually. It is the nation's kinds of programs, the universal access
largest and oldest program of its kind. philosophy and operations of the program
Evaluation has been critical to the pro- came under attack. Why should the state
gram's development, acceptance, and ex- fund parent education and parent-child
pansion over the years. Evaluation meth- activities for middle-class parents? To
ods have included parent surveys, field save money and more narrowly focus the
observations of programs, standardized in- program, some legislators and educators
struments measuring parenting knowledge proposed targeting the program for low-
and skills, interviews with staff and parents, income and at-risk parents. The pro-
pre-post assessments of parent-child inter- gram's statewide evaluation played a ma-
actions, and videotapes of parent-child in- jor role in that debate. The summative
teraction (Mueller 1996). I have been in- report, entitled Changing Times, Chang-
volved with these statewide evaluation ing Families (MECFE 1992), was distrib-
efforts for over 20 years, so I want to use uted widely both within and outside the

legislature. It described parent outcomes a number of programmatic implications,


in great detail, showing that middle-class but at the level of cross-program synthe-
parents had a great deal to learn about sis, it represents knowledge generation.
parenting. Pre-post interviews with 183 The lessons learned by Minnesota staff
parents showed how parent knowledge, have been shared with programs in other
skills, behaviors, and feelings changed. states, and vice versa. One kind of impor-
Smaller samples examined effects on sin- tant lesson learned has been how to use
gle parents and teen parents. The report evaluation processes to enhance program
also included recommendations for pro- effectiveness. We turn, then, to uses of
gram improvement, for example, working evaluation logic and processes.
harder and more effectively to get fathers
involved. The summative evaluation con- Using evaluation to enhance mutual
tributed to the legislature's decision to understanding. All evaluation instru-
maintain universal access and expand sup- ments for 20 years have been developed
port for early childhood parent education with full staff participation. At full-day
programming. State staff felt that, without sessions involving program directors from
a summative evaluation, the program all over the state, rural, urban, and state
would have been especially vulnerable staff have shared understandings about
to questions about the value of serving program priorities and challenges as they
middle-class parents. The summative re- have operationalized outcomes. State staff
port anticipated that policy question, be- have shared legislators' priorities. Rural
cause legislators were identified as pri- staff have shared parents' concerns. All
mary intended users. have discussed, and sometimes debated,
what kinds of changes are possible, impor-
Knowledge-generating evaluation. The tant, and/or crucial. The evaluation..be-
fact that program offerings and imple- came a mechanism for formalizing, shar-
mentation vary from district to district ing, and communicating the program's
throughout the state has offered opportu- philosophy, priorities, and approaches
nities to synthesize lessons learned. For among diverse directors separated some-
example, using comparative data about times by hundreds of miles.
varying degrees of effectiveness in chang-
ing parent-child interactions, staff ana- Intervention-oriented evaluation. In
lyzed different ways of working with par- each program, a sample of parents was
ents. One theme that emerged was the selected for pre-post interviews and vide-
importance of directly engaging and train- otapes of parent-child interactions. Staff,
ing parents in how to observe their chil- after evaluation training, conducted the
dren. Early in the program, staff trained interviews and made the videotapes. Staff
in child development underestimated the soon discovered that the processes of in-
skills and knowledge involved in observ- terviewing and videotaping were power-
ing a child. New parents lacked a context ful interventions. In the course of data
or criteria for observing their own chil- collection, staff and parents got to know
dren. Having parents and children come each other, developed rapport and trust,
together in groups provided opportunities and discussed parents' concerns. Soon,
to make observational skills a focus of parents not included in the study design
parent education. This understanding has were asking to be interviewed and video-

taped. Some programs have decided to among children. In so doing, we've been
continue pre-post interviews with all par- engaged in a long-term process of model
ents and are routinely videotaping and specification and program development
reviewing the results with parents. The that go well beyond and have a larger
interviews and videotapes support and re- impact than simply deciding what data to
inforce the program's goal of making par- collect in the next round of evaluation.
ents more reflective and observant about The evaluation deliberation process has
their parenting. become a vehicle for program development
Data collection has become a valued beyond use of findings about effectiveness.
intervention.

Participation, collaboration, and em-


powerment. The staff has had complete Variable Evaluator Roles Linked
ownership of the evaluation from the be- to Variable Evaluation Purposes
ginning. From determining the focus of
each subsequent evaluation through data Different types of and purposes for
collection and analysis, staff have partici- evaluation call for varying evaluator roles.
pated fully. My role, and that of my evalu- Gerald Barkdoll (1980), as associate com-
ation colleagues, has been to support and missioner for planning and evaluation of
facilitate the process. Program directors the U.S. Food and Drug Administration,
have reported feeling affirmed by the re- identified three contrasting evaluator roles.
search knowledge they have gained. Most His first type, evaluator as scientist, he
recently, they have been interpreting fac- found was best fulfilled by aloof academics
tor analysis and regression coefficients who focus on acquiring technically impec-
generated from the latest statewide effort. cable data while studiously staying above
They have learned how to interpret other the fray of program politics and utilization
evaluation and research studies in the relationships. His second type he called
course of working on their own evalu- "consultative" in orientation; these evalua-
ations. They have taken instruments de- tors were comfortable operating in a col-
veloped for statewide evaluation and laborative style with policymakers and pro-
adapted them for ongoing local program gram analysts to develop consensus about
use. They feel competent to discuss the their information needs and decide jointly
results with school superintendents and the evaluation's design and uses. His third
state legislators. They're also able to en- type he called the "surveillance and com-
gage with confidence in discussions about pliance" evaluator, a style characterized by
what can and can't be measured. aggressively independent and highly criti-
cal auditors committed to protecting the
public interest and ensuring accountability
Developmental evaluation. I've been
(e.g., Walters 1996). These three types re-
involved with many of these program di-
flect evaluation's historical development
rectors for 20 years. Over the years, we've
from three different traditions: (1) social
wrestled with questions of how knowl-
science research; (2) pragmatic field prac-
edge change relates to behavior change,
tice, especially by internal evaluators and
how much importance to attach to attitu-
consultants; and (3) program and financial
dinal change and increases in parent con-
auditing.
fidence, and what outcomes to monitor

When evaluation research aims to gen- Contrast such a national accountability


erate generalizable knowledge about causal evaluation with an evaluator's role in help-
linkages between a program intervention ing a small, rural leadership program of the
and outcomes, rigorous application of so- Cooperative Extension Service increase its
cial science methods is called for and the impact. The program operates in a few
evaluator's role as methodological expert local communities. The primary intended
will be primary. When the emphasis is on users are the county extension agents,
determining a program's overall merit or elected county commissioners, and farmer
worth, the evaluator's role as judge takes representatives who have designed the pro-
center stage. If an evaluation has been com- gram. Program improvement to increase
missioned because of and is driven by pub- participant satisfaction and behavior change
lic accountability concerns, the evaluator's is the intended purpose. Under these con-
role as independent auditor, inspector, or ditions, the evaluation's use will depend
investigator will be spotlighted for policy- heavily on the evaluator's relationship with
makers and the general public. When pro- design team members. The evaluator will
gram improvement is the primary purpose, need to build a close, trusting, and mutually
the evaluator plays an advisory and facili- respectful relationship to effectively facili-
tative role with program staff. As a member tate the team's decisions about evaluation
of a design team, a developmental evalu- priorities and methods of data collection
ator will play a consultative role. If an and then take them through a consensus-
evaluation has a social justice agenda, the building process as results are interpreted
and changes agreed on.
evaluator becomes a change agent.
In utilization-focused evaluation, the These contrasting case examples illus-
evaluator is always a negotiator—negotiat- trate the range of contexts in which pro-
ing with primary intended users what other gram evaluations occur. The evaluator's
roles he or she will play. Beyond that, all role in any particular study will depend on
roles are on the table, just as all methods matching her or his role with the context
are options. Role selection follows from and purposes of the evaluation as negoti-
and is dependent on intended use by in- ated with primary intended users.
tended users.
Consider, for example, a national evalu- Academic Versus
ation of food stamps to feed low-income Service Orientations
families. For purposes of accountability
and policy review, the primary intended One of the most basic role divisions in
users are members of the program's over- the profession is that between academic
sight committees in Congress (including and service-oriented evaluators, a division
staff to those committees). The program is identified by Shadish and Epstein (1987)
highly visible, costly, and controversial, es- when they surveyed a stratified random
pecially because special interest groups dif- sample of the members of the Evaluation
fer about its intended outcomes and who Network and the Evaluation Research So-
should be eligible. Under such conditions, ciety, the two organizations now merged
the evaluation's credibility and utility will as the American Evaluation Association.
depend heavily on the evaluators' inde- The authors inquired about a variety of
pendence, ideological neutrality, methodo- issues related to evaluators' values and
logical expertise, and political savvy. practices. They found that responses clus-

tered around two contrasting views of the American Evaluation Association


evaluation. Academic evaluators tend to be elected successive presidents who repre-
at universities and emphasize the research sented two quite divergent perspectives.
purposes of evaluation, traditional stan- While on the surface the debate was partly
dards of methodological rigor, summative about methods—quantitative versus quali-
outcome studies, and contributions to so- tative, a fray we shall enter in Chapter
cial science theory. Service evaluators tend 12—the more fundamental conflict cen-
to be independent consultants or internal tered on vastly different images of the pro-
evaluators and emphasize serving stake- fession.
holders' needs, program improvement, Yvonna Lincoln (1991), in her 1990
qualitative methods, and assisting with pro- presidential address, advocated what I
gram decisions. According to Shadish and would call an activist role for evaluators,
Epstein, "The general discrepancy between one that goes beyond just being competent
service-oriented and academically oriented applied researchers who employ traditional
evaluators seems warranted on both theo- scientific methods to study programs—the
retical and empirical grounds" (p. 587). academic perspective. She first lamented
In addition, Shadish and Epstein (1987) and then disputed the notion that " 'sci-
found that 31% of the respondents de- ence' is about knowing and 'art' is about
scribed their primary professional identity feeling and spiritual matters" (p. 2). She
as that of "evaluator" (p. 560). Others went on to talk about the need for an "arts
thought of themselves first as a psycholo- and sciences of evaluation as plurals" (em-
gist, sociologist, economist, educator, and phasis in the original) in which she identi-
so on, with identity as an evaluator second- fied four new sciences and six new arts for
ary. Evaluators whose primary professional evaluation (pp. 2-6).
identity was evaluation were more likely to
manifest the service/stakeholder orienta- Lincoln's New
tion, with an emphasis on formative evalu- Sciences of Evaluation
ation and commitment to improved pro-
gram decision making. Those who did not 1. The science of locating interested
identify primarily as evaluators (but rather stakeholders
took their primary identity from a tradi- 2. The science of getting information—
tional academic discipline) were signifi- good, usable information—to those
cantly more likely to be engaged in aca- same stakeholders
demic evaluative research emphasizing 3. The science of teaching various stakeholder
research outcomes and summative judg- groups how to use information to empower
ments (p. 581). themselves
4. A science of communicating results

Enter Morality: Activist Lincoln's New Arts of Evaluation


Versus Academic Evaluation
1. The art of judgment, not only our own, but
The profession of evaluation remains eliciting the judgments of stakeholders
very much split along these lines, but with 2. The art of "appreciating" in our stakeholders
new twists and, perhaps, deeper antago- and in ourselves, that is, comprehending
nisms. The schism erupted openly, and per- meaning within a context. . ., seeing some-
haps deepened, in the early 1990s, when thing fully

3. The art of cultural analysis But what alienated Sechrest the most
4. The art of hearing secret harmonies, that is, was the tone of moral superiority he heard
listening for meanings in Lincoln's call for a new arts and sciences
5. The art of negotiating, not only our con- of evaluation. Referring t o a notice he had
tracts, but the worlds in which our target seen, presumably inspired by Lincoln's per-
populations and audiences live spective, that invited organizing "the N e w
6. The art of dealing with people different from Power Generation" into a group that might
ourselves be called The Moral Evaluators, he replied,

Lincoln (1991) closed her speech by The most offensive part of the text, however,
asserting that "my message is a moral is the arrogation of the term "moral" by these
o n e . " Because evaluators are a powerful new generation, rebellious evaluators. As
g r o u p , these n e w arts and sciences have constructionists, they should know that mo-
profound moral implications, she argued, rality, like so many other things, is in the eye
including speaking truth to power, of the beholder. They do not look so extraor-
dinarily moral to me. (p. 5)
and to make that truth grounded in lived
experience and in multiple voices. . . . We Sechrest (1992) closed by implying that the
need to move beyond cost-benefit analyses activist stance advocated by Lincoln, pre-
and objective achievement measures to inter- sumably based on commitment to shaping
pretive realms . . . to begin talking about a better world, could be viewed as corrupt.
what our programs mean, what our evalu- Noting that academic evaluators w h o use
ations tell us, and what they contribute to our traditional quantitative methods also care
understandings as a culture and as a society. about finding programs that work, he
We need literally to begin to shape—as a chided, "They are simply not willing to
shaman would—the dreams of all of us into fudge very much to do so" (p. 6).
realities, (p. 6) W h a t Shadish and Epstein (1987) origi-
nally distinguished as academic versus ser-
T h e following year, the American vice orientations has evolved, I believe, into
Evaluation Association president was Lee different conceptions of evaluator activism
Sechrest, w h o by his own definition rep- that continue to split the profession. T h e
resented the traditional, academic view of Lincoln-Sechrest debate foreshadowed a
evaluation. H e objected to Lincoln's no less rancorous exchange between two
metaphorical call for a new generation of more evaluation luminaries, Dan Stuffle-
evaluators. "I ask myself," Sechrest (1992) beam (1994) and David Fetterman (1995),
mused, "could we of the preceding gen- about the morality of empowerment evalu-
erations possibly have given rise to this ation. Stufflebeam fears that such an activ-
new, Fourth Generation? W h e r e in our ist orientation will undermine the credibil-
m a k e u p are the origins of this new crea- ity and integrity of the field. Fetterman sees
ture so unlike us. . . . I sense a very real evaluator activism as realizing the full p o -
and large generational gap" (p. 2). H e tential of the profession t o contribute to
went on to argue the merits of traditional creating a better world, especially for the
scholarly approaches to evaluation, espe- disempowered.
cially the use of quantitative and experi- The degree of evaluator activism is a
mental m e t h o d s . continuum, the endpoints of which have

been defined by Lincoln and Fetterman Neither more nor less activism, in my
on the activist side and by Sechrest and judgment, is morally superior. Various de-
Stufflebeam on the academic side. One per- grees of activism involve different ways to
spective places evaluators in the fray, even practice as an evaluator, often in different
arguing that we have a moral obligation to arenas. Indeed, how activist to be involves
acknowledge our power and use it to help consideration of an evaluation's purpose,
those in need who lack power. The other decisions about intended users and uses,
perspective argues that evaluation's integ- and the evaluator's own values and com-
rity and long-term contribution to shaping mitments, all of which need to be made
a better world depend on not being per- explicit. The challenge will be to create
ceived as advocates, even though we push appreciation for such diversity among
for use. Eleanor Chelimsky (1995a), a long- those both within and outside the profes-
time champion of evaluation use and the sion who have a single and narrow view of
person who conceived the goal "intended evaluation and its practice. The debate will,
use by intended users," took the occasion and should, go on, for that's how we dis-
of her 1995 presidential address to the cover the implications and ramifications of
American Evaluation Association to warn diverse approaches, but I foresee and desire
against being perceived as taking sides. no turning back of the clock to a single
dominant perspective.
What seems least well understood, in my In their original research on the emerg-
judgment, is the dramatically negative and ing schism in evaluation, Shadish and
long-term impact on credibility of the ap- Epstein (1987) anticipated that, while the
pearance of advocacy in an evaluation. There profession's diversity can help make the
is a vast cemetery out there of unused evalu- field unique and exciting, it also has the
ation findings that have been loudly or potential for increasing tensions between
quietly rejected because they did not "seem" activist and academic interests, "tensions
objective. In short, evaluators' survival in a that arise because of the different demands
political environment depends heavily on and reward structures under which the two
their credibility, as does the use of their groups often operate" (p. 587). They went
findings in policy, (p. 219) on to note that such tensions could lead to
polarization, citing as evidence the debate
My own view, focused as always on within psychology between practicing ver-
utility, is that these different stances, in- sus academic clinical psychologists, which
deed the whole continuum of evaluator has led to a major schism there. They
activism, constitute options for discussion concluded,
and negotiation with primary intended
users. Chelimsky's primary intended users To the extent that the underlying causes are
were members of Congress. Fetterman's similar—and there are indeed some impor-
were, among others, disenfranchised and tant relevant similarities in the political and
oppressed Blacks in South African town- economic characteristics of the two profes-
ships. sions—the lesson to evaluation is clear.
Both national policymakers and people Program evaluation must continue its efforts
in poverty can benefit from evaluation, but to accommodate diverse interests in the same
not in the same ways and not with the same profession. . . . In the long run, evaluation
evaluator roles. will not be well served by parochialism of any

kind—in patterns of practice or anything tion of ethical guidance, and a commitment


else. (p. 588) to professional competence and integrity,
but there are no absolute rules an evaluator
As the professional practice of evalu- can follow to know exactly what to do with
ation has become increasingly diverse, the specific users in a particular situation.
potential roles and relationships have That's why Newcomer and Wholey (1989)
multiplied. Menu 6.1 offers a range of concluded in their synthesis of knowledge
dimensions to consider in defining the about evaluation strategies for building
evaluator's relationship to intended users. high-performance programs: "Prior to an
Menu 6.2 presents options that can be evaluation, evaluators and program man-
considered in negotiations with intended agers should work together to define the
users. The purpose of these menus is to ideal final product" (p. 202). This means
elaborate the multiple roles now available negotiating the evaluation's intended and
to evaluators and the kind of strategic, expected uses.
contingency thinking involved in making Every evaluation situation is unique. A
role decisions. successful evaluation (one that is useful,
I would hope that these menus help practical, ethical, and accurate) emerges
communicate the rich diversity of the field, from the special characteristics and condi-
for I agree with Shadish and Epstein (1987) tions of a particular situation—a mixture
that "in the long run, evaluation will not be of people, politics, history, context, re-
well served by parochialism of any kind— sources, constraints, values, needs, interests,
in patterns of practice or anything else" and chance. Despite the rather obvious,
(p. 588). A parochial practice is one that almost trite, and basically commonsense
repeats the same patterns over and over. A nature of this observation, it is not at all
pluralistic and cosmopolitan practice is one obvious to most stakeholders, who worry a
that adapts evaluation practices to new great deal about whether an evaluation is
situations. being done "right." Indeed, one common
objection stakeholders make to getting ac-
tively involved in designing an evaluation
Situational Evaluation is that they lack the knowledge to do it
right. The notion that there is one right way
There is no one best way to conduct an to do things dies hard. The right way, from
evaluation. a utilization-focused perspective, is the
This insight is critical. The design of a way that will be meaningful and useful to
particular evaluation depends on the peo- the specific evaluators and intended users
ple involved and their situation. Situa- involved, and finding that way requires
tional evaluation is like situational ethics interaction, negotiation, and situational
analysis.
(Fletcher 1966) situational leadership
(Blanchard 1986; Hersey 1985), or situ- Alkin (1985) identified some 50 factors
ated learning: "Action is grounded in the associated with use. He organized them
concrete situation in which it occurs" (An- into four categories:
derson et al. 1996:5). The standards and
principles of evaluation (see Chapters 1 1. Evaluator characteristics, such as commit-
and 2) provide overall direction, a founda- ment to make use a priority, willingness
MENU 6.1

Dimensions Affecting Evaluator and User Engagement

1. Relationship with primary intended users: from distant/noninteractive to close/highly interactive.

2. Control of the evaluation process: from evaluator directed and controlled (evaluator as primary decision maker) to directed by primary intended users (evaluator consults).

3. Scope of intended user involvement: from very narrow (primarily as audience for findings) to involved in some parts (usually focus, but not methods or analysis) to involved in all aspects of the evaluation from start to finish.

4. Number of primary intended users and/or stakeholders engaged: none, one, a few, many, all constituencies represented.

5. Variety of primary intended users engaged: from homogeneous to heterogeneous. Dimensions of heterogeneity:
    (a) Position in program (funders, board executives, staff, participants, community members, media, onlookers)
    (b) Background variables: cultural/racial/ethnic/gender/social class
    (c) Regional: geographically near or far
    (d) Evaluation sophistication and experience
    (e) Ideology (political perspective/activism)

6. Time line for the evaluation: from tight deadline, with little time for processing with users, to long developmental timeline, with time for processing with users.
MENU 6.2

Optional Evaluator Roles

For each role, the entries below list the most likely primary users, the evaluator's dominant style, the primary evaluation purpose, and the evaluator characteristics affecting use.

1. Judge. Most likely primary users: funders, officials, decision makers. Dominant style: authoritative. Primary evaluation purpose: summative determination of overall merit or worth. Evaluator characteristics affecting use: perceived independence, methodological expertise, substantive expertise, perceived neutrality.

2. Auditor, inspector, investigator. Most likely primary users: funders, policymakers, board members. Dominant style: independent. Primary evaluation purpose: accountability, compliance, adherence to rules. Evaluator characteristics affecting use: independence, perceived toughness, detail orientation, thoroughness.

3. Researcher. Most likely primary users: academics, planners, program designers, policy specialists. Dominant style: knowledgeable. Primary evaluation purpose: generate generalizable knowledge; truth. Evaluator characteristics affecting use: methodological expertise, academic credentials, scholarly status, peer review support.

4. Consultant for program improvement. Most likely primary users: program staff, program executives and administrators, participants. Dominant style: interactive, perceptive, insightful. Primary evaluation purpose: program improvement. Evaluator characteristics affecting use: perceived understanding of the program, trust, rapport, insightfulness.

5. Evaluation facilitator. Most likely primary users: diverse stakeholders. Dominant style: available, balanced, empathic. Primary evaluation purpose: facilitate judgments and recommendations by non-evaluators. Evaluator characteristics affecting use: interpersonal skills, group facilitation skills, evaluator knowledge, trust, consensus-building skills.

6. Team member with evaluation perspective. Most likely primary users: program design team. Dominant style: participatory, questioning, challenging. Primary evaluation purpose: program development. Evaluator characteristics affecting use: contribution to team, insightfulness, ability to communicate an evaluation perspective, flexibility, analytical leadership.

7. Collaborator. Most likely primary users: program staff and participants. Dominant style: involved, supportive, encouraging. Primary evaluation purpose: action research and evaluation on groups' own issues; participatory evaluation. Evaluator characteristics affecting use: accepting of others, mutual respect, communication skills, enthusiasm, perceived genuineness of the collaborative approach.

8. Empowerment facilitator. Most likely primary users: program participants/community members. Dominant style: resource person. Primary evaluation purpose: participant self-determination; pursuit of political agenda. Evaluator characteristics affecting use: mutual respect, participation, engagement, enabling skills, political savvy.

9. Supporter of cause. Most likely primary users: ideological adherents. Dominant style: co-leader, committed. Primary evaluation purpose: social justice. Evaluator characteristics affecting use: engagement, commitment, political expertise, knowledge of "the system," integrity, values.

10. Synthesizer, meta-evaluator, cluster leader. Most likely primary users: future evaluation planners and users. Dominant style: analytical. Primary evaluation purpose: synthesize findings from multiple evaluations; judge the quality of evaluations. Evaluator characteristics affecting use: professionalism, analytical insightfulness, conceptual brilliance, integrity, adherence to standards.

to involve users, political sensitivity, and dealing with only one primary decision
credibility maker at the outset, and suddenly you have
2. User characteristics, such as interest in the stakeholders coming out your ears, or vice
evaluation, willingness to commit time and versa. With some programs, I've felt like
energy, and position of influence I've been through all 8,000 situations in the
3. Contextual characteristics, such as size of first month.
organization, political climate, and existence And, in case 8,000 situations to analyze,
of competing information be sensitive to, and design evaluations for
4. Evaluation characteristics, such as nature doesn't seem challenging enough, just add
and timing of the evaluation report, rele- two more points to each dimension—a
vance of evaluation information, and quality point between each endpoint and the mid-
of the data and evaluation on all 20 dimensions yield 3,200,000 po-
tentially different situations. Perhaps such
Menu 6.3 offers examples of situations complexity helps explain why the slogan
that pose special challenges to evaluation that won the hearts of evaluators in atten-
use and the evaluator's role. dance at the 1978 Evaluation Network
Exhibit 6.1 (pages 132-3) is a look at a dance at the 1978 Evaluation N e t w o r k
few of the many situational variables an conference in Aspen, Colorado, was Jim
evaluator may need to be aware of and take Hennes's lament:
into account in conducting a utilization-
Evaluators do IT
focused, feasibility-conscious, propriety-
under difficult circumstances.
oriented, and accuracy-based evaluation.
The situational variables in Exhibit 6.1 Of course, one could make the same
are presented in no particular order. Most analysis for virtually any area of decision
of them could be broken down into several making, couldn't one? Life is complex, so
additional variables. I have n o intention of what's new? First, let's look at what's old.
trying to operationalize these dimensions The evidence from social and behavioral
(that is, make them clear, specific, and science is that in other areas of decision
measurable). The point of presenting them making, when faced with complex choices
is simply to emphasize and reiterate this: and multiple situations, we fall back on a
Situational evaluation means that evalua- set of rules and standard operating proce-
tors have to be prepared to deal with a lot dures that predetermine what we will do,
of different people and situations. If we that effectively short-circuit situational
conceive of just three points (or situations) adaptability. The evidence is that we are
on each of these dimensions—the two end- running most of the time on prepro-
points and a midpoint—then the combina- grammed tapes. T h a t has always been the
tions of these 2 0 dimensions represent function of rules of t h u m b and scientific
8,000 unique evaluation situations. paradigms. Faced with a new situation, the
N o r are these static situations. The pro- evaluation researcher (unconsciously)
gram you thought was new at the first turns to old and comfortable patterns. This
session turns out to have been created out may help explain why so many eval- uators
of and to be a continuation of another w h o have rhetorically embraced the phi-
program; only the name has been changed losophy of situational evaluation find that
to protect the guilty. You thought you were the approaches in which they are trained

MENU 6.3

Examples of Situations That Pose Special Challenges to Evaluation Use and the Evaluator's Role

1. Highly controversial issue. Special challenge: facilitating different points of view. Evaluator skills needed: conflict resolution skills.

2. Highly visible program. Special challenge: dealing with publicity about the program; reporting findings in a media-circus atmosphere. Evaluator skills needed: public presentation skills, graphic skills, media handling skills.

3. Highly volatile program environment. Special challenge: rapid change in context, issues, and focus. Evaluator skills needed: tolerance for ambiguity, rapid responsiveness, flexibility, being a "quick study."

4. Cross-cultural or international. Special challenge: including different perspectives and values; being aware of cultural blinders and biases. Evaluator skills needed: cross-cultural sensitivity, skills in understanding and incorporating different perspectives.

5. Team effort. Special challenge: managing people. Evaluator skills needed: identifying and using individual skills of team members; team-building skills.

6. Evaluation attacked. Special challenge: preserving credibility. Evaluator skills needed: calm; staying focused on evidence and conclusions.

7. Corrupt program. Special challenge: resolving ethical issues/upholding standards. Evaluator skills needed: integrity, a clear ethical sense, honesty.
and with which they are most comfortable just happen to be particularly appropriate in each new evaluation situation they confront—time after time after time. Sociologists just happen to find doing a survey appropriate. Economists just happen to feel the situation needs cost-benefit analysis. Psychologists study the situation and decide that—surprise!—testing would be appropriate. And so it goes.

Utilization-focused evaluation is a problem-solving approach that calls for creative adaptation to changed and changing conditions, as opposed to a technical approach, which attempts to mold and define conditions to fit preconceived models of how things should be done. Utilization-focused evaluation involves overcoming what Brightman and Noble (1979) have identified as "the ineffective education of decision scientists." They portray the typical decision scientist (a generic term for evaluators, policy analysts, planners, and so on) as follows:

EXHIBIT 6.1

Examples of Situational Factors in Evaluation That Can Affect Users' Participation and Use

1. Number of stakeholders to be dealt with: from one primary decision maker to a large number.

2. Purpose of the evaluation: from formative purpose (improvement) to summative purpose (funding decision).

3. History of the program: from new program to long history.

4. Staff attitude toward evaluation: from enthusiasm to resistance.

5. Staff knowledge about evaluation: from knows virtually nothing to highly knowledgeable.

6. Program interaction patterns (administration-staff, staff-staff, staff-client): from cooperative to conflict-laden.

7. Program's prior evaluation experience: from first time ever to seemingly endless experience.

8. Staff and participants' education levels: from high to low.

9. Staff and/or participants' characteristics (pick any 10 you want): from homogeneous groups to heterogeneous groups.

hopelessly naive and intellectually arrogant. Naive because they believe that problem solving begins and ends with analysis, and arrogant because they opt for mathematical rigor over results. They are products of their training. Decision science departments appear to have been more effective at training technocrats to deal with structured problems than problem solvers to deal with ill-structured ones. (p. 150)

Narrow technocratic approaches emphasize following rules and standard operating procedures.

EXHIBIT 6.1 (continued)

10. Program location: from one site to multiple sites.

11. Resources available for evaluation: from no money to speak of to substantial funding.

12. Number of sources of program funding: from one funding source to multiple funding sources.

13. Nature of the program treatment: from simple and unidimensional to complex and multidimensional.

14. Standardization of treatment: from highly standardized and routine to highly individualized and nonroutine.

15. Program organizational decision-making structure: from horizontal, little hierarchy, little stratification to hierarchical, long chain of command, stratified.

16. Clarity about evaluation purpose and function: from well-articulated and specifically defined to ambiguous and broadly defined.

17. Existing data on program: from an operating information system to no existing data.

18. Evaluator(s)' relationship to the program: from external to internal.

19. Impetus for the evaluation: from voluntary and self-initiated to required and forced on the program.

20. Time available for the evaluation: from a long time line with an open deadline to a short time line with a fixed deadline.

Creative problem-solving approaches, in contrast, focus on what works and what makes sense in the situation. Standard methods recipe books aren't ignored. They just aren't taken as the final word. New ingredients are added to fit particular tastes. Homegrown or locally available ingredients replace the processed foods of the national supermarket chains, with the attendant risks of both greater failure and greater achievement.

Lawrence Lynn's (1980a) ideal policy analyst bears a striking resemblance to my idea of a creative and responsive utilization-focused evaluator.

Individuals really do have to be interdisci- situations or scripts or intuitive sensibilities


plinary; they have to be highly catholic in and understandings about how these situ-
their taste for intellectual concepts and ideas ations will largely unfold. Simon estimates a
and tools. I do not think we are talking so differential repertoire of 50,000 situation
much about acquiring a specific kind of recognitions at the world-class chess level.
knowledge or a specialist's knowledge in There is also some increase in overall long-
order to deal with environmental issues or range strategic planning ability—beginners
energy issues. One does not have to know typically are hard pressed to go beyond one
what a petroleum engineer knows, or what move deep; world-class players often antici-
an air quality engineer knows, or what a pate 3 or sometimes 5 future moves in
chemist knows. Rather, one simply has to be calculating alternative reactions to their
able to ask questions of many disciplines and moves.... One further learning is the capac-
many professions and know how to use the ity to diagnose not just specific game
information. And what that says is, I think, situations but to model or "psyche out" dif-
one has to be intellectually quite versatile. ferent opponents. (Etheredge 1980:243)
It is not enough to be an economist. . . ,
an operations research specialist, [or] a stat- I suggest there's a parallel here to an-
istician. One has to be a little bit of all of ticipating potential use and knowing how
those things. One has to have an intuitive to facilitate it. Etheredge (1980) also
grasp of an awful lot of different intellectual found that experienced players develop
approaches, different intellectual disciplines efficient scanning techniques and the abil-
or traditions so that one can range widely in ity to discard unnecessary information.
doing one's job of crafting a good analysis, They cultivate what seems like an intuitive
so that you are not stuck with just the tools sense but is really a practiced sense of
you know. I think, then, the implication is where to devote attention. You will be
versatility and an intuitive grasp of a fairly hard-pressed, in my view, to find a better
wide range of different kinds of skills and description of evaluation expertise in
approaches, (p. 88; emphasis added) working with intended users. Effective fa-
cilitation involves situation recognition
and responsiveness, anticipation, and the
Learning to Be ability to analyze people—knowing where,
when, and h o w to focus attention. These
Situationally Responsive
are learned and practiced behaviors, a
view I assert w h e n someone suggests that
Expert evaluators are sophisticated at
utilization-focused evaluation only works
situation recognition. Such expertise does
for certain personality types.
not develop overnight, nor is it an outcome
of training. Expertise comes from practice.
Consider expertise in chess as a metaphor
for developing situationally responsive ex- Being Active-Reactive-Adaptive
pertise in evaluation.
In the title of this chapter, I used the
It takes at least 15 years of hard work for phrase "active-reactive-adaptive" to sug-
even the most talented individuals to become gest the nature of the consultative interac-
world-class chess masters: what they seem to tions that go on between evaluators and
learn is a repertoire for recognizing types of intended users. The phrase is meant to be

both descriptive and prescriptive. It describes how real-world decision making actually unfolds. Yet, it is prescriptive in alerting evaluators to consciously and deliberately act, react, and adapt in order to increase their effectiveness in working with stakeholders.

Utilization-focused evaluators are, first of all, active in deliberately and calculatedly identifying intended users and focusing useful questions. They are reactive in listening to intended users and responding to what they learn about the particular situation in which the evaluation unfolds. They are adaptive in altering evaluation questions and designs in light of their increased understanding of the situation and changing conditions. Active-reactive-adaptive evaluators don't impose cookbook designs. They don't do the same thing time after time. They are genuinely immersed in the challenges of each new setting and authentically responsive to the intended users of each new evaluation.

It is the paradox of decision making that effective action is born of reaction. Only when organizations and people take in information from the environment and react to changing conditions can they act on that same environment to reduce uncertainty and increase discretionary flexibility (see Thompson 1967). The same is true for the individual decision maker or for a problem-solving group. Action emerges through reaction and leads to adaptation. The imagery is familiar: thesis-antithesis-synthesis; stimulus-response-change.

This active-reactive-adaptive stance characterizes all phases of evaluator-user interactions, from initially identifying primary
intended users to focusing relevant questions, choosing methods, and analyzing results. All phases involve collaborative processes of action-reaction-adaptation as evaluators and intended users consider their options. The menu of choices includes a broad range of methods, evaluation ingredients from bland to spicy, and a variety of evaluator roles: collaborator, trainer, group facilitator, technician, politician, organizational analyst, internal colleague, external expert, methodologist, information broker, communicator, change agent, diplomat, problem solver, and creative consultant. The roles played by an evaluator in any given situation will depend on the evaluation's purpose, the unique constellation of conditions with which the evaluator is faced, and the evaluator's own personal knowledge, skills, style, values, and ethics.

The mandate to be active-reactive-adaptive in role-playing provokes protest from those evaluators and intended users who advocate only one narrow role, namely, that the evaluator renders judgment about merit or worth—nothing else (Stufflebeam 1994; Scriven 1991a). Clearly, I have a more expansive view of an evaluator's role possibilities and responsibilities. Keeping in mind that the idea of multiple evaluator roles is controversial, let's turn to look at what the evaluator brings to the utilization-focused negotiating table.

Multiple Evaluator Roles and Individual Style

The evaluator as a person in his or her own right is a key part of the situational mix. Each evaluation will be unique in part because individual evaluators are unique. Evaluators bring to the negotiating table their own style, personal history, and professional experience. All of the techniques and ideas presented in this book must be adapted to the style of the individuals using them.

Cousins, Donohue, and Bloom (1996) surveyed North American evaluators to find out what variables correlated with a collaborative style of practice. Organizational affiliation, gender, and primary job responsibility did not differentiate practice and opinion responses. Canadian evaluators reported greater depth of stakeholder involvement than Americans. Most telling, however, were years and depth of experience with collaborative approaches. More experienced evaluators expected more use of their evaluations and reported a greater sense of satisfaction from the collaborative process and greater impacts of the resulting evaluations. In essence, evaluators get better at the active-reactive-adaptive process the more they experience it; and the more they use it, the more they like it and the more impact they believe it has.

Being active-reactive-adaptive explicitly recognizes the importance of the individual evaluator's experience, orientation, and contribution by placing the mandate to be active first in this consulting triangle. Situational responsiveness does not mean rolling over and playing dead (or passive) in the face of stakeholder interests or perceived needs. Just as the evaluator in utilization-focused evaluation does not unilaterally impose a focus and set of methods on a program, so, too, the stakeholders are not set up to impose their initial predilections unilaterally or dogmatically. Arriving at the final evaluation design is a negotiated process that allows the values and capabilities of the evaluator to intermingle with those of intended users.

The utilization-focused evaluator, in being active-reactive-adaptive, is one among many at the negotiating table. At times
there may be discord in the negotiating process; at other times harmony. Whatever the sounds, and whatever the themes, the utilization-focused evaluator does not sing alone. He or she is part of a choir made up of primary intended users. There are solo parts, to be sure, but the climactic theme song of utilization-focused evaluation is not Frank Sinatra's "I Did It My Way." Rather, it's the full chorus joining in a unique, situationally specific rendition of "We Did It Our Way."

User Responsiveness and Technical Quality

User responsiveness should not mean a sacrifice of technical quality. Later chapters will discuss in detail the utilization-focused approach to ensuring technical quality. A beginning point is to recognize that standards of technical quality vary for different users and varying situations. The issue is not meeting some absolute research standards of technical quality but, rather, making sure that methods and measures are appropriate to the validity and credibility needs of a particular evaluation purpose and specific intended users.

Jennifer Greene (1990) examined in depth the debate about technical quality versus user responsiveness. She found general agreement that both are important but disagreement about the relative priority of each. She concluded that the debate is really about how much to recognize and deal with evaluation's political inherency:

Evaluators should recognize that tension and conflict in evaluation practice are virtually inevitable, that the demands imposed by most if not all definitions of responsiveness and technical quality (not to mention feasibility and propriety) will characteristically reflect the competing politics and values of the setting. (p. 273)

She then recommended that evaluators "explicate the politics and values" that undergird decisions about purpose, audience, design, and methods. Her recommendation is consistent with utilization-focused evaluation.

Respect for Intended Users

One central value that should undergird the evaluator's active-reactive-adaptive role is respect for all those with a stake in a program or evaluation. In their seminal article on evaluation use, Davis and Salasin (1975) asserted that evaluators were involved inevitably in facilitating change and that "any change model should . . . generally accommodate rather than manipulate the view of the persons involved" (p. 652). Respectful utilization-focused evaluators do not use their expertise to intimidate or manipulate intended users. Egon Guba (1977) has described in powerful language an archetype that is the antithesis of the utilization-focused evaluator:

It is my experience that evaluators sometimes adopt a very supercilious attitude with respect to their clients; their presumptuousness and arrogance are sometimes overwhelming. We treat the client as a "childlike" person who needs to be taken in hand; as an ignoramus who cannot possibly understand the tactics and strategies that we will bring to bear; as someone who doesn't appreciate the questions he ought to ask until we tell him—and what we tell him often reflects our own biases and interests rather than the problems with which the client is actually beset. The phrase "Ugly American" has emerged in international settings to
describe the person who enters into a new culture, immediately knows what is wrong with it, and proceeds to foist his own solutions onto the locals. In some ways I have come to think of evaluators as "Ugly Americans." And if what we are looking for are ways to manipulate clients so that they will fall in with our wishes and cease to resist our blandishments, I for one will have none of it. (p. 1; emphasis in original)

For others who "will have none of it," there is the alternative of undertaking a utilization-focused evaluation process based on mutual respect between evaluators and intended users.

Internal and External Evaluators

One of the most fundamental issues in considering the role of the evaluator is the location of the evaluator inside or outside the program and organization being evaluated, what has sometimes been called the "in-house" versus "outhouse" issue. The early evaluation literature was aimed primarily at external evaluators, typically researchers who conducted evaluations under contract to funders. External evaluators come from universities, consulting firms, and research organizations or work as independent consultants. The defining characteristic of external evaluators is that they have no long-term, ongoing position within the program or organization being evaluated. They are therefore not subordinated to someone in the organization and not directly dependent on the organization for their job and career.

External evaluators are valuable precisely because they are outside the organization. It is typically assumed that their external status permits them to be more independent, objective, and credible than internal evaluators. Internal evaluations are suspect because, it is presumed, they can be manipulated more easily by administrators to justify decisions or pressured to present positive findings for public relations purposes (House 1986). Of course, external evaluators who want future evaluation contracts are also subject to pressure to produce positive findings. In addition, external evaluators are also typically more costly, less knowledgeable about the nuances and particulars of the local situation, and less able to follow through to facilitate the implementation of recommendations. When external evaluators complete their contract, they may take with them a great deal of knowledge and insight that is lost to the program. That knowledge stays "in-house" with internal evaluators. External evaluators have also been known to cause difficulties in a program through insensitivity to organizational relationships and norms, one of the reasons the work of external evaluators is sometimes called "outhouse" work.

One of the major trends in evaluation during the 1980s was a transition from external to internal evaluation, with Canadian Arnold Love (1991, 1983) documenting and contributing to the development of internal evaluation. At the beginning of the 1970s evaluation was just emerging as a profession. There were fewer distinct evaluation units within government bureaus, human service agencies, and private sector organizations than there are now. School districts had research and evaluation units, but even they contracted out much of the evaluation work mandated by the landmark 1965 Elementary and Secondary Education Act in the United States. As evaluation became more pervasive in the 1970s, as the mandate for evaluation was
added to more and more legislation, and as training for evaluators became more available and widespread, internal evaluation units became more common. Now, most federal, state, and local agencies have internal evaluation units; international organizations also have internal evaluation divisions; and it is clear that "internal evaluators can produce evaluations of high quality that meet rigorous standards of objectivity while still performing useful service to administrators if they have previously established an image of an independent but active voice in the organizational structure" (Sonnichsen 1987:34-35).

Over the years, I have had extensive contact with internal evaluators through training and consulting, working closely with several of them to design internal monitoring and evaluation systems. For the second edition of this book, I interviewed 10 internal evaluators who I knew used a utilization-focused approach. Their comments about how they have applied utilization-focused principles offer insights into the world of the internal evaluator and illuminate research findings about effective approaches to internal evaluation (Winberg 1991; Lyon 1989; Huberty 1988; Kennedy 1983).

Themes From Internal Evaluators

1. Actively involving stakeholders within the organization can be difficult because evaluation is often perceived by both superiors and subordinates as the job of the evaluator. The internal evaluator is typically expected to do evaluations, not facilitate an evaluation process involving others. Internal evaluators who have had success involving others have had to work hard at finding special incentives to attract participation in the evaluation process. One internal evaluator commented,

My director told me he doesn't want to spend time thinking about evaluation. That's why he hired me. He wants me to "anticipate his information needs." I've had to find ways to talk with him about his interests and information needs without explicitly telling him he's helping me focus the evaluation. I guess you could say I kind of involve him without his really knowing he's involved.

2. Internal evaluators are often asked by superiors for public relations information rather than evaluation. The internal evaluator may be told, "I want a report for the legislature proving our program is effective." It takes clear conviction, subtle diplomacy, and an astute understanding of how to help superiors appreciate evaluation to keep internal evaluation responsibilities from degenerating into public relations. One mechanism used by several internal evaluators to increase support for real evaluation rather than public relations is establishing an evaluation advisory committee, including influential people from outside the organization, to provide independent checks on the integrity of internal evaluations.

3. Internal evaluators get asked to do lots of little data-gathering and report-writing tasks that are quite time consuming but too minor to be considered meaningful evaluation. For example, if someone in the agency wants a quick review of what other states are doing about some problem, the internal evaluator is an easy target for the task. Such assignments can become so pervasive that it's difficult to have time for longer-term, more meaningful evaluation efforts.
4. Internal evaluators are often excluded from major decisions or so far removed from critical information networks that they don't know about new initiatives or developments in time to build in an evaluation perspective up front. One internal evaluator explained,

We have separate people doing planning and evaluation. I'm not included in the planning process and usually don't even see the plan until it's approved. Then they expect me to add on an evaluation. It's a real bitch to take a plan done without any thought of evaluation and add an evaluation without screwing up or changing the plan. They think evaluation is something you do at the end rather than think about from the start. It's damn hard to break through these perceptions. Besides, I don't want to do the planners' job, and they don't want to do my job, but we've got to find better ways of making the whole thing work together. That's my frustration. . . . It takes me constantly bugging them, and sometimes they think I'm encroaching on their turf. Some days I think, "Who needs the hassle?" even though I know it's not as useful just to tack on the evaluation at the end.

5. Getting evaluation used takes a lot of follow-through. One internal evaluator explained that her job was defined as data gathering and report writing without consideration of following up to see if report recommendations were adopted. That's not part of her job description, and it takes time and some authority. She commented,

How do I get managers to use a report if my job is just to write the report? But they're above me. I don't have the authority to ask them in six months what they've done. I wrote a follow-up memo once reminding managers about recommendations in an evaluation and some of them didn't like it at all, although a couple of the good ones said they were glad I reminded them.

Another internal evaluator told me he had learned how to follow up informally. He has seven years' experience as an internal human services evaluator. He said,

At first I just wrote a report and figured my job was done. Now, I tell them when we review the initial report that I'll check back in a few months to see how things are going. I find I have to keep pushing, keep reminding, or they get busy and just file the report. We're gradually getting some understanding that our job should include some follow-up. Mostly it's on a few things that we decide are really important. You can't do it all.

Internal Role Definitions

The themes from internal evaluators indicate the importance of carefully defining the job to include attention to use. When and if the internal evaluation job is defined primarily as writing a report and filling out routine reporting forms, the ability of the evaluator to influence use is quite limited. When and if the internal evaluator is organizationally separated from managers and planners, it is difficult to establish collaborative arrangements that facilitate use. Thus, a utilization-focused approach to internal evaluation will often require a redefinition of the position to include responsibility for working with intended users to develop strategies for acting on findings.

One of the most effective internal evaluation units I've encountered was in the U.S. Federal Bureau of Investigation (FBI). This unit reported directly to the bureau's deputy director. The evaluation unit director
had direct access to the director of the FBI in both problem identification and discussion of findings. The purpose of the unit was program improvement. Reports were written only for internal use; there was no public relations use of reports. Public relations was the function of a different unit. The internal evaluation staff was drawn from experienced FBI agents. They thus had high credibility with agents in the field. They also had the status and authority of the director's office behind them. The evaluation unit had an operations handbook that clearly delineated responsibilities and procedures. Evaluation proposals and designs were planned and reviewed with intended users. Multiple methods were used. Reports were written with use in mind. Six months after the report had been written and reviewed, follow-up was formally undertaken to find out if recommendations had been implemented. The internal evaluators had a strong commitment to improving FBI programs and clear authority to plan, conduct, and report evaluations in ways that would have an impact on the organization, including follow-up to make sure recommendations approved by the director were actually implemented.

Based on his experience directing the FBI's internal evaluation unit, Dick Sonnichsen (1988) formulated what he has called internal advocacy evaluation as a style of organizational development:

Internal evaluators have to view themselves as change agents and participants in policy formulation, migrating from the traditional position of neutrality to an activist role in the organizational decision-making process. The practice of Advocacy Evaluation positions internal evaluators to become active participants in developing and implementing organizational improvements. Operating under an advocacy philosophy, evaluation becomes a tool for change and a vehicle for evaluators to influence the organization. (p. 141; emphasis in original)

An evaluation advocate is not a cheerleader for the program, but rather, a champion of evaluation use. This is sometimes called persuasive use in which "advocates work actively in the politics of the organization to get results used" (Caracelli and Preskill 1996).

The new evaluator is a program advocate—not an advocate in the sense of an ideologue willing to manipulate data and to alter findings to secure next year's funding. The new evaluator is someone who believes in and is interested in helping programs and organizations succeed. At times the program advocate evaluator will play the traditional critic role: challenging basic program assumptions, reporting lackluster performance, or identifying inefficiencies. The difference, however, is that criticism is not the end of performance-oriented evaluation; rather, it is part of a larger process of program and organizational improvement, a process that receives as much of the evaluator's attention and talents as the criticism function. (Bellavita et al. 1986:289)

The roles of champion, advocate, and change agent (Sonnichsen 1994) are just some of the many roles open to internal evaluators. Love (1991) has identified a number of both successful and unsuccessful roles for internal evaluators (see Exhibit 6.2). Carefully defining the role of internal evaluator is a key to effective and credible internal evaluation use.

Part of defining the role is labeling it in a meaningful way. Consider this reflection from internal school district evaluator Nancy Law (1996), whose office was
EXHIBIT 6.2
Successful and Unsuccessful Roles of Internal Evaluators

Successful Roles: Management consultant; Decision support; Management information resource; Systems generalist; Expert troubleshooter; Advocate for/champion of evaluation use; Systematic planner

Unsuccessful Roles: Spy; Hatchet carrier; Fear-inspiring dragon; Number cruncher; Organizational conscience; Organizational memory; Public relations officer

SOURCE: Adapted and expanded from Love 1991:9.

renamed from Research and Evaluation to Accountability Department.

This title change has become meaningful for me personally. It was easy to adopt the name, but harder to live up to it. . . . Now, I am learning that I am the one accountable—that my job doesn't end when the report is finished and presented. My role continues as I work with others outside research to create changes that I have recommended. Oh, yes, we still perform the types of research/evaluation tasks done previously, but there is a greater task still to be done—that of convincing those who need to make changes to move ahead! Put simply, when we took a different name, we became something different and better. (p. 1)

Internal-External Evaluation Combinations

In workshops, I am often asked to compare the relative advantages and disadvantages of internal versus external evaluations. After describing some of the differences along the lines of the preceding discussion, I like to point out that the question is loaded by implying that internal and external approaches are mutually exclusive. Actually, there are a good many possible combinations of internal and external evaluations that may be more desirable and more cost-effective than either a purely internal or purely external evaluation.

Accreditation processes are a good example of an internal-external combination. The internal group collects the data and arranges them so that the external group can come in, inspect the data collected by the internal group, sometimes collect additional information on their own, and pass judgment on the program.

There are many ways in which an evaluation can be set up so that some external group of respected professionals and evaluators guarantees the validity and fairness of the evaluation process while the people internal to the program actually collect and/or analyze the evaluation data. The cost savings of such an approach can be substantial while still allowing the evaluation to have basic credibility and legitimacy
through the blessing of the external review committee.

I worked for several years with one of the leading chemical dependency treatment centers in the country, the Hazelden Foundation of Minnesota. The foundation has established a rigorous evaluation process that involves data collection at the point of entry into the program and then follow-up questionnaires 6 months, 12 months, and 24 months after leaving the program. Hazelden's own research and evaluation department collects all of the data. My responsibility as an external evaluator was to monitor that data collection periodically to make sure that the established procedures were being followed correctly. I then worked with the program decision makers to identify the kind of data analysis that was desirable. They performed the data analysis with their own computer resources. They sent the data to me, and I wrote the annual evaluation report. They participated in analyzing, interpreting, and making judgments about the data, but for purposes of legitimacy and credibility, the actual writing of the final report was done by me.

This internal/external combination is sometimes extended one step further by having still another layer of external professionals and evaluators pass judgment on the quality and accuracy of the evaluation final report through a meta-evaluation process—evaluating the evaluation based on the profession's standards and principles. Indeed, the revised standards for evaluation (Joint Committee 1994:A12) prescribe meta-evaluation so that stakeholders have an independent credible review of an evaluation's strengths and weaknesses. Such an effort will be most meaningful and cost-beneficial for large-scale summative evaluations of major policy importance.

When orchestrating an internal-external combination, one danger to watch for is that the external group may impose unmanageable and overwhelming data collection procedures on the internal people. I saw this happen in an internal-external model with a group of school districts in Canada. The external committee set as the standard doing "comprehensive" data collection at the local school level, including data on learning outcomes, staff morale, facilities, curriculum, the school lunch program, the library, parent reactions, the perceptions of local businesspeople, analysis of the school bus system, and so on. After listening to all of the things the external committee thought should be done, the internal folks dubbed it the Internal-External-Eternal model of evaluation.

The point is that a variety of internal-external combinations are possible to combine the lower costs of internal data collection with the higher credibility of external review. In working out the details of internal-external combinations, care will need to be taken to achieve an appropriate and mutually rewarding balance based on a collaborative commitment to the standards of utility, feasibility, propriety, and accuracy.

Evaluation as Results-Oriented Leadership

Most writings about internal evaluation assume a separate unit or specialized position with responsibility to conduct evaluations. An important new direction in evaluation is to treat evaluation as a leadership function of all managers and program directors in the organization. The person responsible for internal evaluation then plays a facilitative, resource, and training function in support of managers rather
EXHIBIT 6.3
Four Functions of Results-Oriented,
Reality-Testing Leadership

• Create and nurture a results-oriented, reality-testing culture.


• Lead in deciding what outcomes to commit to and hold yourselves accountable for.
• Make measurement of outcomes thoughtful, meaningful, and credible.
• Use the results—and model for others serious use of results.

than spending time actually conducting evaluations. The best example of this approach I've worked with and observed up close was the position of Associate Administrator for Performance Measurement and Evaluation in Hennepin County, Minnesota (Minneapolis). The county had no internal evaluation office. Rather, this senior position, as part of the County Executive team, had responsibility to infuse evaluation systems throughout the county, in every department and program. The framework called for a results-oriented approach that was "thoughtful, useful, and credible" (Knapp 1995). Every manager in the county received training in how to build outcomes evaluation into ongoing program processes. Performance measurement was tied to reporting and budgeting systems. What made this approach to internal evaluation work, in my judgment, was threefold: (1) Results-oriented performance measurement was defined as a leadership function of every county manager, not just a reporting function to be delegated as far down as possible in the department; (2) the overall responsibility for evaluation resided at the highest level of the organization, in the executive team, with direct access to the County Board of Commissioners backed up by public commitments to use evaluation for decision making and budgeting; and (3) because of the prior two commitments, a person of great competence and dedication was selected to fill the Associate Administrator for Performance Measurement and Evaluation position, after a national search.

These patterns of effectiveness stand out because so often internal evaluation is delegated to the lowest level in an organization and treated as a clerical function. Indeed, being given an evaluation assignment is often a form of punishment agency directors use, or a way of giving deadwood staff something meaningless to occupy themselves with. It is clear that, for internal evaluators to be useful and credible, they must have high status in the organization and real power to make evaluation meaningful.

Elevating the status of evaluation in this way is most likely to occur when evaluation is conceived of as a leadership function rather than a low-level clerical or data management task. Exhibit 6.3 presents the four functions of results-oriented leadership. In this framework, evaluation becomes a senior management responsibility focused on decision-oriented use rather than a data-collection task focused on routine internal reporting.
There is a downside to elevating the status and visibility of internal evaluators: They become more politically vulnerable. To complete the example cited above, when the county administrator of Hennepin County departed, the Associate Administrator for Performance Measurement and Evaluation position was terminated. Politically, the position was dependent on the county's chief executive officer. When that person left, the internal evaluation position became expendable as part of the subsequent political shakeout. As Chapter 15 will discuss, the Achilles' heel of utilization-focused evaluation is turnover of primary intended users.

Going Bananas

Before closing this chapter, it seems appropriate to provide a situation for situational analysis, a bit of a practice exercise, if you will. Consider, then, the evaluation relevance of the following story repeated from generation to generation by schoolchildren.
A man walking along the street notices another man on the other side with bananas in his ears. He shouts, "Hey, mister, why do you have bananas in your ears?" Receiving no response, he pursues the man, calling again as he approaches, "Pardon me, but why have you got bananas in your ears?" Again there is no response.
He catches up to the man, puts his hand on his shoulder, and says, "Do you realize you have bananas in your ears?"
The gentleman in question stops, looks puzzled, takes the bananas out of his ears, and says, "I'm sorry, what did you ask? I couldn't hear you because I have bananas in my ears."

Now for the situational analysis. How might you use this story with a group of intended users to make a point about the nature of evaluation? What point(s) could the story be used to illustrate (metaphorically)?

What are the implications of the story for evaluation under four different conditions:

1. If the man with the bananas in his ears is a stakeholder and the man in pursuit is an evaluator
2. If the banana man is an evaluator, and the man in pursuit is a stakeholder
3. If both are primary stakeholders and the evaluator observes this scene
4. Both are evaluators observed by a stakeholder

It is just such situational variations that make strategic, contingency thinking and evaluator responsiveness so important—and so challenging.
7

Beyond the Goals Clarification Game


Focusing on Outcomes

Mulla Nasrudin was a Sufi guru. A king who enjoyed Nasrudin's company, and also liked to hunt, commanded him to accompany him on a bear hunt. Nasrudin was terrified.
When Nasrudin returned to his village, someone asked him: "How did the hunt go?"
"Marvelously!"
"How many bears did you see?"
"None."
"How could it have gone marvelously, then?"
"When you are hunting bears, and you are me, seeing no bears at all is a marvelous experience."
—Shah 1964:61

Evaluation of the Bear Project

If this tale were updated by means of an evaluation report, it might read something like
this:

Under the auspices of His Majesty's Ministry of the Interior, Department of Natural Resources, Section on Hunting, Office of Bears, field observers studied the relationship between the number of bears sighted on a hunt and the number of bears shot on a hunt. Having hypothesized a direct, linear relationship between the sighting of bears and killing of bears, data were collected on a recent royal hunting expedition. The small sample size limits generalizability, but the results support the hypothesis at the 0.001 level of statistical significance. Indeed, the correlation is perfect. The number of bears sighted was zero and the number killed was zero. In no case was a bear killed without first being sighted. We therefore recommend new Royal regulations requiring that bears first be sighted before they are killed.
Respectfully submitted.
The Incomparable Mulla Nasrudin
Royal Evaluator

Whose Goals Will Be Evaluated?

Although Nasrudin's evaluation bears certain flaws, it shares one major trait with almost all other reports of this genre: Namely, it is impossible to tell whether it answers anyone's question. Who decided that the goal evaluated should be the number of bears killed? Perhaps the hunt's purpose was a heightened sensitivity to nature, or a closer relationship between Nasrudin and the king, or reducing Nasrudin's fear of bears, or an increase in the king's power over Nasrudin. It may even be possible (likely!) that different participants in the hunt had different goals. Nasrudin perceived a "marvelous" outcome. Other stakeholders, with different goals, might have concluded otherwise.

In utilization-focused evaluation, the primary intended users determine whose goals will be evaluated if they decide that
evaluating goal attainment will be the focus of the evaluation. There are other ways of focusing an evaluation, as we'll see, but first, let's review the traditional centrality of goal attainment in evaluation.

The Centrality of Goals in Evaluation

Traditionally, evaluation has been synonymous with measuring goal attainment (Morris and Fitz-Gibbon 1978). Peter Rossi (1972) has stated that "a social welfare program (or for that matter any program) which does not have clearly specified goals cannot be evaluated without specifying some measurable goals. This statement is obvious enough to be a truism" (p. 18). In a major review of the evaluation literature in education, Worthen and Sanders (1973) concluded that "if evaluators agree in anything, it is that program objectives written in unambiguous terms are useful information for any evaluation study" (p. 231). Carol Weiss (1972b) observed that

the traditional formulation of the evaluation question is: To what extent is the program succeeding in reaching its goals? . . . The goal must be clear so that the evaluator knows what to look for. . . . Thus begins the long, often painful process of getting people to state goals in terms that are clear, specific, and measurable. (pp. 74-76; emphasis in original)

As the preceding quotes illustrate, the evaluation literature is replete with serious treatises on the centrality of program goals, and this solemnity seems to carry over into evaluators' work with program staff. There may be no more deadly way to begin an evaluation effort than assembling program staff to identify and clarify program goals and objectives. If evaluators are second only to tax collectors in the hearts of program staff, I suspect that it is not because staff fear evaluators' judgments about program success, but because they hate constant questioning about goals.

The Goals Clarification Game

Evaluators frequently conduct goals clarification meetings like the Twenty Questions game played at parties. Someone thinks of an object in the room and then the players are allowed 20 questions to guess what it is. In the goals clarification game, the evaluator has an object in mind (a clear, specific, and measurable goal). Program staff are the players. The game begins with the staff generating some statement they think is a goal. The evaluator scrutinizes the statement for clarity, specificity, and measurability, usually judging the staff's effort inadequate. This process is repeated in successive tries until the game ends in one of three ways: (1) The staff gives up (so the evaluator wins and writes the program goals for staff); (2) the evaluator gives up (so the staff gets by with vague, fuzzy, and unmeasurable goals); or (3) in rare cases, the game ends when staff actually stumbles on a statement that reasonably approximates what the evaluator had in mind.

Why do program staff typically hate this game so much?

1. They have played the game hundreds of times, not just for evaluators, but for funders and advisory boards, in writing proposals, and even among themselves.
2. They have learned that, when playing the game with an evaluator, the evaluator almost always wins.
3. They come out of the game knowing that they appear fuzzy-minded and inept to the evaluator.
4. It is a boring game.
5. It is an endless game because each new evaluator comes to the game with a different object in mind. (Clarity, specificity, and measurability are not clear, specific, and measurable criteria, so each evaluator can apply a different set of rules in the game.)

Among experienced program staff, evaluators may run into countering strategies like the goals clarification shuffle. Like many dance steps (e.g., the Harlem shuffle, the hustle), this technique has the most grace and style when executed simultaneously by a group. The goals clarification shuffle involves a sudden change in goals and priorities after the evaluator has developed measuring instruments and a research design. The choreography is dazzling. The top-priority program goal is moved two spaces to either the right or left and four spaces backward. Concurrently, all other goals are shuffled with style and subtlety, the only stipulation being that the first goal end up somewhere in the middle, with other goals reordered by new criteria.

The goals clarification shuffle first came into national prominence in 1969 when it was employed as a daring counterthrust to the Westinghouse-Ohio State University Head Start Evaluation. That study evaluated cognitive and affective outcomes of the Head Start Program and concluded that Head Start was largely ineffective (Cicarelli 1971; Westinghouse Learning Corporation 1969). However, as soon as the final report was published, the goals clarification shuffle was executed before enthusiastic Congressional audiences, showing that Head Start's health, nutrition, resource redistribution, cultural, and community goals ought to have been in the spotlight (see Evans 1971:402; Williams and Evans 1969). Thus, despite negative evaluation findings, Congress expanded the Head Start program, and the evaluators were thrown on the defensive. (It was about this same time that serious concerns over nonuse of evaluation findings started to be heard on a national scale.)

Conflict Over Goals and the Delphi Counter

Not all goals clarification exercises resemble dances. Often, the more fitting metaphor is war. Conflict over program goals among different stakeholder groups is common. For example, in criminal justice programs, battles are waged over whether the purpose of a program is punitive (punish criminal offenders for wrongdoing), custodial (keep criminal offenders off the streets), or rehabilitative (return offenders to society after treatment). In education and training programs, conflicts often emerge over whether the priority goal is attitude change or behavior change. In welfare agencies, disagreements can be found over whether the primary purpose is to get clients off welfare or out of poverty, and whether the focus should be long-term change or short-term crisis intervention (Conte 1996). In health settings, staff dissension may emerge over the relative emphasis to be placed on preventive versus curative medical practice. Chemical dependency programs are often enmeshed in controversy over whether the desired outcome is sobriety or responsible use. Even police and fire departments can get caught in controversy about the purposes and
tual effects of sirens, with critics arguing analysis—competing opinions apparently


that they're more a nuisance than a help converge and synthesize when this technique
(Perlman 1996). Virtually any time a group is used. (Rosenthal 1976:121)
of people assemble to determine program
goals, conflict can emerge, resulting in T h e trick to managing conflict with this
a lengthy, frustrating, and inconclusive technique is that the stakeholders never
meeting. meet face to face. T h u s , disagreements
For inexperienced evaluators, conflicts and arguments never get a chance to sur-
among stakeholders can be unnerving. face on an interpersonal level. Individual
Once, early in my career, a goals clarifica- responses remain confidential.
tion session erupted into physical violence
between a school board member and the The technique has proved so successful in
district's internal evaluator. The novice producing consensus . . . it is now often
evaluator can lose credibility by joining one adopted in many kinds of situations where
side or the other. M o r e experienced eval- convergence of opinion is advisable or desir-
uators have learned to remain calm and able . . . avoiding as it does the sundry prima
neutral, sometimes suggesting that multi- donna behaviors that may vitiate roundtable
ple goals be evaluated, thereby finessing discussions. (Rosenthal 1976:121-22)
the need for consensus about program
priorities. T h e strength of the Delphi a p p r o a c h —
A more elaborate counter to goals con- lack of face-to-face interaction—is also its
flict is the use of some kind of formal weakness. T h e process fails to deal with
ranking approach such as the D^lphTIech- real stakeholder power issues and diver-
nique (Dalkey 1 9 6 9 ; Helmer 1966), espe- gent interests. If those issues aren't dealt
cially where there are large numbers of with early in the evaluation, they will
stakeholders and many possible priorities. likely resurface later and threaten the
evaluation's credibility and utility.
The Delphi technique, a method of develop- In some instances, an evaluator may en-
ing and improving group consensus, was counter open warfare over goals and val-
originally used at the Rand Corporation to ues. A "goals w a r " usually occurs when two
arrive at reliable prediction about the future or more strong coalitions are locked in
of technology; hence its oracular name. . . . battle to determine which group will con-
Delphi essentially refers to a series of inten- trol the future direction of some public
sive interrogations of samples of individuals policy or program. Such wars involve
(most frequently, experts) by means of highly emotional issues and deeply held
mailed questionnaires concerning some im- values, such as conflicting views on abor-
portant problem or question; the mailings tion or sex education for teenagers.
are interspersed with controlled feedback to Evaluation of school busing programs to
the participants. The responses in each achieve racial balance offers an example
round of questioning are gathered by an rich with conflict. By what criteria ought
intermediary, who summarizes and returns busing programs be evaluated? Changed
the information to each participant, who racial attitudes? Changed interracial behav-
may then revise his own opinions and rat- iors? Improved student achievement? De-
ings. . . . However antagonistic the initial gree of parent involvement? Access to edu-
positions and complex the questions under cational resources? All are candidates for
152 • FOCUSING EVALUATIONS

the honor of primary program goal. Is school busing supposed to achieve desegregation (representative proportions of minority students in all schools) or integration (positive interracial attitudes, cooperation, and interaction)? Many communities, school boards, and school staffs are in open warfare over these issues. Central to the battles fought are basic disagreements about what evaluation criteria to apply (see Cohen and Weiss 1977; Cohen and Garet 1975).

Evaluability Assessment and Goals Clarification

Evaluators have gotten heavily involved in goals clarification because, when we are invited in, we seldom find a statement of clear, specific, prioritized, and measurable goals. This can take novice evaluators by surprise if they think that their primary task will be formulating an evaluation design for already established goals. Even where goals exist, they are frequently unrealistic, having been exaggerated to secure funding. One reason evaluability assessment has become an important preevaluation tool is that, by helping programs get ready for evaluation, it acknowledges the common need for a period of time to work with program staff, administrators, funders, and participants on clarifying goals—making them realistic, meaningful, agreed on, and evaluable (Wholey 1994; Smith 1989). Evaluability assessment often includes fieldwork and interviews to determine how much consensus there is among various stakeholders about goals and to identify where differences lie. Based on this kind of contextual analysis, an evaluator can work with primary intended users to plan a strategy for goals clarification.
When an evaluability assessment reveals broad aims and fuzzy goals, it's important to understand what role goals are understood to play in the program. Fuzzy goals actually characterize much human cognition and reasoning (Zadeh et al. 1975:ix). Laboratory experiments suggest that fuzzy conceptualizing may be typical of half the population (Kochen 1975:407). No wonder evaluators have so much trouble getting clear, specific, and measurable goals! Carol Weiss (1972b) has commented in this regard:

Part of the explanation [for fuzzy goals] probably lies in practitioners' concentration on concrete matters of program functioning and their pragmatic mode of operation. They often have an intuitive rather than an analytical approach to program development. But there is also a sense in which ambiguity serves a useful function; it may mask underlying divergences in intent . . . glittering generalities that pass for goal statements are meant to satisfy a variety of interests and perspectives. (p. 27)

Thus, evaluators have to figure out if administrators and staff are genuinely fuzzy about what they're attempting to accomplish, or if they're simply being shrewd in not letting the evaluator (or others) discover their real goals, or if they're trying to avoid conflict through vagueness.

Fuzzy goals, then, may be a conscious strategy for avoiding an outbreak of goals wars among competing or conflicting interests. In such instances, the evaluation may be focused on important questions, issues, and concerns without resort to clear, specific, and measurable objectives. However, more often than not in my experience, the difficulty turns out to be a conceptual problem rather than deviousness.

From a utilization-focused point of view, the challenge is to calculate how early interactions in the evaluation process will affect later use. Typically, it's not useful to ignore goals conflict, accept poorly formulated or unrealistic goals, or let the evaluator assume responsibility for writing clear, specific, and measurable goals. Primary intended users need to be involved in assessing how much effort to put into goals clarification. In doing so, both evaluators and primary intended users do well to heed the evaluation standard on political viability:

The evaluation should be planned and conducted with anticipation of the different positions of various interest groups, so that their cooperation may be obtained, and so that possible attempts by any of these groups to curtail evaluation operations or to bias or misapply the results can be averted or counteracted. (Joint Committee on Standards 1994:F2)

There are alternatives to goals-based evaluation, alternatives we'll consider in the next chapter. First, let's examine how to work with intended users who want to focus on goals and results.

Communicating About Goals and Results

Part of the difficulty, I am convinced, is the terminology: goals and objectives. These very words can intimidate staff. Goals and objectives have become daunting weights that program staff feel around their necks, burdening them, slowing their efforts, and impeding rather than advancing their progress. Helping staff clarify their purpose and direction may mean avoiding use of the term goals and objectives.
I've found program staff quite animated and responsive to the following kinds of questions: What are you trying to achieve with your clients? If you are successful, how will your clients be different after the program than they were before? What kinds of changes do you want to see in your clients? When your program works as you want it to, how do clients behave differently? What do they say differently? What would I see in them that would tell me they are different? Program staff can often provide quite specific answers to these questions, answers that reveal their caring and involvement with the client change process, yet when the same staff are asked to specify their goals and objectives, they freeze.

After querying staff about what results they hope to accomplish with program participants, I may then tell them that what they have been telling me constitutes their goals and objectives. This revelation often brings considerable surprise. They often react by saying, "But we haven't said anything about what we would count." This, as clearly as anything, I take as evidence of how widespread the confusion is between the conceptualization of goals and their measurement. Help program staff and other intended users be realistic and concrete about goals and objectives, but don't make them hide what they are really trying to do because they're not sure how to write a formally acceptable statement of goals and objectives, or because they don't know what measurement instruments might be available to get at some of the important things they are trying to do. Instead, take them through a process that focuses on achieving outcomes and results rather than writing goals. The difference, it turns out, can be huge.

Focusing on Outcomes and Results

In the minds of many program people, from board members to front-line staff and participants, goals are abstract statements of ideals written to secure funding—meant to inspire, but never achieved. Consider this poster on the wall of the office of a program I evaluated: The greatest danger is not that we aim too high and miss, but that our goal is too low and we attain it. For the director of this program, goals were something you put in proposals and plans, and hung on the wall, then went about your business.

Let me illustrate the difference between traditional program goals and a focus on participant outcomes with plans submitted by county units to the Minnesota Department of Human Services (MDHS).1 The plans required statements of outcomes. Each statement below promises something, but that something is not a change in client functioning, status, or well-being. These statements reveal how people in social services have been trained to think about program goals. My comments, following each goal, are meant to illustrate how to help program leaders and other intended evaluation users reframe traditional goals to focus on participant outcomes.

Problematic Outcome Examples

1. To continue implementation of a case management system to maintain continued contact with clients before, during, and after treatment.

Comment: Continued implementation of the system is the goal. And what is promised for the client? "Continued contact."
2. Case management services will be available to all persons with serious and persistent mental illness who require them.

Comment: This statement aims at availability—a service delivery improvement. Easily accessible services could be available 24 hours a day, but with what outcomes?

3. To develop needed services for chronically chemically dependent clients.

Comment: This statement focuses on program services rather than the client outcomes. My review of county plans revealed that most managers focus planning at the program delivery level, that is, the program's goals, rather than how clients' lives will be improved.

4. To develop a responsive, comprehensive crisis intervention plan.

Comment: A plan is the intended outcome. I found that many service providers confuse planning with getting something done. The characteristics of the plan—"responsive, comprehensive"—reveal nothing about results for clients.

5. Develop a supportive, family-centered, empowering, capacity-building intervention system for families and children.

Comment: This goal statement has all the latest human services jargon, but, carefully examined, the statement doesn't commit to empowering any families or actually enhancing the capacity of any clients.

6. Expand placement alternatives.

Comment: More alternatives is the intended result, but to what end? Here is another system-level goal that carries the danger of making placement an end in itself rather than a means to client improvement.

7. County clients will receive services which they value as appropriate to their needs and helpful in remediating their concerns.

Comment: Client satisfaction can be an important outcome, but it's rarely sufficient by itself. Especially in tax-supported programs, taxpayers and policymakers want more than happy clients. They want clients to have jobs, be productive, stay sober, parent effectively, and so on. Client satisfaction needs to be connected to other desired outcomes.

8. Improve ability of adults with severe and persistent mental illness to obtain employment.

Comment: Some clients remain for years in programs that enhance their ability to obtain employment—without ever getting a job.

9. Adults with serious and persistent mental illness will engage in a process to function effectively in the community.

Comment: Engaging in the process is as much as this aims for, in contrast to clients actually functioning effectively in the community.

10. Adults with developmental disabilities will participate in programs to begin making decisions and exercising choice.

Comment: Program participation is the stated focus. This leads to counting how many people show up rather than how many make meaningful decisions and exercise real choice. A client can participate in a program aimed at teaching decision-making skills, and can even learn those skills, yet never be permitted to make real decisions.
11. Each developmentally disabled consumer (or their substitute decision maker) will identify ways to assist them to remain connected, maintain, or develop natural supports.

Comment: This goal is satisfied, as written, if each client has a list of potential connections. The provider, of course, can pretty much guarantee composition of such a list. The real outcome: Clients who are connected to a support group of people.

12. Adults in training and rehab will be involved in an average of 120 hours of community integration activities per quarter.

Comment: Quantitative and specific, but the outcome stated goes only as far as being involved in activities, not actually being integrated into the community.

13. Key indicators of intended results and client outcomes for crisis services:
• Number of patients served
• Number of patient days and the average length of stay
• Source of referrals to the crisis unit and referrals provided to patients at discharge

Comment: Participation numbers, not client outcomes.

14. Minimize hospitalizations of people with severe and persistent mental illness.

Comment: This is a system level outcome that is potentially dangerous. One of the premises of results-oriented management reviewed in Chapter 1 is that "what gets measured gets done." An easy way to attain this desired outcome is simply not to refer or admit needy clients to the hospital. That will minimize hospitalizations (a system-level outcome) but may not help clients in need. A more appropriate outcome focus would be that these clients function effectively. If that outcome is attained, they won't need hospitalizations.

15. Improve quality of child protection intervention services.

Comment: I found a lot of outcome statements aimed at enhancing quality. Ironically, quality can be enhanced by improving services without having an impact on client outcomes. Licensing and accrediting standards often focus on staff qualifications and site characteristics (indicators of quality), but seldom require review of what program participants achieve.

The point of reviewing these examples has been to show the kinds of goal statements an evaluator may encounter when beginning to work with a program. A utilization-focused evaluator can help intended users review plans and stated goals to see if they include an outcomes focus. There's nothing wrong with program level (e.g., improve access or quality) or system level (e.g., reduce costs) goals, but such goals ought to connect to outcomes for clients. An evaluator can facilitate discussion of why, in the current political environment, one hears increased demand for "outcomes-based" management and program funding (MDHS 1996; Behn 1995; ICMA 1995; Knapp 1995; Schalock 1995; Schorr 1993; Brizius and Campbell 1991; Williams, Webb, and Phillips 1991; Carver 1990). Evaluators need to provide technical assistance in helping program planners, managers, and other potential evaluation users understand the difference between a participant outcomes approach and traditional program or system goals approaches. In particular, they need assistance understanding the difference between service-focused goals versus client-focused outcome goals. Exhibit 7.1 compares these two kinds of goals. Both can be useful, but they place emphasis in different places.

EXHIBIT 7.1
Service-Focused Versus Client-Focused Outcome Evaluation:
Examples From Parenting Programs

Service-Focused: Provide coordinated case management services with public health to pregnant adolescents
Client-Focused Outcome: Pregnant adolescents will give birth to healthy babies and care for the infants and themselves appropriately

Service-Focused: Improve the quality of child protection intervention services
Client-Focused Outcome: Children will be safe; they will not be abused or neglected

Service-Focused: Develop a supportive, family-centered, capacity-building intervention system for families and children
Client-Focused Outcome: Parents will adequately care and provide for their children

Service-Focused: Provide assistance to parents to make employment-related child care decisions
Client-Focused Outcome: Parents who wish to work will have adequate child care

Leading a Horse to Water Versus Getting It to Drink

The shift from service goals to outcomes often proves difficult in programs and agencies that have a long history of focusing on services and activities. But even where the difference is understood and appreciated, some fear or resistance may emerge. One reason is that service providers are well schooled in the proverbial wisdom that "you can lead a horse to water, but you can't make it drink."

This familiar adage illuminates the challenge of committing to outcomes. The desired outcome is that the horse drink the water. Longer-term outcomes are that the horse stays healthy and works effectively. But because program staff know they can't make a horse drink water, they focus on the things they can control: leading the horse to water, making sure the tank is full, monitoring the quality of the water, and keeping the horse within drinking distance of the water. In short, they focus on the processes of water delivery rather than the outcome of water drunk. Because staff can control processes but cannot guarantee attaining outcomes, government rules and regulations get written specifying exactly how to lead a horse to water. Funding is based on the number of horses led to water. Licenses are issued to individuals and programs that meet the qualifications for leading horses to water. Quality awards are made for improving the path to the water—and keeping the horse happy along the way.

Whether the horse drinks the water gets lost in all this flurry of lead-to-water-ship. Most reporting systems focus on how many horses get led to the water, and how difficult it was to get them there, but never quite get around to finding out whether the horses drank the water and stayed healthy.

One point of resistance to outcomes accountability, then, is the fear among providers and practitioners that they're being asked to take responsibility for, and will be judged on, something over which they have little control. The antidote to this fear is building into programming incentives for attaining outcomes and establishing a results-oriented culture in an organization or agency. Evaluators have a role to play in such efforts by facilitating a process that helps staff, administrators, and other stakeholders think about, discuss the implications of, and come to understand both the advantages and limitations of an outcomes approach. There's a lot of managerial and political rhetoric about being results oriented, but not much expertise in how to set up a results-oriented system. The next section presents a framework for conceptualizing outcomes that are meaningful and measurable for use in facilitating an outcomes-oriented management and evaluation system.

Utilization-Focused Outcomes Framework

This framework distinguishes six separate elements that need to be specified for focusing an evaluation on participant or client outcomes:

• a specific participant or client target group
• the desired outcome(s) for that target group
• one or more indicators for each desired outcome
• details of data collection
• how results will be used
• performance targets

I'll discuss each of these elements and offer illustrations from actual programs to show how they fit together. Evaluators can use this framework to work with primary intended users.

Identifying Specific Participant or Client Target Groups

I'll use the generic term client to include program participants, consumers of services, beneficiaries, students, and customers, as well as traditional client groups. The appropriate language varies, but for every program, there is some group that is expected to benefit from and attain outcomes as a result of program participation. However, the target groups identified in enabling legislation or existing reporting systems typically are defined too broadly for meaningful outcomes measurement. Intended outcomes can vary substantially for subgroups within general eligible populations. The trick is to be as specific as necessary to conceptualize meaningful outcomes. Some illustrations may help clarify why this is so.

Consider a program aimed at supporting the elderly to continue living in their homes, with services ranging from "meals on wheels" to home nursing. Not all elderly people can or want to stay in their homes. Therefore, if the desired outcome is "continuing to live in their own home," it would be inappropriate to specify that outcome for all elderly people. A more appropriate target population, then, would be people over the age of 55 who want to and can remain safely in their homes. For this group, it is appropriate to aim to keep them in their homes. It is also clear that some kind of screening process will be necessary to identify this subpopulation of the elderly.

A different example comes from programs serving people with developmental disabilities (DD). Many programs exist to prepare DD clients for work and then support them in maintaining employment. However, not all people with developmental disabilities can or want to work. In cases where funding supports the right of DD clients to choose whether to work, the appropriate subpopulation becomes people with developmental disabilities who can and want to work. For that specific subpopulation, then, the intended outcome could be that they obtain and maintain satisfying employment.

There are many ways of specifying subpopulation targets. Outcomes are often different for young, middle-aged, and elderly clients in the same general group (e.g., persons with serious and persistent mental illness). Outcomes for pregnant teens or teenage mothers may be different from outcomes for mothers receiving welfare who have completed high school. Outcomes for first-time offenders may be different from those for repeat offenders. The point is that categories of funding eligibility often include subgroups for whom different outcomes are appropriate. Similarly, when identifying groups by services received, for example, counseling services or jobs training, the outcomes expected for generic services may vary by subgroups. It is important, then, to make sure an intended outcome is meaningful and appropriate for everyone in the identified target population.

Specifying Desired Outcomes

The choice of language varies under different evaluation approaches. Some models refer to expected outcomes or intended outcomes. Others prefer the language of client goals or client objectives. What is important is not the phrase used but that there be a clear statement of the targeted change in circumstances, status, level of functioning, behavior, attitude, knowledge, or skills. Other outcome types include maintenance and prevention. Exhibit 7.2 provides examples of outcomes.

Outcome Indicators

An indicator is just that, an indicator. It's not the same as the phenomenon of interest, but only an indicator of that phenomenon. A score on a reading test is an indicator of reading ability but should not be confused with a particular person's true ability. All kinds of things affect a test score on a given day. Thus, indicators are inevitably approximations. They are imperfect and vary in validity and reliability.

The resources available for evaluation will greatly affect the kinds of data that can be collected for indicators. For example, if the desired outcome for abused children is that there be no subsequent abuse or neglect, a periodic in-home visitation and observation, including interviews with the child, parent(s), and knowledgeable others, would be desirable, but such data collection is expensive. With constrained resources, one may have to rely on routinely collected data, that is, official substantiated reports of abuse and neglect over time. Moreover, when using such routine data, privacy and confidentiality restrictions may limit the indicator to aggregate results quarter by quarter rather than one that tracks specific families over time.

As resources change, the indicator may change. Routine statistics may be used by an agency until a philanthropic foundation funds a focused evaluation to get better data for a specific period of time. In such a case, the indicator would change, but the desired outcome would not. This is the advantage of clearly distinguishing the desired outcome from its indicator. As the state of the art of measurement develops or resources change, indicators may improve without changing the desired outcome.

E X H I B I T 7.2
Outcome Examples

Type of Change Illustration

Change in circumstances Children safely reunited with their families of origin from foster care
Change in status Unemployed to employed
Change in behavior Truants will regularly attend school
Change in functioning Increased self-care; getting to work on time
Change in attitude Greater self-respect
Change in knowledge Understand the needs and capabilities of children at different ages
Change in skills Increased reading level; able to parent appropriately
Maintenance Continue to live safely at home (e.g., the elderly)
Prevention Teenagers will not use drugs

Time frames also affect indicators. The ultimate goal of a program for abused children would be to have them become healthy, well-functioning, and happy adults, but policymakers cannot wait 10 to 15 years to assess the outcomes of a program for abused children. Short-term indicators must be relied on, things like school attendance, school performance, physical health, and the psychological functioning of a child. These short-term indicators provide sufficient information to make judgments about the likely long-term results. It takes 30 years for a forest to grow, but you can assess the likelihood of ending up with a forest by evaluating how many saplings are still alive a year after the trees are planted.

Another factor affecting indicator selection is the demands data collection will put on program staff and participants. Short-term interventions such as food shelves, recreational activities for people with developmental disabilities, drop-in centers, and one-time community events do not typically engage participants intensely enough to justify collection of much, if any, data. Many programs can barely collect data on end-of-program status, much less follow-up data.

In short, a variety of factors influence the selection of indicators, including the importance of the outcome claims being made, resources available for data collection, the state of the art of measurement of human functioning, the nature of decisions to be made with the results, and the willingness of staff and participants to engage in assessment. Some kind of indicator is necessary, however, to measure degree of outcome attainment. The key is to make sure that the indicator is a reasonable, useful, and meaningful measure of the intended client outcome.

The framework offered here will generate outcome statements that are clear, specific, and measurable, but getting clarity and specificity is separated from selecting measures. The reason for separating the identification of a desired outcome from its measurement is to ensure the utility of both. This point is worth elaborating. The following is a classic goal statement:

Student achievement test scores in reading will increase one grade level from the beginning of first grade to the beginning of second grade.

Such a statement mixes together and potentially confuses the (1) specification of a desired outcome with (2) its measurement and (3) the desired performance target. The desired outcome is increased student achievement. The indicator is a norm-referenced standardized achievement test. The performance target is one year's gain on the test. These are three separate decisions that primary intended evaluation users need to discuss. For example, there are ways other than standardized tests for measuring achievement, for example, student portfolios or competency-based tests. The desired outcome should not be confused with its indicator. In the framework offered here, outcome statements are clearly separated from operational criteria for measuring them.
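To keep the three decisions visibly separate, it can help to record them as distinct entries rather than as one compound sentence. The sketch below simply restates the classic reading goal in that separated form; the labels and the list of alternative indicators are illustrative shorthand, not a prescribed notation:

    # Illustrative restatement of the classic reading goal, with the three
    # separate decisions recorded as separate entries (labels are shorthand only).
    reading_example = {
        "desired_outcome": "Increased student achievement in reading",
        "indicator": "Norm-referenced standardized achievement test score",
        "performance_target": "One grade level gain from the beginning of first grade "
                              "to the beginning of second grade",
    }

    # Because the entries are separate, the indicator can be reconsidered
    # (e.g., student portfolios or competency-based tests) without touching
    # the outcome or the target.
    alternative_indicators = ["Student portfolio rating", "Competency-based test score"]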
Another advantage of separating outcomes identification from indicator selection is to encourage program staff to be serious about the process. A premature focus on indicators may be heard as limiting a program to attempt only those things that staff already know how to measure. Such a limitation is too constraining. It is one thing to establish a purpose and direction for a program. It is quite another thing to say how that purpose and direction are to be measured. By confusing these two steps and making them one, program goals can become detached from what program staff and funders are actually working to accomplish. Under such a constraint, staff begin by figuring out what can be measured. Given that they seldom have much expertise in measurement, they end up counting fairly insignificant behaviors and attitudes that they can somehow quantify.

When I work with groups on goals clarification, I have them state intended outcomes without regard to measurement. Once they have stated as carefully and explicitly as they can what they want to accomplish, then it is time to figure out what indicators and data can be collected to monitor outcome attainment. They can then move back and forth between conceptual level statements and operational (measurement) specifications, attempting to get as much precision as possible in both.

To emphasize this point, let me overstate the trade-off. I prefer to have soft or rough measures of important goals rather than highly precise, quantitative measures of goals that no one much cares about. In too many evaluations, program staff are forced to focus on the latter (meaningless but measurable goals) instead of on the former (meaningful goals with soft measures).

Of course, this trade-off, stated in stark terms, is only relative. It is desirable to have as much precision as possible. By separating the process of goals clarification from the process of selecting goal indicators, it is possible for program staff to focus first on what they are really trying to accomplish and to state their goals and objectives as explicitly as possible without regard to measurement, and then to worry about how one would measure actual attainment of those goals and objectives.

Performance Targets

A performance target specifies the amount or level of outcome attainment that is expected, hoped for, or, in some kinds of performance contracting, required. What percentage of participants in employment training will have full-time jobs six months after graduation: 40%? 65%? 80%? What percentage of fathers failing to make child support payments will be meeting their full child support obligations within six months of intervention? 15%? 35%? 60%?

The best basis for establishing future performance targets is past performance. "Last year we had 65% success. Next year we aim for 70%." Lacking data on past performance, it may be advisable to wait until baseline data have been gathered before specifying a performance target. Arbitrarily setting performance targets without some empirical baseline may create artificial expectations that turn out unrealistically high or embarrassingly low. One way to avoid arbitrariness is to seek norms for reasonable levels of attainment from other, comparable programs, or review the evaluation literature for parallels.

As indicators are collected and examined over time, from quarter to quarter, and year to year, it becomes more meaningful and useful to set performance targets. The relationship between resources and outcomes can also be more precisely correlated longitudinally, with trend data, all of which increases the incremental and long-term value of an outcomes management approach.

The challenge is to make performance targets meaningful. In a political environment of outcomes mania, meaningfulness and utility are not necessarily priorities. Consider this example and judge for yourself. The 1995 Annual Management Report from the Office of the New York City Mayor included this performance target: The average daytime speed of cars crossing from one side of midtown Manhattan to the other will increase from 5.3 to 5.9 miles per hour. Impressed by this vision of moving from a "brisk 5.3" to a "sizzling 5.9," The New Yorker magazine interviewed Ruben Ramirez, Manhattan's Department of Transportation Traffic Coordinator, to ask how such a feat could be accomplished in the face of downsizing and budget cuts. Ramirez cited better use of resources. Asked what he could accomplish with adequate resources, he replied: "I think we could do six or seven, and I'm not being outrageous." The New Yorker found such a performance target a "dreamy future," one in which it might actually be possible to drive across midtown Manhattan faster than you can walk ("Speed" 1995:40).

Is such a vision visionary? Is a performance increase from 5.3 to 5.9 miles per hour meaningful? Is 6 or 7 worth aiming for? For a noncommuting Minnesotan, such numbers fail to impress. But, converted into annual hours and dollars saved for commercial vehicles in Manhattan, the increase may be valued in hundreds of thousands of dollars, perhaps even millions. It's for primary stakeholders in Manhattan, not Minnesota, to determine the meaningfulness of such a performance target.
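A rough sense of that conversion can be sketched with back-of-the-envelope arithmetic. Everything in the sketch below except the 5.3 and 5.9 miles per hour is an assumed figure chosen purely for illustration (the crossing distance, the number of commercial crossings, and the value of a vehicle-hour), so the result shows only how stakeholders might translate the target, not what the actual savings were:

    # Hypothetical translation of the speed target into hours and dollars saved.
    # Only the 5.3 and 5.9 mph figures come from the reported target; every
    # other number is an assumption made up for illustration.
    old_speed_mph, new_speed_mph = 5.3, 5.9
    crossing_distance_miles = 2.0              # assumed width of midtown Manhattan
    commercial_crossings_per_year = 1_000_000  # assumed volume of commercial trips
    value_per_vehicle_hour = 50.0              # assumed dollars per vehicle-hour

    hours_saved_per_crossing = (crossing_distance_miles / old_speed_mph
                                - crossing_distance_miles / new_speed_mph)
    annual_hours_saved = hours_saved_per_crossing * commercial_crossings_per_year
    annual_dollars_saved = annual_hours_saved * value_per_vehicle_hour

    # Under these assumptions: roughly 0.04 hours (about 2.3 minutes) saved per
    # crossing, on the order of 38,000 vehicle-hours and $1.9 million per year.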
Details of Data Collection

The details of data collection are a distinct part of the framework; they must be attended to, but they shouldn't clutter the focused outcome statement. Unfortunately, I've found that people can get caught up in the details of refining methods and lose sight of the outcome. The details typically get worked out after the other parts of the framework have been conceptualized. Details include answering the following kinds of questions:

• What existing data will be used and how will they be accessed? Who will collect new indicators data?
• Who will have oversight and management responsibility for data collection?
• How often will indicators data be collected? How often reported?
• Will data be gathered on all program participants or only a sample? If a sample, how selected?
• How will findings be reported? To whom? In what format? When? How often?

These pragmatic questions put flesh on the bones of the outcomes framework. They are not simply technical issues, however. How these questions get answered will ultimately determine the credibility and utility of the entire approach. Primary intended users need to be involved in making decisions about these issues to ensure that they feel ownership of and responsibility for all aspects of the evaluation.

How Results Will Be Used

The final element in the framework is to make sure that the data collected on the outcomes identified will be useful. This means engaging intended users in a simulation exercise in which they pretend that they have results and are interpreting and using those results. The evaluation facilitator asks: If the results came out this way, what would you do? If the findings came out this other way, what would that tell you, and what actions would you take? Given what you want the evaluation to accomplish, have we focused on the right outcomes and useful indicators? At every stage of a utilization-focused evaluation, the evaluator facilitator pushes intended users to think seriously about the implications of design and measurement decisions for use.

Interconnections Among the Distinct Parts of the Framework

The utilization-focused outcomes framework, as just reviewed, consists of six parts: a specific participant target group; a desired outcome for that group; one or more outcome indicators; a performance target (if appropriate and desired); details of data collection; and specification of how findings will be used. While these are listed in the order in which intended users and staff typically conceptualize them, the conceptualization process is not linear. Groups often go back and forth in iterative fashion. The target group may not become really clear until the desired outcome is specified or an indicator designated. Sometimes formulating the details of data collection will give rise to new indicators, and those indicators force a rethinking of how the desired outcome is stated. The point is to end up with all elements specified, consistent with each other, and mutually reinforcing. That doesn't necessarily mean marching through the framework lockstep.

Exhibit 7.3 provides an example of all the elements specified for a parenting program aimed at high school-age mothers. Completing the framework often takes several tries. Exhibit 7.4 shows three versions of the utilization-focused outcomes framework as it emerged from the work of a developmental disabilities staff group. Their first effort yielded a service-oriented goal. They revised that with a focus on skill enhancement. Finally, they agreed on a meaningful client outcome: functioning independently.

EXHIBIT 7.3
Example of a Fully Specified
Utilization-Focused Outcome Framework

Target subgroup: Teenage mothers at Central High School

Desired outcome: Appropriate parenting knowledge and practices

Outcome indicator:   Score on Parent Practice Inventory (knowledge and behavior measures)

Data collection:     Pre- and post-test, beginning and end of program; six-month follow-up;
                     district evaluation office will administer and analyze results

Performance target:  75% of entering participants will complete the program and attain
                     a passing score on both the knowledge and behavior scales

Use:                 The evaluation advisory task force will review the results (principal,
                     two teachers, two participating students, one agency representative,
                     one community representative, an associate superintendent, one
                     school board member, and the district evaluator). The task force will
                     decide if the program should be continued at Central High School
                     and expanded to other district high schools. A recommendation will
                     be forwarded to the superintendent and school board.
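For groups that want to track several such frameworks side by side, the completed elements can also be recorded as a structured entry in whatever planning or data system they already use. The sketch below simply restates Exhibit 7.3 in that form; the field names are illustrative shorthand, not a required schema:

    # Exhibit 7.3 restated as a single structured record. Field names are
    # illustrative shorthand for the six framework elements, not a fixed format.
    parenting_program_framework = {
        "target_subgroup": "Teenage mothers at Central High School",
        "desired_outcome": "Appropriate parenting knowledge and practices",
        "outcome_indicator": "Score on Parent Practice Inventory "
                             "(knowledge and behavior measures)",
        "data_collection": "Pre- and post-test at beginning and end of program; "
                           "six-month follow-up; district evaluation office "
                           "administers and analyzes results",
        "performance_target": "75% of entering participants complete the program and "
                              "attain a passing score on both knowledge and behavior scales",
        "use": "Evaluation advisory task force reviews results and recommends to the "
               "superintendent and school board whether to continue the program at "
               "Central High School and expand it to other district high schools",
    }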

A Utilization-Focused Process for Developing Outcomes

A central issue in implementing an outcomes evaluation approach is who will be involved in the process of developing the outcomes. When the purpose is ongoing management by outcomes, the program's executives and staff must buy into the process. Who else is involved is a matter of political judgment. Those involved will feel the most ownership of the resulting system.

Some processes involve only managers and directors. Other processes include advisory groups from the community. Collaboration between funders and service providers in determining outcomes is critical where contracts for services are involved. Advice from some savvy foundation funders is to match outcomes evaluation to the stage of a program's development (Daniels 1996), keep the context long-term (McIntosh 1996) and "turn outcome 'sticks' into carrots" (Leonard 1996:46).

Exhibit 7.5 shows the stages of a utilization-focused approach to developing an outcomes-based management system for a program (MDHS 1996). Critical issues and parallel activities are shown for each stage.
EXHIBIT 7.4
Three Versions of a Utilization-Focused Outcomes Framework:
A Developmental Disabilities Program

EXHIBIT 7.5
Stages, Issues, and Activities in Developing an Outcomes-Based
Management System (MDHS 1996)

Those to be involved will need training and support. I've found it helpful to begin with an overview of the purpose of an outcomes-focused programming approach: history, trends, the political climate, and potential benefits. Then I have participants work in small groups on the elements of the utilization-focused outcomes framework (see Exhibit 7.3) for an actual program with which they're familiar. Facilitation, encouragement, and technical assistance are needed to help such groups successfully complete the task. Where multiple groups are involved, I like to have them share their work and the issues that emerged in using the outcomes framework. It's important that those involved get a chance to raise their concerns openly. There's often suspicion about political motives. Providers worry about funding cuts and being held accountable for things they can't control. Administrators and directors of programs worry about how results will be used, what comparisons will be made, and who will control the process. Line staff worry about the amount of time involved, paperwork burdens, and the irrelevancy of it all. State civil servants responsible for reporting to the Legislature worry about how data can be aggregated at the state level. These and other concerns need to be aired and addressed. Having influential leaders visibly involved in the process enhances their own understanding and commitment while also sending signals to others about the importance being placed on outcomes.

Meaningful and Useful Goals

With the utilization-focused outcomes framework as background, here are 10 guidelines for working with intended users to identify meaningful and useful goals.

1. Distinguish between outcome goals and activities. Outcomes describe desired impacts of the program on participants: Students will read with understanding. Participants will stop smoking. Activities goals describe how outcome goals will be achieved: Students will read two hours a day. Participants will openly discuss their dependence on cigarettes. People in the program will be treated with respect.

2. Outcome goals should be clearly outcome oriented. Program staff often write activity goals thinking that they have stated desired outcomes. An agricultural extension agent told me his goal was "to get 50 farmers to participate in a farm tour." But what, I asked, did he want to result from the farm tour? After some dialogue, it became clear that the outcome goal was this: "Farmers will adopt improved milking practices in their own farm operations."

A corporation stated one of its goals for the year as "establishing a comprehensive energy conservation program." After we discussed that it was perfectly possible to establish such a program without ever saving any energy, they rewrote the goal: "The corporation will significantly reduce energy consumption."

3. It should be possible to conceptualize either the absence of the desired outcome or an alternative to it. Some goal statements are amazingly adept at saying nothing. I worked with a school board whose overall goal was "Students will learn." There is no way not to attain this goal. It is the nature of the species that young people learn. Fortunately, they can learn in spite of the schools. The issues are what and how much they will learn.

Another favorite is "increasing aware- ent audiences for a variety of purposes."


ness." It's fairly difficult to put people The New York Times (1996) found this
through two weeks of training on some goal less than inspiring or user-friendly,
topic (e.g., chemical dependency) and not and editorialized: "a fog of euphemism
increase awareness. Under these condi- and evasion" (p. A24). Bumper sticker:
tions, the goal of "increasing awareness of Honk if you use writing process elements
chemical dependency issues" is hardly appropriately.
worth aiming at. Further dialogue revealed
that the program staff wanted to change 6. Formal goals statements should fo-
knowledge, attitudes, and behavior. cus on the most important program out-
comes. Writing goals should not be a mara-
4. Each goal and objective should con- thon exercise in seeing how long a
tain only one idea. There is a tendency in document one can produce. As human be-
writing goal statements to overload the ings, our attention span is too short to focus
content. on long lists of goals and objectives. Limit
them to outcomes that matter and for
5. The statement of goals and objec- which the program intends to be held
tives should be understandable. Goals accountable.
should communicate a clear sense of direc-
tion. Avoid difficult grammatical construc- 7. Keep goal statements separate from
tions and complex interdependent clauses. statements of how goals are to be attained.
Goal statements should also avoid internal An agricultural extension program had this
program or professional jargon. The gen- goal: "Farmers will increase yields through
eral public should be able to make sense of the educational efforts of extension includ-
goals. Consider these two versions of goal ing farm tours, bulletins, and related activi-
statements for what amount to the same ties." Everything after the word yields de-
outcome: scribes how the goal is to be attained. Keep
the goal focused, clear, and crisp.
(a) To maximize the capabilities of professional
staff and use taxpayer resources wisely 8. Separate goals from indicators. Ad-
while engaging in therapeutic interventions vocates of management by objectives and
and case management processes so that behavioral objectives often place more em-
children's developmental capacities are un- phasis on measurement than on establish-
encumbered by adverse environmental cir- ing a clear sense of direction (Combs 1972).
cumstances or experiences. The two are related, but not equivalent.
(b) Children will be safe from abuse and neglect.
9. Make the writing of goals a positive
Now, see if you can make sense of this experience. Goals clarification exercises
beauty from the National Council of are so often approached as pure drudgery
Teachers of English and the International that staff hate not only the exercise itself
Reading Association: "Students employ a but also the resulting goals. Goals clarifica-
wide range of strategies as they write and tion should be an invigorating process of
use different writing process elements ap- prioritizing what those involved care about
propriately to communicate with differ- and hope to accomplish. Goals should not

10. Thou shalt not covet thy neighbor's goals and objectives. Goals and objectives don't travel very well. They often involve matters of nuance. It is worth taking the time for primary stakeholders to construct their own goals so that they reflect their own values, expectations, and intentions in their own language.

There are exceptions to all of these guidelines, particularly the last one. One option in working with groups is to have them review the goals of other programs, both as a way of helping stakeholders clarify their own goals and to get ideas about format and content. Evaluators who work with behavioral objectives often develop a repertoire of potential objectives that can be adopted by a variety of programs. The evaluator has already worked on the technical quality of the goals so program staff can focus on selecting the content they want. Where there is the time and inclination, however, I prefer to have a group work on its own goals statement so that participants feel ownership and understand what commitments have been made. This can be part of the training function served by evaluators, increasing the likelihood that staff will have success in future goals clarification exercises.

Levels of Goal Specification
From Overall Mission to Specific Objectives

To facilitate framing evaluation questions in complex programs, evaluators may have to work with primary stakeholders to clarify purposes at three levels: the overall mission of the program or organization, the goals of specific programmatic units (or subsystems), and the specific objectives that specify desired outcomes. The mission statement describes the general direction of the overall program or organization in long-range terms. The peacetime mission of the U.S. Army is simply "readiness." A mission statement may specify a target population and a basic problem to be attacked. For example, the mission of the Minnesota Comprehensive Epilepsy Program was to "improve the lives of people with epilepsy."

The terms goals and objectives have been used interchangeably up to this point, but it is useful to distinguish between them as representing different levels of generality. Goals are more general than objectives and encompass the purposes and aims of program subsystems (i.e., research, education, and treatment in the epilepsy example). Objectives are narrow and specific, stating what will be different as a result of program activities, that is, the concrete outcomes of a program. To illustrate these differences, a simplified version of the mission statement, goals, and objectives for the Minnesota Comprehensive Epilepsy Program is presented in Exhibit 7.6. This outline was developed after an initial discussion with the program director. The purpose of the outline was to establish a context for later discussions aimed at more clearly framing specific evaluation questions. In other words, we used this goals clarification and objectives mapping exercise as a means of focusing the evaluation question rather than as an end in itself.

The outline of goals and objectives for the Epilepsy Project (Exhibit 7.6) illustrates several points. First, the only dimension that consistently differentiates goals and objectives is the relative degree of specificity of each: objectives narrow the focus of goals. There is no absolute criterion for distinguishing goals from objectives; the distinction is always a relative one.

Second, this outline had a specific evaluation purpose: to facilitate priority setting as I worked with primary intended users to focus the evaluation. Resources were insufficient to fully evaluate all three component parts of the program. Moreover, different program components faced different contingencies. Treatment and research had more concrete outcomes than education. The differences in the specificity of the objectives for the three components reflect real differences in the degree to which the content and functions of those program subsystems were known at the beginning of the evaluation. Thus, with limited resources and variations in goal specificity, it was necessary to decide which aspects of the program could best be served by evaluation.

Third, the outline of goals and objectives for the Comprehensive Epilepsy Program is not particularly well written. I constructed the outline from notes taken during my first meeting with the director. At this early point in the process, the outline was a tool for posing this question to evaluation decision makers: Which program components, goals, and objectives should be evaluated to produce the most useful information for program improvement and decision making? That is the question. To answer it, one does not need technically perfect goal statements. Once the evaluation is focused, relevant goals and objectives can be reworked as necessary. The point is to avoid wasting time in the construction of grandiose, complex models of program goals and objectives just because the folklore of evaluation prescribes such an exercise. In complex programs, evaluators can spend so much time working on goals statements that considerable momentum is lost.

Establishing Priorities: Importance Versus Utility

Let me elaborate the distinction between writing goals for the sake of writing goals and writing them to use as tools in narrowing the focus of an evaluation. In utilization-focused evaluation, goals are prioritized in a manner quite different from that usually prescribed. The usual criterion for prioritizing goals is ranking or rating in terms of importance (Edwards, Guttentag, and Snapper 1975; Gardiner and Edwards 1975). The reason seems commonsensical: Evaluations ought to focus on important goals. But, from a utilization-focused perspective, what appears to be most sensible may not be most useful.

The most important goal may not be the one that decision makers and intended users most need information about. In utilization-focused evaluation, goals are also prioritized on the basis of what information is most needed and likely to be most useful, given the evaluation's purpose. For example, a summative evaluation would likely evaluate goals in terms of overall importance, but a formative evaluation might focus on a goal of secondary importance because it is an area being neglected or proving particularly troublesome.

Ranking goals by importance is often quite different from ranking them by the utility of evaluative information needed. Exhibit 7.7 provides an example from the Minnesota Comprehensive Epilepsy Program, contrasting goals ranked by importance and utility. Why the discrepancy?

EXHIBIT 7.6
Minnesota Comprehensive Epilepsy Program:
Mission Statement, Goals, and Objectives

Program Mission: Improve the lives of people with epilepsy

Research Component

Goal 1: Produce high quality, scholarly research on epilepsy

Objective 1: Publish research findings in high-quality, refereed journals

Objective 2: Contribute to knowledge about:


a. neurological aspects of epilepsy
b. pharmacological aspects of epilepsy
c. epidemiology of epilepsy
d. social and psychological aspects of epilepsy
Goal 2: Produce interdisciplinary research

Objective 1: Conduct research projects that integrate principal investigators from


different disciplines

Objective 2: Increase meaningful exchanges among researchers from different


disciplines

Education Component

Goal 3: Health professionals will know the nature and effects of epilepsy behaviors

Objective 1: Increase the knowledge of health professionals who serve people with
epilepsy so that they know:
a. what to do if a person has a seizure
b. the incidence and prevalence of epilepsy

Objective 2: Change the attitudes of health professionals so that they:


a. are sympathetic to the needs of people with epilepsy
b. believe in the importance of identifying the special needs of people with epilepsy

Goal 4: Educate persons with epilepsy about their disorder

Goal 5: Inform the general public about the nature and incidence of epilepsy.

Treatment Component

Goal 6: Diagnose, treat, and rehabilitate persons with severe, chronic, and disabling seizures

Objective 1: Increase seizure control in treated patients

Objective 2: Increase the functioning of patients



EXHIBIT 7.7
Minnesota Comprehensive Epilepsy Program:
Goals Ranked by Importance to Program Versus Goals Ranked
by Utility of Evaluative Information Needed by Primary Users

Ranking of Goals by Importance
1. Produce high-quality scholarly research on epilepsy
2. Produce interdisciplinary research
3. Integrate the separate components into a whole
4. Diagnose, treat, and rehabilitate people with chronic and disabling seizures

Ranking of Goals by Usefulness of Evaluative Information to Intended Users
1. Integrate the separate program components into a comprehensive whole that is greater than the sum of its parts
2. Educate health professionals about epilepsy
3. Diagnose, treat, and rehabilitate people with chronic and disabling seizures
4. Produce interdisciplinary research

The staff did not feel they needed a formal evaluation to monitor attainment of the most important program goal. The publishing of scholarly research in refereed journals was so important that the director was committed to personally monitor performance in that area. Moreover, he was relatively certain about how to achieve that outcome, and he had no specific evaluation question related to that goal that he needed answered. By contrast, the issue of comprehensiveness was quite difficult to assess. It was not at all clear how comprehensiveness could be facilitated, although it was third on the importance list. Data on comprehensiveness had high formative utility.

The education goal, second on the usefulness list, does not even appear among the top four goals on the importance list. Yet, information about educational impact was ranked high on the usefulness list because it was a goal area about which the program staff had many questions. The education component was expected to be a difficult, long-term effort. Information about how to increase the educational impact of the Comprehensive Epilepsy Program had high use potential. In a utilization-focused approach, the primary intended users make the final decision about evaluation priorities.

In my experience, the most frequent reason for differences in importance and usefulness rankings is variation in the degree to which decision makers already have what they consider good information about performance on the most important goal. At the program level, staff members may be so involved in trying to achieve their most important goal that they are relatively well informed about performance on that goal. Performance on less important goals may involve less certainty for staff; information about performance in that goal area is therefore more useful because it tells staff members something they do not already know.

What I hope is emerging through these examples is an image of the evaluator as an active-reactive-adaptive problem solver. The evaluator actively solicits information about program contingencies, organizational dynamics, environmental uncertainties, and decision makers' goals in order to focus the evaluation on questions of real interest and utility to primary intended users.

Evaluation of Central Versus Peripheral Goals

Prioritizing goals on the basis of perceived evaluative utility means that an evaluation might focus on goals of apparent peripheral importance rather than more central program goals. This is a matter of some controversy. In her early work, Weiss (1972b) offered the following advice to evaluators:

The evaluator will have to press to find out priorities—which goals the staff sees as critical to its mission and which are subsidiary. But since the evaluator is not a mere technician for the translation of a program's stated aims into measurement instruments, he has a responsibility to express his own interpretation of the relative importance of goals. He doesn't want to do an elaborate study on the attainment of minor and innocuous goals, while some vital goals go unexplored. (pp. 30-31; emphasis added)

Contrast that advice with the perspective of an evaluator from our study of use of federal health evaluations:

I'd make this point about minor evaluation studies. If you have an energetic, conscientious program manager, he's always interested in improving his program around the periphery, because that's where he usually can. And an evaluation study of some minor aspect of his program may enable him to significantly improve. [EV52:171]

In our study, we put the issue to decision makers and evaluators as follows:

Another factor sometimes believed to affect use has to do with whether the central objectives of a program are evaluated. Some writers argue that evaluations can have the greatest impact if they focus on major program objectives. What happened in your case?

The overwhelming consensus was that, at the very least, central goals ought to be evaluated and, where possible, both central and peripheral goals should be studied. As they elaborated, nine decision makers and eight evaluators said that utilization had probably been increased by concentrating on central issues. This phrase reflects an important shift in emphasis. As they elaborated their answers about evaluating central versus peripheral goals, they switched from talking about goals to talking about "issues." Utilization is increased by focusing on central issues. And what is a central issue? It is an evaluation question that someone really cares about. The subtle distinction here is critical. Evaluations are useful to decision makers if they focus on central issues—which may or may not include evaluating attainment of central goals.

The Personal Factor Revisited

Different people will have different perceptions of what constitutes central program goals or issues. Whether it is the evaluator's opinion about centrality, the funder's, some special interest group's perspective, or the viewpoints of program staff and participants, the question of what constitutes central program goals and objectives remains an intrinsically subjective one. It cannot be otherwise. The question of central versus peripheral goals cannot really be answered in the abstract. The question thus becomes: central from whose point of view? The personal factor (Chapter 3) intersects the goals clarification process in a utilization-focused evaluation. Increasing use is largely a matter of matching: getting information about the right questions, issues, and goals to the right people.

Earlier in this chapter, I compared the goals clarification process to the party game of Twenty Questions. Research indicates that different individuals behave quite differently in such a game (and, by extension, in any decision-making process). Worley (1960), for example, studied subjects' information-seeking endurance in the game under experimental conditions. Initially, each subject was presented with a single clue and given the option of guessing what object the experimenter had in mind or of asking for another clue. This option was available after each new clue, but a wrong guess would end the game. Worley found large and consistent individual differences in the amount of information players sought. Donald Taylor (1965) cites the research of Worley and others as evidence that decision-making and problem-solving behavior is dynamic, highly variable, and contingent upon both situational and individual characteristics. This does not make the evaluator's job any easier. It does mean that the personal factor remains the key to evaluation use. The careful selection of knowledgeable, committed, and information-valuing people makes the difference. The goals clarification game is most meaningful when played by people who are searching for information because it helps them focus on central issues without letting the game become an end in itself or turning it into a contest between staff and evaluators.

The Goals Paradox

This chapter began with an evaluation of Nasrudin's hunting trip in search of bears. For Nasrudin, that trip ended with the "marvelous" outcome of seeing no bears. Our hunting trip in search of the role of goals in evaluation has no conclusive ending because the information needs of primary intended users will vary from evaluation to evaluation and situation to situation. Focusing an evaluation on program goals and objectives is clearly not the straightforward, logical exercise depicted by the classical evaluation literature because decision making in the real world is not purely rational and logical. This is the paradox of goals. They are rational abstractions in nonrational systems. Statements of goals emerge at the interface between the ideals of human rationality and the reality of diverse human values and ways of thinking. Therein lies their strength and their weakness. Goals provide direction for action and evaluation, but only for those who share in the values expressed by the goals. Evaluators live inside that paradox.

One way out of the paradox is to focus the evaluation without making goal attainment the central issue. The next chapter considers alternatives to goals-based evaluation.

Note

1. This material, and related information in the chapter, has been adapted and used with permission from the Minnesota Department of Human Services.
Focusing an Evaluation
Alternatives to Goals-Based Evaluation

Creative thinking may mean simply the realization that there's no particular virtue in doing things the way they always have been done.
—Rudolf Flesch

If you can see in any given situation only what everybody else can see, you can be said to be so much a representative of your culture that you are a victim of it.
—S. I. Hayakawa

More Than One Way to Manage a Horse

Here is a story about the young Alexander from Plutarch (1952):

There came a day when Philoneicus the Thessalian brought King Philip a horse named
Bucephalus, which he offered to sell for 13 talents. The king and his friends went down
to the plain to watch the horse's trials and came to the conclusion that he was wild and
quite unmanageable, for he would allow no one to mount him, nor would he endure
the shouts of Philip's grooms, but reared up against anyone who approached. The king
became angry at being offered such a vicious unbroken animal and ordered it led away.
But Alexander, who was standing close by, remarked, "What a horse they are losing,
and all because they don't know how to handle him, or dare not try!"

King Philip kept quiet at first, but when he heard Alexander repeat these words and
saw that he was upset, he asked him: "Do you think you know more than your elders
or can manage a horse better?"
"I could manage this one better," retorted Alexander.
"And if you cannot," said his father, "what penalty will you pay for being so
impertinent?"
"I will pay the price of the horse," answered the boy. At this, the whole company
burst out laughing. As soon as the father and son had settled the terms of the bet,
Alexander went quickly up to Bucephalus, took off his bridle, and turned him towards
the sun, for he had noticed that the horse was shying at the sight of his own shadow, as
it fell in front of him and constantly moved whenever he did. He ran alongside the
animal for a little way, calming him down by stroking him, and then, when he saw he
was full of spirit and courage, he quietly threw aside his cloak and with a light spring
vaulted safely onto his back. For a little while, he kept feeling the bit with the reins,
without jarring or tearing his mouth, and got him collected. Finally, when he saw that
the horse was free of his fears and impatient to show his speed, he gave him his head
and urged him forward, using a commanding voice and touch of the foot.
King Philip held his breath in an agony of suspense until he saw Alexander reach the
end of his gallop, turn in full control, and ride back triumphant, exulting in his success.
Thereupon the rest of the company broke into loud applause, while his father, we are
told, actually wept for joy. When Alexander had dismounted, he kissed him and said:
"My boy, you must find a kingdom big enough for your ambitions. Macedonia is too
small for you."

Young Alexander, later to be Alexander the Great, showed that there was more than one way to manage a horse. What I like most about this story, as a metaphor for managing an evaluation, is that he based his approach to the horse on careful observations of the horse and situation. He noticed that the horse was afraid of its shadow, so he turned him toward the sun. He established a relationship with the wild animal before mounting it. He was sensitive to the horse's response to the bit and reins. Alexander exemplified being active, reactive, and adaptive.

More Than One Way to Focus an Evaluation

The last chapter focused on goals and outcomes as traditional ways to focus an evaluation. A program with clear, specific, and measurable goals is like a horse already trained for riding. Programs with multiple, conflicting, and still developing or ever-changing goals can feel wild and risky to an evaluator whose only experience is with seasoned and trained horses. This chapter will examine why goals-based evaluation often doesn't work and offer alternatives for focusing an evaluation. Just as there's more than one way to manage a wild horse, there's more than one way to manage evaluation of a seemingly chaotic program.

Problems With Goals-Based Evaluation

One can conduct useful evaluations without ever seeing an objective.
—Smith 1980:39

Alternatives to goals-based evaluation have emerged because of the problems evaluators routinely experience in attempting to focus on goals. In addition to fuzzy goals and conflicts over goals—problems addressed in the previous chapter—measuring goal attainment can overpoliticize goals. In this regard, Lee J. Cronbach and associates (1980) at the Stanford Evaluation Consortium have warned about the distortions that result when program staff pay too much attention to what an evaluator decides to measure, essentially giving the evaluator the power to determine what activities become primary in a program.

It is unwise for evaluation to focus on whether a project has "attained its goals." Goals are a necessary part of political rhetoric, but all social programs, even supposedly targeted ones, have broad aims. Legislators who have sophisticated reasons for keeping goal statements lofty and nebulous unblushingly ask program administrators to state explicit goals. Unfortunately, whatever the evaluator decides to measure tends to become a primary goal of program operators. (p. 5)

In other words, what gets measured gets done. An example is when teachers focus on whether students can pass a reading test rather than on whether they learn to read. The result can be students who pass mandated competency tests but are still functionally illiterate.

Another critique of goals is that they're often unreal. Since I've argued that evaluation is grounded in reality testing, it behooves us to examine the reality of goals. To "reify" is to treat an abstraction as if it were real. Goals have been a special target of social scientists concerned with concept reification. For example, Cyert and March (1963:28) have asserted that individual people have goals, collectivities of people do not. They likewise assert that only individuals can act; organizations or programs, as such, cannot be said to take action. The future state desired by an organization (its goals) is nothing but a function of individual "aspirations."

Azumi and Hage (1972) reviewed the debate about whether organizations have goals and concluded, "Organizational sociologists have found it useful to assume that organizations are purposive. . . . However, it has been much more difficult to actually measure the goals of an organization. Researchers find the purposive image helpful but somehow elusive" (p. 414).

In brief, social scientists who study goals are not quite sure what they are studying. Goals analysis as a field of study is complex, chaotic, controversial, and confusing. In the end, most researchers follow the pragmatic logic of organizational sociologist Charles Perrow (1970):

For our purposes we shall use the concept of an organizational goal as if there were no question concerning its legitimacy, even though we recognize that there are legitimate objections to doing so. Our present state of conceptual development, linguistic practices, and ontology (knowing whether something exists or not) offers us no alternative. (p. 134)

Like Perrow, evaluators are likely to come down on the side of practicality. The language of goals will continue to dominate evaluation. By introducing the issue of goals reification, I have hoped merely to induce a modicum of caution and compassion among evaluators before they impose goals clarification exercises on program staff. Given the way organizational sociologists have gotten themselves tangled up in the question of whether program-level goals actually exist, it is just possible that difficulties in clarifying a program's goals may be due to problems inherent in the notion of goals rather than staff incompetence, intransigence, or opposition to evaluation. Failure to appreciate these difficulties and proceed with sensitivity and patience can create staff resistance that is detrimental to the entire evaluation process.

I have also hoped that reviewing the conceptual and operational problems with goals would illuminate why utilization-focused evaluation does not depend on clear, specific, and measurable objectives as the sine qua non of evaluation research. Clarifying goals is neither necessary nor appropriate in every evaluation.

Turbulent Environments and Goals

The extent to which evaluators should seek clarity about goals will depend, among other things, on the program's developmental status and environment. Organizational sociologists have discovered that the clarity and stability of goals are contingent on the organization's environment. Emery and Trist (1965) identified four types of organizational environments characterized by varying degrees of uncertainty facing the organization. Uncertainty includes things like funding stability, changes in rules and regulations, mobility and transience of clients and suppliers, and political, economic, or social turbulence. What is important about their work from an evaluation perspective is the finding that the degree of uncertainty facing an organization directly affects the degree to which goals and strategies for attaining goals can be made concrete and stable. The less certain the environment, the less stable and less concrete the organization's goals will be. Effective organizations in turbulent environments adapt their goals to changing demands and conditions.

In practical terms, this means that the more unstable and turbulent the environment of a program, the less likely it is that the evaluator will be able to generate concrete and stable goals. Second, few evaluations can investigate and assess all the many programmatic components and special projects of an agency, organization, or program. The clarity, specificity, and measurability of goals will vary throughout a program, depending on the environmental turbulence faced by specific projects and program subparts. As an evaluator works with primary intended users to focus the evaluation, the degree to which it is useful to labor over writing a goals statement will vary for different parts of the program. It will not be efficient or useful to force developing and adapting programs into a static and rigid goals model. Developmental evaluation, discussed in Chapter 5, is one way of being a useful form of evaluation in innovative settings where goals are emergent and changing rather than predetermined and fixed. Another alternative is goal-free evaluation.

Goal-Free Evaluation

Philosopher-evaluator Michael Scriven has been a strong critic of goals-based evaluation and, as an alternative, an advocate of what he has called goal-free evaluation. Goal-free evaluation involves gathering data on a broad array of actual effects and evaluating the importance of these effects in meeting demonstrated needs. The evaluator makes a deliberate attempt to avoid all rhetoric related to program goals. No discussion about goals is held with staff, and no program brochures or proposals are read; only the program's actual outcomes and measurable effects are studied, and these are judged on the extent to which they meet demonstrated participant needs.

Scriven (1972b) has offered four reasons for doing goal-free/needs-based evaluation:

1. To avoid the risk of narrowly studying stated program objectives and thereby missing important unanticipated outcomes
2. To remove the negative connotations attached to the discovery of unanticipated effects, because "the whole language of 'side-effect' or 'secondary effect' or even 'unanticipated effect' tended to be a put-down of what might well be the crucial achievement, especially in terms of new priorities" (pp. 1-2)
3. To eliminate the perceptual biases introduced into an evaluation by knowledge of goals
4. To maintain evaluator objectivity and independence through goal-free conditions

In Scriven's (1972b) own words:

It seemed to me, in short, that consideration and evaluation of goals was an unnecessary but also a possibly contaminating step. I began work on an alternative approach—simply the evaluation of actual effects against a profile of demonstrated needs. I call this Goal-Free Evaluation. . . .
The less the external evaluator hears about the goals of the project, the less tunnel vision will develop, the more attention will be paid to looking for actual effects (rather than checking on alleged effects). (p. 2; emphasis in original)

Scriven (1972b) distrusted the grandiose goals of most projects. Such great and grandiose proposals "assume that a gallant try at Everest will be perceived more favorably than successful mounting of molehills. That may or may not be so, but it's an unnecessary noise source for the evaluator" (p. 3). He saw no reason to get caught up in distinguishing alleged goals from real goals: "Why should the evaluator get into the messy job of trying to disentangle that knot?" He would also avoid goals conflict and goals war: "Why try to decide which goal should supervene?" He even countered the goals clarification shuffle:

Since almost all projects either fall short of their goals or overachieve them, why waste time rating the goals, which usually aren't what is achieved? Goal-free evaluation is unaffected by—and hence does not legislate against—the shifting of goals midway in a project.

Scriven (1991b) also dealt with the fuzziness problem: "Goals are often stated so vaguely as to cover both desirable and undesirable activities, by almost anyone's standards. Why try to find out what was really intended—if anything?" Finally, he has argued that "if the program is achieving its stated goals and objectives, then these will show up" in the goal-free interviews with and observations of program participants done to determine actual impacts (p. 180).

Sometimes the result of goal-free evaluation is a statement of goals; that is, rather than being the initial focus of the evaluation process, a statement of operating goals becomes its outcome. Scriven, however, considers this inappropriate:

It often happens in goal-free evaluation that people use this as a way of working out what the goals are, but I discourage them from trying to do that. That's not the point of it. The outcome is an assessment of the merit of the program.
A better way to put the trouble with the name goal-free is to say that you might put it better by saying it is needs-based instead of goal-based. It is based on something, namely the needs of the client or recipient, but it isn't based on the goals of the program people and you never need to know those and you shouldn't ever look at them. As far as the idea that you finally come up with them as a conclusion, you'd be surprised the extent to which you don't. (Scriven and Patton 1976:13-14; emphasis added)

Some critics of Scriven have countered that goal-free evaluation only appears to get rid of goals. The only goals really eliminated are those of local project staff. Scriven replaces staff objectives with more global goals based on societal needs and basic standards of morality. Under a goal-free approach, only the evaluator knows for sure what those needs and standards are, although Scriven (1972b) considers such standards to be as obvious as the difference between soap and cancer:

Another error is to think that all standards of merit are arbitrary or subjective. There's nothing subjective about the claim that we need a cure for cancer more than a new brand of soap. The fact that some people have the opposite preference (if true) doesn't even weakly undermine the claim about which of these alternatives the nation needs most. So the Goal-Free Evaluation may use needs and not goals, or the goals of the consumer or the funding agency. Which of these is appropriate depends on the case. But in no case is it proper to use anyone's as the standard unless they can be shown to be the appropriate ones and morally defensible. (pp. 3-4)

As a philosopher, Scriven may feel comfortable specifying what "the nation needs" and designating standards as "morally defensible." But from a utilization-focused perspective, this simply begs the question of who is served by the information collected. The issue is not which goals are better or worse, moral or immoral, appropriate or inappropriate, in any objective sense. The issue is whose goals will be evaluated. Scriven's goal-free model eliminates only one group from the game: local project staff. He directs data in only one clear direction—away from the stated concerns of the people who run the program. He addresses an external audience, such as legislative funders. But, inasmuch as these audiences are ill defined and lack organization, I am unconvinced that the standards he applies are anything other than his very own preferences about what program effects are appropriate and morally defensible. Scriven's denial notwithstanding (cf. Scriven 1972b:3), goal-free evaluation carries the danger of substituting the evaluator's goals for those of the project. Marv Alkin (1972) has made essentially the same point:

This term "Goal-Free Evaluation" is not to one part of a comprehensive evaluation


be taken literally. The Goal-Free Evaluation includes a goal-free evaluator w o r k i n g
does recognize goals (and not just idiosyn- parallel to a goals-based evaluator. This
cratic ones), but they are to be wider context solves the potential problem that, if
goals rather than the specific objectives of a evaluators need not k n o w w h e r e a p r o -
program. . . . By "goal-free" Scriven simply gram is headed to evaluate w h e r e it ends
means that the evaluator is free to choose a u p , then program staff might embrace this
wide context of goals. By his description, he logic and, likewise, decide t o eschew
implies that a goal-free evaluation is always goals. Under a pure goal-free a p p r o a c h ,
free of the goals of the specific program and program staff need only wait until the
sometimes free of the goals of the program goal-free evaluator determines w h a t the
sponsor. In reality, then, goal-free evaluation p r o g r a m has accomplished and then p r o -
is not really goal-free at all, but is simply claim those accomplishments as their
directed at a different and usually wide deci- original goals. Ken M c l n t y r e (1976) has
sion audience. The typical goal-free eval- described eloquently just such an ap-
uator must surely think (especially if he re- proach to evaluation in a p o e m addressed
jects the goals of the sponsoring agency) that to program staff.
his evaluation will extend at least to the level
of "national policy formulators." The ques- Your program's goals you need a way of
tion is whether this decision audience is of knowing.
the highest priority, (p. 11) You're sure you've just about arrived,
But where have you been going?
It should be n o t e d that Scriven's goal-
So, like the guy who fired his rifle at a
free proposal assumes b o t h internal and
10-foot curtain
external evaluators. T h u s , part of the rea-
And drew a ring around the hole to make a
son the external evaluators can ignore
bull's-eye certain,
p r o g r a m staff and local project goals is
because the internal evaluator takes care It's best to wait until you're through
of all that. T h u s , again, goal-free evalu- And then see where you are:
ation is only partially goal-free. Someone Deciding goals before you start is riskier by
has to stay h o m e and mind the goals while far.
the external evaluators search for any and So, if you follow my advice in your
all effects. As Scriven (1972b) has argued, evaluation,
You'll start with certainty
Planning and production require goals, and And end with self-congratulation, (p. 39)
formulating them in testable terms is abso-
lutely necessary for the manager as well as There have been several serious cri-
the internal evaluator who keeps the man- tiques of goal-free evaluation (see Alkin
ager informed. That has nothing to do with 1972; Kneller 1 9 7 2 ; P o p h a m 1 9 7 2 ;
the question of whether the external evalu- Stufflebeam 1972), much of it focused on
ator needs or should be given any account of the label as much as the substance.
the project's goals, (p. 4) Scriven's critique of goals-based evalu-
ation, however, is useful in affirming w h y
In later reflections, Scriven (1991b: evaluators need more than one way of
181) p r o p o s e d "hybrid forms" in which focusing an evaluation.

Evaluation will not be well served by dividing people into opposing camps: pro-goals versus anti-goals evaluators. I am reminded of an incident at the University of Wisconsin during the student protests over the Vietnam War. Those opposed to the war were often labeled communists. At one demonstration, both anti-war and pro-war demonstrators got into a scuffle, so police began making arrests indiscriminately. When one of the pro-war demonstrators was apprehended, he began yelling, "You've got the wrong person. I'm anti-communist!" To which the police officer replied, "I don't care what kind of communist you are, you're going to jail."

Well, I don't care what kind of evaluator you are, to be effective you need the flexibility to evaluate with or without goals. The utilization-focused evaluation issue is what information is needed by primary intended users, not whether goals are clear, specific, and measurable. Let's consider, then, some other alternatives to goals-based evaluation.

A Menu Approach to Focusing Evaluations

Menu 8.1 at the end of this chapter offers an extensive list of alternative ways of focusing an evaluation. I'll elaborate on only a few of these here.

Focusing on future decisions. An evaluation can be focused on information needed to inform future decisions. Proponents and opponents of school busing for desegregation may never agree on educational goals, but they may well agree on what information is needed to inform future debate, for example, data about who is bused, at what distances, from what neighborhoods, and with what effects.

Focusing on critical issues or concerns. When the Minnesota Legislature first initiated Early Childhood Family Education programs, some legislators were concerned about what instruction and advice were being given to parents. The evaluation focused on this issue, and the evaluators became the eyes and ears for the Legislature and general public at a time of conflict about "family values" and anxiety about values indoctrination. The evaluation, based on descriptions of what actually occurred and data on parent reactions, helped put this issue to rest. Now, 20 years later, the latest evaluation of this program (Mueller 1996) has focused on the issue of universal access. Should the program be targeted to low-income parents or continue to be available to all parents, regardless of income? What are the effects on parents of a program that integrates people of different socioeconomic backgrounds? And, as before, this issue has been raised in the Legislature. Both these evaluations, then, were issue-based more than goals-based, although attention to differential parent outcomes was subsumed within the issues.

The "responsive approach" to evaluation. Stake (1975) advocates incorporating into an evaluation the various points of view of constituency groups under the assumption that "each of the groups associated with a program understands and experiences it differently and has a valid perspective" (Stecher and Davis 1987:56-57). The focus, then, is on informing each group of the perspective of other groups.

Focusing on questions. In Chapter 2, I described focusing an evaluation in Canada by having primary intended users generate questions that they wanted answered—without regard to methods, measurement, design, resources, precision—just 10 basic questions, real questions that they considered important.

After working individually and in small groups, we pulled back together and generated a single list of 10 basic evaluation questions—answers to which, they agreed, could make a real difference to the operations of the school division. The questions were phrased in their terms, incorporating important local nuances of meaning and circumstance. Most important, they had discovered that they had questions they cared about—not my questions but their questions, because during the course of the exercise it had become their evaluation.

Generating a list of real and meaningful evaluation questions played a critical part in getting things started. Exhibit 2.4 in Chapter 2 offers criteria for good utilization-focused questions.

It is worth noting that formulating an appropriate and meaningful question involves considerable skill and insight. In her novel, The Left Hand of Darkness, science fiction author Ursula K. Le Guin (1969) reminds us that questions and answers are precious resources, not to be squandered or treated casually. She shows us that how one poses a question frames the answer one gets—and its utility. In the novel, the character Herbor makes an arduous journey to the fortune tellers who convene rarely and, when they do, permit the asking of only a single question. His mate is obsessed with death, so Herbor asks them how long his mate will live. Herbor returns home to tell his mate the answer, that Herbor will die before his mate.
His mate is enraged: "You fool! You had a question of the Foretellers, and did not ask them when I am to die, what day, month, year, how many days are left to me—you asked how long? Oh you fool, you staring fool, longer than you, yes, longer than you!" And with that his mate struck him with a great stone and killed him, fulfilling the prophecy and driving the mate into madness. (pp. 45-46)

A "Seat-of-the-Pants" Approach

In our follow-up study of how federal health evaluations were used, we came across a case example of using issues and questions to focus an evaluation. The decision makers in that process, for lack of a better term, called how they designed the evaluation a "seat-of-the-pants" approach. I would call it focusing on critical issues. The results influenced major decisions about the national Hill-Burton Hospital Construction Program. This evaluation illustrates some key characteristics of utilization-focused evaluation.

The evaluation was mandated in federal legislation. The director of the national Hill-Burton program established a permanent committee on evaluation to make decisions about how to spend evaluation funds. The committee included representatives from various branches and services in the division: people from the state Hill-Burton agencies, from the Comprehensive Health Planning agencies, from the health care industry, and regional Hill-Burton people. The committee met at regular intervals to "kick around" evaluation ideas. Everyone was free to make suggestions. Said the director, "If the committee thought a suggestion was worthwhile, we would usually give the person that suggested it an opportunity to work it up in a little more detail" [DM159:3]. The program officer commented that the final report looked systematic and goals-based, but

that's not the kind of thinking we were actually doing at that time . . . We got started by brainstorming: "Well, we can look at the funding formula and evaluate it." And someone said, "Well, we can also see what state agencies are doing." See? And it was this kind of seat-of-the-pants approach. That's the way we got into it. [PO159:4]

The evaluation committee members were carefully selected on the basis of their knowledge of central program issues. While this was essentially an internal evaluation, the committee also made use of outside experts. The director reported that the committee was the key to the evaluation's use: "I think the makeup of the committee was such that it helped this study command quite a lot of attention from the state agencies and among the federal people concerned" [DM159:18].

Here, then, we have a case example of the first two steps in utilization-focused evaluation: (1) identifying and organizing primary intended users of the evaluation and (2) focusing the evaluation on their interests and what they believe will be useful. And how do you keep a group like this working together?

Director: Well, I think this was heavily focused toward the major aspects of
the program that the group was concerned about.

Interviewer: Did the fact that you focused on major aspects of the program make
a difference in how the study was used?

Decision maker: It made a difference in the interest with which it was viewed by
people. . . . I think if we hadn't done that, if the committee hadn't
been told to go ahead and proceed in that order, and given the
freedom to do that, the committee itself would have lost interest.
The fact that they felt that they were going to be allowed to pretty
well free-wheel and probe into the most important things as they
saw them, I think that had a lot to do with the enthusiasm with
which they approached the task. [DM159:22]

The primary intended users began by brainstorming issues ("seat-of-the-pants approach") but eventually framed the evaluation question in the context of major policy concerns that included, but were not limited to, goal attainment. They negotiated back and forth—acting, reacting, adapting—until they determined and agreed on the most relevant focus for the evaluation.

Changing Focus Over Time: Stage Models of Evaluation

Evaluate no program until it is proud.
—Donald Campbell (1983)

Important to focusing an evaluation can be matching the evaluation to the program's stage of development, what Tripodi, Felin, and Epstein (1971) called differential evaluation. Evaluation priorities can vary at the initiation stage (when resources are being sought), the contact stage (when the program is just getting under way), and the full implementation stage.

In a similar vein, Jacobs (1988) has conceptualized a "five-tier" approach: (1) the preimplementation tier focused on needs assessment and design issues; (2) the accountability tier to document basic functioning to funders; (3) the program clarification tier focused on improvement and feedback to staff; (4) the "progress toward objectives" tier, focused on immediate, short-term outcomes and differential effectiveness among clients; and (5) the "program impact" tier, which focuses on overall judgments of effectiveness, knowledge about what works, and model specification for replication.

The logic of these stage models of evaluation is that, not only do the questions evolve as a program develops, but the stakes go up. When a program begins, all kinds of things can go wrong, and, as we'll see in the next chapter on implementation evaluation, all kinds of things typically do go wrong. It is rare that a program unfolds as planned.
Before committing major resources to overall effectiveness evaluation, then, a stage model begins by making sure the groundwork was carefully laid during the needs assessment phase; then basic implementation issues are examined and formative evaluation for improvement becomes the focus; if the early results are promising, then and only then, are the stakes raised by conducting rigorous summative evaluation. It was to this kind of staging of evaluation that Donald Campbell (1983), one of the most distinguished social scientists of the twentieth century, was referring when he implored that no program should be evaluated before it is "proud." Only when program staff have reached a point where they and others close to the program believe that they're on to something, "something special that we know works here and we think others ought to borrow," should rigorous summative evaluation be done to assess the program's overall merit and worth (Schorr 1988:269-70).

An example may help clarify why it's so important to take into account a program's stage of development. The Minnesota State Department of Education funded a "human liberation" course in the Minneapolis public schools aimed at enhancing communication skills around issues of sexism and racism. Funding was guaranteed for three years, but a renewal application with evaluation findings had to be filed each year. To ensure rigorous evaluation, an external, out-of-state evaluator was hired. When the evaluator arrived on the scene, virtually everything about the program was uncertain: curriculum content, student reaction, staffing, funding, relationship to the school system, and parent support. The evaluator insisted on beginning at what Jacobs (1988) called the fourth of five tiers: assessing progress toward objectives. He forced staff, who were just beginning course development (so they were at the initiation or preimplementation stage, tier one), to articulate clear, specific, and measurable goals in behavioral terms. The staff had no previous experience writing behavioral objectives, nor was program conceptualization sufficiently advanced to concretize goals, so the evaluator formulated the objectives for the evaluation.

To the evaluator, the program seemed chaotic. How can a program operate if it doesn't know where it's going? How can it be evaluated if there are no operational objectives? His first-year evaluation rendered a negative judgment with special emphasis on what he perceived as the staff's failure to seriously attend to the behavioral objectives he had formulated. The teaching staff reacted by dismissing the evaluation as irrelevant. State education officials were also disappointed because they understood the problems of first-year programs and found the evaluation flawed in failing to help staff deal with those problems. The program staff refused to work with the same evaluator the second year and faced the prospect of a new evaluator with suspicion and hostility.

When a colleague and I became involved the second year, staff made it clear that they wanted nothing to do with behavioral objectives. The funders and school officials agreed to a formative evaluation with staff as primary users. The evaluation focused on the staff's need for information to inform ongoing, adaptive decisions aimed at program development and improvement. This meant confidential interviews with students about strengths and weaknesses of the course, observations of classes to describe interracial dynamics and student reactions, and beginning work on measures of racism and sexism. On this latter point, program staff were undecided as to whether they were really trying to change student attitudes and behaviors or just make students more "aware."
They needed time and feedback to work out satisfactory approaches to the problems of racism and sexism.

By the third year, uncertainties about student reaction and school system support had been reduced by the evaluation. Initial findings indicated support for the program. Staff had become more confident and experienced. They decided to focus on instruments to measure student changes. They were ready to deal with program outcomes as long as they were viewed as experimental and flexible.

The results of the third-year evaluation showed that students' attitudes became more racist and sexist because the course experience inadvertently reinforced students' prejudices and stereotypes. Because they helped design and administer the tests used, teachers accepted the negative findings. They abandoned the existing curriculum and initiated a whole new approach to dealing with the issues involved. By working back and forth between specific information needs, contextual goals, and focused evaluation questions, it was possible to conduct an evaluation that was used to improve the program in the second year and make an overall decision about effectiveness at the end of the third year. The key to use was matching the evaluation to the program's stage of development and the information needs of designated users as those needs changed over time.

Focusing an Evaluation

Focusing an evaluation is an interactive process between evaluators and the primary intended users of the evaluation. It can be a difficult process because deciding what will be evaluated means deciding what will not be evaluated. Programs are so complex and have so many levels, goals, and functions that there are always more potential study foci than there are resources to examine them. Moreover, as human beings, we have a limited capacity to take in data and juggle complexities. We can deal effectively with only so much at one time. The alternatives have to be narrowed and decisions made about which way to go. That's why I've emphasized the menu metaphor throughout this book. The utilization-focused evaluation facilitator is a chef offering a rich variety of choices, from full seven-course feasts to fast-food preparations (but never junk). The stage approach to evaluation involves figuring out whether, in the life of the program, it's time for breakfast, lunch, a snack, a light dinner, or a full banquet.

This problem of focus is by no means unique to program evaluation. Management consultants find that a major problem for executives is focusing their energies on priorities. The trick in meditation is learning to focus on a single mantra, koan, or image. Professors have trouble getting graduate students to analyze less than the whole of human experience in their dissertations. Time-management specialists find that people have trouble setting and sticking with priorities in both their work and personal lives. And evaluators have trouble getting intended users to focus evaluation issues.

Focusing an evaluation means dealing with several basic concerns. What is the purpose of the evaluation? How will the information be used? What will we know after the evaluation that we don't know now? What actions will we be able to take based on evaluation findings? These are not simply rote questions answered once and then put aside.
The utilization-focused evaluator keeps these questions front and center throughout the design process. The answers to these and related questions will determine everything else that happens in the evaluation. As evaluators and primary users interact around these questions, the evaluation takes shape.

The challenge is to find those "vital few" facts among the "trivial many" that are high in payoff and information load (MacKenzie 1972). The 20-80 rule expresses the importance of focusing on the right information. The 20-80 rule states that, in general, 20% of the facts account for 80% of what's worth knowing (Anderson 1980:26).

In working with intended users to understand the importance of focus, I often do a short exercise. It goes like this:

Let me ask you to put your right hand out in front of you with your arm fully extended and the palm of your hand open. Now, focus on the center of the palm of your hand. Really look at your hand in a way that you haven't looked at it in a long time. Study the lines—some of them long, some short; some of them deep, some shallow; some relatively straight, some nicely curved, and some of them quite jagged and crooked. Be aware of the colors in your hand: reds, yellows, browns, greens, blues, different shades and hues. And notice the textures, hills and valleys, rough places and smooth. Become aware of the feelings in your hand, feelings of warmth or cold, perhaps tingling sensations.

Now, keeping your right hand in front of you, extend your left arm and look at your left palm in the same way, not comparatively, but just focus on the center of your left palm, studying it, seeing it, feeling it. . . . Really allow your attention to become concentrated on the center of your left palm, getting to know your left hand in a new way. (Pause.)

Now, with both arms still outstretched, I want you to focus, with the same intensity that you've been using on each hand, I want you to focus on the center of both palms at the same time. (Pause while they try.) Unless you have quite unusual vision, you're not able to do that. There are some animals who can move their eyes independently of each other, but humans do not have that capability. We can look back and forth between the two hands, or we can use peripheral vision and glance at both hands at the same time, but we can't focus intensely on the center of both palms simultaneously.

Focusing involves a choice. The decision to look at something is also a decision not to look at something. A decision to see something means that something else will not be seen, at least not with the same acuity. Looking at your left hand or looking at your right hand or looking more generally at both hands provides you with different information and different experiences.

The same principle applies to evaluation. Because of limited time and limited resources, it is never possible to look at everything in great depth. Decisions have to be made about what's worth looking at. Choosing to look at one area in depth is also a decision not to look at something else in depth. Utilization-focused evaluation suggests that the criterion for making those choices of focus be the likely utility of the resulting information. Findings that would be of greatest use for program improvement and decision making focus the evaluation.

A Cautionary Note and Conclusion

Making use the focus of evaluation decision making enhances the likelihood of, but does not guarantee, actual use. There are no guarantees. All one can really do is increase the likelihood of use.
Utilization-focused evaluation is time consuming, frequently frustrating, and occasionally exhausting. The process overflows with options, ambiguities, and uncertainties. When things go wrong, as they often do, you may find yourself asking a personal evaluation question: How did I ever get myself into this craziness?

But when things go right; when decision makers care; when the evaluation question is important, focused, and on target; when you begin to see programs changing even in the midst of posing questions—then evaluation can be exhilarating, energizing, and fulfilling. The challenges yield to creativity, perseverance, and commitment as those involved engage in that most splendid of human enterprises—the application of intellect and emotion to the search for answers that will improve human effort and activity. It seems a shame to waste all that intellect and emotion studying the wrong issues. That's why it's worth taking the time to carefully focus an evaluation for optimum utility.

MENU 8.1
Alternative Ways of Focusing Evaluations

Different types of evaluations ask different questions and focus on different purposes. This
menu is meant to be illustrative of the many alternatives available. These options by no means
exhaust all possibilities. Various options can be and often are used together within the same
evaluation, or options can be implemented in sequence over a period of time, for example,
doing implementation evaluation before doing outcomes evaluation, or formative evaluation
before summative evaluation.

Focus or Type of Evaluation Defining Question or Approach

Accreditation focus Does the program meet minimum standards for accreditation
or licensing?
Causal focus Use rigorous social science methods to determine the relationship
between the program (as a treatment) and resulting outcomes
Cluster evaluation Synthesizing overarching lessons and/or impacts from a number
of projects within a common initiative or framework
Collaborative approach Evaluators and intended users work together on the evaluation
Comparative focus How do two or more programs rank on specific indicators,
outcomes, or criteria?
Compliance focus Are rules and regulations being followed?
Connoisseurship approach Specialists or experts apply their own criteria and judgment, as
with a wine or antiques connoisseur
Context focus What is the environment within which the program operates
politically, socially, economically, culturally, and scientifically?
How does this context affect program effectiveness?
Cost-benefit analysis What is the relationship between program costs and program
outcomes (benefits) expressed in dollars?
Cost-effectiveness analysis What is the relationship between program costs and outcomes
(where outcomes are not measured in dollars)?
Criterion-focused evaluation     By what criteria (e.g., quality, cost, client satisfaction) shall the program be evaluated?
Critical issues focus Critical issues and concerns of primary intended users focus the
evaluation
Decisions focus What information is needed to inform specific future decisions?
Descriptive focus What happens in the program? (No "why" questions or cause/
effect analyses)
Developmental evaluation The evaluator is part of the program design team, working
together over the long term for ongoing program development
Diversity focus The evaluation gives voice to different perspectives on and
illuminates various experiences with the program. No single
conclusion or summary judgment is considered appropriate.
Effectiveness focus To what extent is the program effective in attaining its goals?
How can the program be more effective?
Efficiency focus Can inputs be reduced and still obtain the same level of output
or can greater output be obtained with no increase in inputs?

Effort focus What are the inputs into the program in terms of number of
personnel, staff/client ratios, and other descriptors of levels of
activity and effort in the program?
Empowerment evaluation     The evaluation is conducted in a way that affirms participants' self-determination and political agenda
Equity focus Are participants treated fairly and justly?
Ethnographic focus What is the program's culture?
Evaluability assessment Is the program ready for formal evaluation? What is the feasibility
of various evaluation approaches and methods?
Extensiveness focus To what extent is the program able to deal with the total problem?
How does the present level of services and impacts compare to the
needed level of services and impacts?
External evaluation The evaluation is conducted by specialists outside the program and
independent of it to increase credibility
Formative evaluation How can the program be improved?
Goal-free evaluation What are the actual effects of the program on clients (without
regard to what staff say they want to accomplish) ? To what extent
are real needs being met?
Goals-based focus To what extent have program goals been attained?
Impact focus What are the direct and indirect program impacts, not only on
participants, but also on larger systems and the community?
Implementation focus To what extent was the program implemented as designed? What
issues surfaced during implementation that need attention in the
future?
Inputs focus What resources (money, staff, facilities, technology, etc.) are available
and/or necessary?
Internal evaluation Program employees conduct the evaluation
Intervention-oriented evaluation     Design the evaluation to support and reinforce the program's desired results
Judgment focus Make an overall judgment about the program's merit or worth
(see also summative evaluation)
Knowledge focus (or Lessons Learned)     What can be learned from this program's experiences and results to inform future efforts?
Logical framework Specify goals, purposes, outputs, and activities, and connecting
assumptions; for each, specify indicators and means of verification
Longitudinal focus What happens to the program and to participants over time?
Meta-evaluation Was the evaluation well done? Is it worth using? Did the evaluation
meet professional standards and principles?
Mission focus To what extent is the program or organization achieving its overall
mission? How well do outcomes of departments or programs within
an agency support the overall mission?
Monitoring focus Routine data collected and analyzed routinely on an ongoing basis,
often through a management information system
Needs assessment What do clients need and how can those needs be met?
Needs-based evaluation See Goal-free evaluation

(continued)

MENU 8.1 Continued
Focus or Type of Evaluation Defining Question or Approach

Norm-referenced approach     How does this program population compare to some specific norm or reference group on selected variables?
Outcomes evaluation To what extent are desired client/participant outcomes being
attained? What are the effects of the program on clients or
participants?
Participatory evaluation Intended users, usually including program participants and/or
staff, are directly involved in the evaluation
Personnel evaluation How effective are staff in carrying out their assigned tasks and
in accomplishing their assigned or negotiated goals?
Process focus What do participants experience in the program? What are strengths
and weaknesses of day-to-day operations? How can these processes
be improved?
Product evaluation What are the costs, benefits, and market for a specific product?
Quality assurance Are minimum and accepted standards of care being routinely and
systematically provided to patients and clients? How can quality
of care be monitored and demonstrated?
Questions focus What do primary intended users want to know that would make
a difference to what they do? The evaluation answers questions
instead of making judgments
Reputation focus How the program is perceived by key knowledgeables and
influentials; ratings of the quality of universities are often based
on reputation among peers
Responsive evaluation What are the various points of view of different constituency groups
and stakeholders? The responsive evaluator works to capture,
represent, and interpret these varying perspectives under the
assumption each is valid and valuable
Social and community indicators     What routine social and economic data should be monitored to assess the impacts of this program? What is the connection between program outcomes and larger-scale social indicators, for example, crime rates?
Social justice focus How effectively does the program address social justice concerns?
Summative evaluation Should the program be continued? If so, at what level? What is the
overall merit and worth of the program?
Theory-driven focus On what theoretical assumptions and model is the program based?
What social scientific theory is the program a test of and to what
extent does the program confirm the theory?
Theory of action approach     What are the linkages and connections between inputs, activities, immediate outcomes, intermediate outcomes, and ultimate impacts?
Utilization-focused What information is needed and wanted by primary intended users
evaluation that will actually be used for program improvement and decision
making? (Utilization-focused evaluation can include any of the other
types above.)

Implementation Evaluation:
What Happened in the Program?

If your train's on the wrong track, every station you come to is the wrong station.
—Bernard Malamud

An old story is told that through a series of serendipitous events, much too convoluted
and incredible to sort out here, four passengers found themselves together in a small
plane—a priest; a young, unemployed college dropout; the world's smartest person;
and the President of the United States. At 30,000 feet, the pilot suddenly announced
that the engines had stalled, the plane was crashing, and he was parachuting out. He
added as he jumped, "I advise you to jump too, but I'm afraid there are only three
parachutes left. . . . " With that dire news, he was gone.
The world's smartest person did the fastest thinking, grabbed a parachute, and
jumped. The President of the United States eyed the other two, put on a parachute, and
said as he jumped, "You understand, it's not for myself but for the country."
The priest looked immensely uneasy as he said, "Well, my son, you're young, and
after all I am a priest, and, well, it seems only the right thing to do, I mean, if you want,
um, just, um, go ahead, and um, well. . . . "
The college dropout smiled and handed the priest a parachute. "Not to worry,
Reverend. There's still a parachute for each of us. The world's smartest person grabbed
my backpack when he jumped."


Checking the Inventory

Programs, like airplanes, need all their parts to do what they're designed to do and accomplish what they're supposed to accomplish. Programs, like airplanes, are supposed to be properly equipped to carry out their assigned functions and guarantee passenger (participant) safety. Programs, like airplanes, are not always so equipped. Regular, systematic evaluations of inventory and maintenance checks help avoid disasters in both airplanes and programs.

Implementation evaluation focuses on finding out if the program has all its parts, if the parts are functional, and if the program is operating as it's supposed to be operating. Implementation evaluation can be a major evaluation focus. It involves finding out what actually is happening in the program. Of what does the program consist? What are the program's key characteristics? Who is participating? What do staff do? What do participants experience? What's working and what's not working? What is the program? Menu 9.1 at the end of this chapter provides additional implementation questions. (For a larger menu of over 300 implementation evaluation questions, see King, Morris, and Fitz-Gibbon 1987:129-41.)

An Exemplar

Our follow-up study of federal health evaluations turned up one quite dramatic case of evaluation use with important implementation lessons.
A state legislature established a program to teach welfare recipients the basic rudiments of parenting and household management. Under this mandate, the state welfare department was charged with conducting workshops, distributing brochures, showing films, and training caseworkers on how low-income people could better manage their meager resources and become better parents. A single major city was selected for pilot-testing the program, with a respected independent research institute contracted to evaluate the program. Both the state legislature and the state welfare department committed themselves publicly to using the evaluation findings for decision making.

The evaluators interviewed a sample of welfare recipients before the program began, collecting data about parenting, household management, and budgetary practices. Eighteen months later, they interviewed the same welfare recipients again. The results showed no measurable change in parenting or household management behavior. The evaluators judged the program ineffective, a conclusion they reported to the state legislature and the newspapers. Following legislative debate and adverse publicity, the legislature terminated funding for the program—a dramatic case of using evaluation results to inform a major decision.

Now suppose we want to know why the program was ineffective. The evaluation as conducted shed no light on what went wrong because it focused entirely on measuring the attainment of intended program outcomes: changed parenting and household management behaviors of welfare recipients. As it turns out, there is a very good reason why the program didn't attain the desired outcomes. It was never implemented.

When funds were allocated from the state to the city, the program immediately became embroiled in the politics of urban welfare. Welfare rights organizations questioned the right of government to tell poor people how to spend their money or rear their children: "You have no right to tell us we have to run our houses like the white middle-class parents. And who's this Frenchman Piaget who's going to tell us how to raise American kids?"

These and other political battles delayed program implementation. Procrastination being the better part of valor, no parenting brochures were ever printed; no household management films were ever shown; no workshops were held; and no caseworkers were ever hired or trained.

In short, the program was never implemented. But it was evaluated! It was found to be ineffective—and was killed.

The Importance of Implementation Analysis

It is important to know the extent to which a program attains intended outcomes and meets participant needs, but to answer those questions it is essential to know what occurred in the program that can reasonably be connected to outcomes. The primer How to Assess Program Implementation (King et al. 1987) puts it this way:

To consider only questions of program outcomes may limit the usefulness of an evaluation. Suppose the data suggest emphatically that the program was a success. You can say, "It worked!" But unless you have taken care to describe the details of the program's operations, you may be unable to answer a question that logically follows such a judgment of success: "What worked?"
If you cannot answer that, you will have wasted the effort measuring the outcomes of events that cannot be described and therefore remain a mystery. . . .
If this happens to you, you will not be alone. As a matter of fact, you will be in good company. Few evaluation reports pay enough attention to describing the processes of a program that helped participants achieve its outcomes. (p. 9; emphasis in the original)

Not knowing enough about implementation limits the usefulness of findings about effective programs and compounds misunderstandings about what is often called "the human services shortfall: the large and growing gap between what we expect from government-supported human service systems and what these systems in fact deliver" (Lynn and Salasin 1974:4). The human services shortfall is made up of two parts: (1) failure of implemented programs to attain desired outcomes and (2) failure to actually implement policy in the form of operating programs. In the early days of evaluation, evaluators directed most of their attention to the first problem by conducting outcomes evaluations. That practice began to change in the face of evidence that the second problem was equally, if not even more, critical. In a classic study of social program implementation, Walter Williams concluded, "The lack of concern for implementation is currently the crucial impediment to improving complex operating programs, policy analysis, and experimentation in social policy areas" (Williams and Elmore 1976:267; emphasis in original).

The fundamental implementation question remains whether or not what has been decided actually can be carried out in a manner consonant with that underlying decision. More and more, we are finding, the answer is no.
It is not just that the programs fall short of the early rhetoric that described them; they often barely work at all. . . . Indeed, it is possible that past analysis and research that ignored implementation issues may have asked the wrong questions, thereby producing information of little or no use to policy making. (Williams and Elmore 1976:xi-xii; emphasis in the original)

The notion that asking the wrong questions will result in useless information is fundamental to utilization-focused evaluation. To avoid gathering useless information about outcomes, it is important to frame evaluation questions in the context of program implementation. Data on why this is critical come from many sources. At the international level, studies collected and edited by John C. de Wilde (1967) demonstrated that program implementation and administration were the critical problems in developing countries. Organizational sociologists have documented the problems that routinely arise in implementing programs that are new and innovative alongside or as part of existing programs (e.g., Kanter 1983; Corwin 1973; Hage and Aiken 1970). Researchers studying diffusion of innovations have thoroughly documented the problems of implementing new ideas in new settings (e.g., Brown 1981; Havelock 1973; Rogers and Shoemaker 1971; Rogers and Svenning 1969). Then there's the marvelous case study of the Oakland Project by Pressman and Wildavsky (1984). Now a classic on the trials and tribulations of implementation, this description of a Great Society urban development effort is entitled:
Implementation Evaluation • 199

IMPLEMENTATION
How Great Expectations in Washington Are Dashed in Oakland; Or, Why It's Amazing That Federal Programs Work at All, This Being a Saga of the Economic Development Administration as Told by Two Sympathetic Observers Who Seek to Build Morals on a Foundation of Ruined Hopes

Focus on Utility: Information for Action and Decisions

The problem with pure outcomes evaluation is that the results give decision makers little information to guide action. Simply learning that outcomes are high or low doesn't tell decision makers much about what to do. They also need to understand the nature of the program. In the example that opened this chapter, legislators learned that targeted welfare parents showed no behavioral changes, so they terminated the program. The evaluators failed to include data on implementation that would have revealed the absence of any of the mandated activities that were supposed to bring about the desired changes. By basing their decision only on outcomes information, the legislators terminated a policy approach that had never actually been tried. This was not a unique case.

Although it seems too obvious to mention, it is important to know whether a program actually exists. Federal agencies are often inclined to assume that, once a cash transfer has taken place from a government agency to a program in the field, a program exists and can be evaluated. Experienced evaluation researchers know that the very existence of a program cannot be taken for granted, even after large cash transfers have taken place. Early evaluations of Title I programs in New York City provide an illustration of this problem. (Guttentag and Struening 1975b:3-4)

Terminating a policy inappropriately is only one possible error when outcomes data are used without data about implementation. Expanding a successful program inappropriately is also possible when decision makers lack information about the basis for the program's success. In one instance, a number of drug addiction treatment centers in a county were evaluated based on rates of readdiction for treated patients. All had relatively mediocre success rates except one program that reported a 100% success rate over two years. The county board immediately voted to triple the budget of that program. Within a year, the readdiction rates for that program had fallen to the same mediocre level as other centers. By enlarging the program, the county board had eliminated the key elements in the program's success—its small size and dedicated staff. It had been a six-patient halfway house with one primary counselor who ate, slept, and lived that program. He established such a close relationship with each addict that he knew exactly how to keep each one straight. When the program was enlarged, he became administrator of three houses and lost personal contact with the clients. The successful program became mediocre. A highly effective program was lost because the county board acted without understanding the basis for the program's success.

Renowned global investor and philanthropist George Soros tells a similar story. Through a foundation he established in Moscow when the Cold War thawed, he funded a successful program aimed at transforming the education system. "I wanted to make it bigger, so I threw a lot of money at it—and in so doing, I destroyed it, effectively. It was too much money" (quoted by Buck 1995:76-77).

If, because of limited time and evaluation resources, one had to choose between implementation evaluation and outcomes measurement, there are instances in which implementation assessment would be of greater value. Decision makers can use implementation monitoring to make sure that a policy is being put into operation according to design or to test the very feasibility of the policy.

For example, Leonard Bickman (1985) has described a statewide evaluation of early childhood interventions in Tennessee that began by asking stakeholders in state government what they wanted to know. The evaluators were prepared to undertake impact studies, and they expected outcomes data to be the evaluation priority. However, interviews with stakeholders revealed a surprising sophistication about the difficulties and expenses involved in getting good, generalizable outcomes data in a timely fashion. Moreover, it was clear that key policymakers and program managers "were more concerned about the allocation and distribution of resources than about the effectiveness of projects" (p. 190). They wanted to know whether every needy child was being served. What services were being delivered to whom? State agencies could use this kind of implementation and service delivery information to "redistribute their resources to unserved areas and populations or encourage different types of services" (p. 191). They could also use descriptive information about programs to increase communications among service providers about what ideas were being tried and to assess gaps in services. Before "the more sophisticated (and expensive) questions about effectiveness" were asked, "policymakers wanted to know simpler descriptive information. . . . If the currently funded programs could not even be described, how could they be improved?" (Bickman 1985:190-91).

Unless one knows that a program is operating according to design, there may be little reason to expect it to produce the desired outcomes. Furthermore, until the program is implemented and a "treatment" is believed to be in operation, there is little reason to evaluate outcomes. This is another variation on Donald Campbell's (1983) admonition to evaluate no program until it is proud, by which he meant that demanding summative outcomes evaluation should await program claims and supporting evidence that something worth rigorous evaluation is taking place.

Ideal Program Plans and Actual Implementation

Why is implementation so difficult? Part of the answer appears to lie with how programs are legislated and planned. Policymakers seldom seem to analyze the feasibility of implementing their ideas during decision making (W. Williams 1976:270). This ends up making the task of evaluation all the more difficult because implementation is seldom clearly conceptualized. As a result, either as part of evaluability assessment or in early interactions with primary intended users, the evaluator will often have to facilitate discussion of what the program should look like before it can be said to be fully implemented and operational. Criteria for evaluating implementation will have to be developed.

Implementation evaluation is further complicated by the finding that programs are rarely implemented by single-mindedly adopting a set of means to achieve predetermined ends. The process simply isn't that rational or logical. More common is some degree of incremental implementation in which a program takes shape slowly and adaptively in response to the emerging situation and early experiences. For example, Jerome Murphy (1976:96) found, in studying implementation of Title V of the Elementary and Secondary Education Act, that states exhibited great variation in implementation. He found no basis for the widespread assumption that competently led bureaucracies would operate like goal-directed, unitary decision makers. Instead, implementers at the field level did what made sense to them rather than simply following mandates from higher up; moreover, the processes of implementation were more political and situational than rational and logical.

Sociologists who study formal organizations, social change, and diffusion of innovations have carefully documented the substantial slippage in organizations between plans and actual operations. Design, implementation, and routinization are stages of development during which original ideas are changed in the face of what's actually possible (Kanter 1983; Hage and Aiken 1970; Mann and Neff 1961; Smelser 1959). Even where planning includes a trial period, what gets finally adopted typically varies from what was tried out in the pilot effort (Rogers 1962). Social scientists who study change and innovation emphasize two points: (1) routinization or final acceptance is never certain at the beginning; and (2) the implementation process always contains unknowns that change the ideal so that it looks different when and if it actually becomes operational.

Barriers to Implementation

Understanding some of the well-documented barriers to implementation can help evaluators ask appropriate questions and generate useful information for program adaptation and improvement. For example, organizational conflict and disequilibrium often increase dramatically during the implementation stage of organizational change. No matter how much planning takes place, "people problems" will arise.

The human element is seldom adequately considered in the implementation of a new product or service. There will be mistakes that will have to be corrected. . . . In addition, as programs take shape power struggles develop. The stage of implementation is thus the stage of conflict, especially over power. . . . Tempers flare, interpersonal animosities develop, and the power structure is shaken. (Hage and Aiken 1970:100, 104)

Odiorne (1984:190-94) dissected "the anatomy of poor performance" in managing change and found gargantuan human obstacles including staff who give up when they encounter trivial obstacles, people who hang onto obsolete ideas and outmoded ways of doing things, emotional outbursts when asked to perform new tasks, muddled communications, poor anticipation of problems, and delayed action when problems arise so that once manageable problems become major management crises.

Meyers (1981:37-39) has argued that much implementation fails because program designs are "counterintuitive"—they just don't make sense. He adds to the litany of implementation hurdles the following: undue haste, compulsion to spend all allotted funds by the end of the fiscal year, personnel turnovers, vague legislation, severe understaffing, racial tensions, conflicts between different levels of government, and the divorce of implementation from policy.

The difference between the ideal, rational model of program implementation and the day-to-day, incrementalist, and conflict-laden realities of program implementation is explained without resort to jargon in this notice found by Jerome Murphy (1976) in the office of a state education agency:

NOTICE
The objective of all dedicated department employees should be to thoroughly analyze all situations, anticipate all problems prior to their occurrence, have answers for these problems, and move swiftly to solve these problems when called upon. . . .
However . . .
When you are up to your ass in alligators, it is difficult to remind yourself that your initial objective was to drain the swamp. (p. 92)

The Case of Project Follow Through

Failing to understand that implementation of program ideals is neither automatic nor certain can lead to evaluation disaster, not only resulting in lack of use, but discrediting the entire evaluation effort. The national evaluation of Follow Through is a prime example. Follow Through was introduced as an extension of Head Start for primary-age children. It was a "planned variation experiment" in compensatory education featuring 22 different models of education to be tested in 158 school districts on 70,000 children throughout the nation. The evaluation employed 3,000 people to collect data on program effectiveness. The evaluation started down the path to trouble when the designers "simply assumed in the evaluation plan that alternative educational models could and would be implemented in some systematic, uniform fashion" (Alkin 1970:2). This assumption quickly proved fallacious.

Each sponsor developed a large organization, in some instances larger than the entire federal program staff, to deal with problems of model implementation. Each local school system developed a program organization consisting of a local director, a team of teachers and specialists, and a parent advisory group. The more the scale and complexity of the program increased, the less plausible it became for Follow Through administrators to control the details of program variations, and the more difficult it became to determine whether the array of districts and sponsors represented "systematic" variations in program content. (Williams and Elmore 1976:108)

The Follow Through results revealed greater variation within models than between them; that is, the 22 models did not show systematic treatment effects as such. Most effects were null, some were negative, but "of all our findings, the most pervasive, consistent, and suggestive is probably this: The effectiveness of each Follow Through model depended more on local circumstances than on the nature of the model" (Anderson 1977:13; emphasis in original). In reviewing these findings, Eugene Tucker (1977) of the U.S. Office of Education suggested that, in retrospect, the Follow Through evaluation should have begun as a formative effort

with greater focus on implementation strategies:

It is safe to say that evaluators did not know what was implemented in the various sites. Without knowing what was implemented, it is virtually impossible to select valid effectiveness measures. . . . Hindsight is a marvelous teacher and in large-scale experimentations an expensive one. (pp. 11-12)

Ideals and Discrepancies

Provus (1971:27-29) had warned against the design used in the Follow Through evaluation at a 1966 conference on educational evaluation of national programs:

An evaluation that begins with an experimental design denies to program staff what it needs most: information that can be used to make judgments about the program while it is in its dynamic stages of growth. . . . Evaluation must provide administrators and program staff with the information they need and the freedom to act on that information. . . .
We will not use the antiseptic assumptions of the research laboratory to compare children receiving new program assistance with those not receiving such aid. We recognize that the comparisons have never been productive, nor have they facilitated corrective action. The overwhelming number of evaluations conducted in this way show no significant differences between "experimental" and "control" groups. (pp. 11-12)

Instead, Provus (1971) advocated "discrepancy evaluation," an approach that compares the actual with the ideal and places heavy emphasis on implementation evaluation. He argued that evaluations should begin by establishing the degree to which programs are actually operating as desired. Conceptualization of ideals "may arise from any source, but under the Discrepancy Evaluation Model they are derived from the values of the program staff and the client population it serves" (p. 12). Data to compare actual practices with ideals would come from local fieldwork "of the process assessment type" in which evaluators systematically collect and weigh data descriptive of ongoing program activity (p. 13).

Given the reality that actual implementation will typically look different from original ideas, a primary evaluation challenge is to help identified decision makers determine how far from the ideal the program can deviate, and in what ways it can deviate, while still constituting the original idea (as opposed to the original ideal). In other words, a central evaluation question is: How different can an actual program be from its ideal and still be said to have been implemented? The answer must be clarified between primary intended users and evaluators as part of the process of specifying criteria for assessing implementation.

At some point, there should be a determination of the degree to which an innovation has been implemented successfully. What should the implemented activity be expected to look like in terms of the underlying decision? For a complex treatment package put in different local settings, decision makers usually will not expect—or more importantly, not want—a precise reproduction of every detail of the package. The objective is performance, not conformance. To enhance the probability of achieving the basic program or policy objectives, implementation should consist of a realistic development of the underlying decision in terms of the local setting. In the ideal situation, those responsible for implementation would take the basic idea and modify it to meet special local conditions. There should be a reasonable resemblance to the basic idea, as measured by inputs and expected outputs, incorporating the best of the decision and the best of the local ideas. (Williams and Elmore 1976:277-78)

The implementation of the Oregon Community Corrections Act offers an excellent illustration of how local people can adapt a statewide mandate to fit local needs and initiatives. In studying variations in implementation of this legislation, Palumbo, Maynard-Moody, and Wright (1984) found a direct relationship between higher levels of implementation and success in attaining goals. Yet, "the implementation factors that lead to more successful outcomes are not things that can easily be transferred from one locale to another" (p. 72).

Local Variations in Implementing National Programs

I would not belabor these points if it were not so painfully clear that implementation processes have been ignored so frequently in evaluations. Edwards et al. (1975) lamented that "we have frequently encountered the idea that a [national] program is a fixed, unchanging object, observable at various times and places" (p. 142). Because this idea seems so firmly lodged in so many minds and spawns so many evaluation designs with reduced utility, I feel compelled to offer one more piece of evidence to the contrary.

Rand Corporation, under contract to the U.S. Office of Education, studied 293 federal programs supporting educational change—one of the largest and most comprehensive studies of educational change ever conducted. The study concluded that implementation "dominates the innovative process and its outcomes":

In short, where implementation was successful, and where significant change in participant attitudes, skills, and behavior occurred, implementation was characterized by a process of mutual adaptation in which project goals and methods were modified to suit the needs and interests of the local staff and in which the staff changed to meet the requirements of the project. This finding was true even for highly technological and initially well-specified projects; unless adaptations were made in the original plans or technologies, implementation tended to be superficial or symbolic, and significant change in participants did not occur. (McLaughlin 1976:169)

The Change Agent Study found that the usual emphasis in federal programs on the delivery system is inappropriate. McLaughlin (1976) recommended

a shift in change agent policies from a primary focus on the delivery system to an emphasis on the deliverer. An important lesson that can be derived from the Change Agent Study is that unless the developmental needs of the users are addressed, and unless projects are modified to suit the needs of the user and the institutional setting, the promise of new technologies is likely to be unfulfilled. (p. 180; emphasis in original)

The emphasis on the "user" in the Rand study brings us back to the importance of the personal factor and attention to primary intended users in evaluation of implementation processes. Formative, improvement-oriented evaluations can help users make the kinds of program

adaptations to local conditions that Rand found so effective. That is, evaluation can be a powerful tool for guiding program development during implementation; it can facilitate initial judgments about the connections between program activities and outcomes. But implementation evaluation, like program innovation, must also be adaptive and focused on users if the process and results are to be relevant, meaningful, and useful. Utilization-focused criteria for evaluating implementation must be developed through interaction with primary intended users. Evaluation facilitators will have to be active-reactive-adaptive in framing evaluation questions in the context of program implementation.

Variations and Options in Implementation Evaluation

In working with intended users to focus evaluation questions, several alternative types of implementation evaluation can be considered, many of which can be used in combination. These options deal with different issues. Over time, a comprehensive evaluation might include all five types of implementation evaluation reviewed below.

Effort Evaluation

Effort evaluations focus on documenting "the quantity and quality of activity that takes place. This represents an assessment of input or energy regardless of output. It is intended to answer the questions 'What did you do?' and 'How well did you do it?' " (Suchman 1967:61). Effort evaluation moves up a step from asking if the program exists to asking how active the program is. If relatively inactive, it is unlikely to be very effective.

Effort questions include: Have sufficient staff been hired with the proper qualifications? Are staff-client ratios at desired levels? How many clients with what characteristics are being served by the program? Are necessary materials available? An effort evaluation involves making an inventory of program operations.

Tripodi et al. (1971) have linked effort evaluations to stages of program development. At initiation of a program, evaluation questions focus on getting services under way. Later, questions concerning the appropriateness, quantity, and quality of services become more important.

Monitoring Programs: Routine Management Information

Monitoring has become an evaluation specialization (Grant 1978). An important way of monitoring implementation over time is to establish a management information system (MIS). This provides routine data on client intake, participation levels, program completion rates, caseloads, client characteristics, and program costs. The hardware and software decisions for an MIS have long-term repercussions, so the development of such a routine data collection system must be approached with special attention to questions of use and problems of managing management information systems (Patton 1982b). Establishing and using an MIS are often primary responsibilities of internal evaluators. This has been an important growth area in the field of evaluation as demands for accountability have increased in human services (Attkisson et al. 1978; Broskowski, Driscoll, and Schulberg 1978; Elpers and Chapman 1978). The "monitoring and tailoring" approach of Cooley and Bickel

tailoring" approach of Cooley and Bickel setting or settings under study. This means
(1985) demonstrates how an MIS can be unraveling what is actually happening in a
client oriented and utilization focused. program by searching for the major pat-
Problems in implementing an MIS can terns and important nuances that give the
lead to a MIS-match (Dery 1981). While program its character. A process evaluation
there have been no shortage of docu- requires sensitivity to both qualitative and
mented MIS problems and disasters (Lucas quantitative changes in programs through-
1975), computers and data-based manage- out their development; it means becoming
ment information systems have brought intimately acquainted with the details of
high technology and statistical process con- the program. Process evaluations not only
trol to programs of all kinds (Cranford look at formal activities and anticipated
1995; Posavac 1995; Richter 1995). The outcomes, but also investigate informal
trick is to design them to be useful—and patterns and unanticipated consequences
then actually get them used. Utilization- in the full context of program implementa-
focused evaluators can play an important tion and development.
facilitative role in such efforts. Finally, process evaluations usually in-
clude perceptions of people close to the
program about how things are going. A
Process Evaluation variety of perspectives may be sought from
people inside and outside the program. For
Process evaluation focuses on the inter- example, process data for a classroom can
nal dynamics and actual operations of a be collected from students, teachers, par-
program in an attempt to understand its ents, staff specialists, and administrators.
strengths and weaknesses. Process evalu- These differing perspectives can provide
ations ask: What's happening and why? unique insights into program processes as
How do the parts of the program fit to- experienced and understood by different
gether? How do participants experience people.
and perceive the program? This approach A process evaluation can provide useful
takes its name from an emphasis on looking feedback during the developmental phase
at how a product or outcome is produced of a program as well as later, in providing
rather than looking at the product itself; details for diffusion and dissemination of
that is, it is an analysis of the processes an effective program. One evaluator in our
whereby a program produces the results it utilization of federal health evaluations re-
does. Process evaluation is developmental, ported that process information had been
descriptive, continuous, flexible, and in- particularly useful to federal officials in
ductive (Patton 1980a). expanding a program nationwide. Process
Process evaluations search for explana- data from early pilot efforts were used to
tions of the successes, failures, and changes inform the designs of subsequent centers as
in a program. Under field conditions in the the program expanded.
real world, people and unforeseen circum- Process evaluation is one of the four
stances shape programs and modify initial major components of the CIPP (context,
plans in ways that are rarely trivial. The input, process, product) model of evalu-
process evaluator sets out to understand ation developed by Stufflebeam et al.
and document the day-to-day reality of the (1971; Stufflebeam and Guba 1970). It
Implementation Evaluation • 207

involves (1) gathering data to detect or predict defects in the procedural design or its implementation during the implementation stages, (2) providing information for program decisions, and (3) establishing a record of program development as it occurs.

Component Evaluation

The component approach to implementation involves a formal assessment of distinct parts of a program. Programs can be conceptualized as consisting of separate operational efforts that may be the focus of a self-contained implementation evaluation. For example, the Hazelden Foundation Chemical Dependency Program typically includes the following components: detoxification, intake, group treatment, lectures, individual counseling, release, and outpatient services. While these components make up a comprehensive chemical dependency treatment program that can be and is evaluated on the outcome of continued sobriety over time (Laundergan 1983; Patton 1980b), there are important questions about the operation of any particular component that can be the focus of evaluation, either for improvement or to decide if that component merits continuation. In addition, linkages between one or more components may become the focus of evaluation.

Bickman (1985) has argued that one particularly attractive feature of the component approach is the potential for greater generalizability of findings and more appropriate cross-program comparisons:

The component approach's major contribution to generalizability is its shift from the program as the unit of analysis to the component. By reducing the unit of analysis to a component instead of a program, it is more likely that the component as contrasted to entire programs can be generalized to other sites and other providers. The more homogeneous units are, the more likely one can generalize from one unit to another. In principle, the smaller the unit of analysis within a hierarchy, the more homogeneous it will be. By definition, as programs are composed of components, programs are more heterogeneous than components. It should be easier to generalize from one component to another than to generalize from one program to another.
An example of this process might clarify the point. Any two early childhood programs may consist of a variety of components implemented in several different ways. Knowledge of the success of one program would not tell us a great deal about the success of the other unless they were structurally similar. However, given the diversity of programs, it is unlikely that they would have the same type and number of components. In contrast, if both had an intake component, it would be possible to compare them just on that component. A service provider in one part of the state can examine the effectiveness of a particular component in an otherwise different program in a different part of the state and see its relevance to the program he or she was directing. (p. 199)

Treatment Specification

Treatment specification involves identifying and measuring precisely what it is about a program that is supposed to have an effect. It means conceptualizing the program as a carefully defined intervention or treatment—or at least finding out if there's enough consistency in implementation to permit such a conceptualization. This requires

quires elucidation of the "theory" program takes us into the arena of trying to estab-
staff hold about what they have to do in lish causality.
order to accomplish the results they want.
In technical terms, this means identify- Any new program or project may be thought
ing independent variables that are expected of as representing a theory or hypothesis in
to affect outcomes (the dependent vari- that—to use experimental terminology—the
ables). Treatment specification reveals the decision maker wants to put in place a treat-
causal assumptions undergirding program ment expected to cause certain predicted
activity. effects or outcomes. (Williams and Elmore
Measuring the degree to which concep- 1976:274; emphasis in original)
tualized treatments actually occur can be a
tricky and difficult task laden with meth- From this perspective, one task of imple-
odological and conceptual pitfalls: mentation evaluation is to identify and o p -
erationalize the program treatment.
Social programs are complex undertakings. Some comparative or experimental de-
Social program evaluators look with some- sign evaluations fall into the trap of rely-
thing akin to jealousy at evaluators in ing on the different names programs call
agriculture who evaluate a new strain of themselves—their labels or titles—to dis-
wheat or evaluators in medicine who evalu- tinguish different treatments. Because this
ate the effects of a new drug. . . . The same practice yields data that can easily be mis-
stimulus can be produced again, and other understood and misused, the next section
researchers can study its consequences— explores the problem in greater depth.
under the same or different conditions, with
similar or different subjects, but with some
assurance that they are looking at the effects The Challenge of
of the same thing. Truth-in-Labeling
Social programs are not nearly so specific.
They incorporate a range of components, Warning: This section sermonizes on the
styles, people, and procedures. . . . The con- Pandorian folly attendant upon those w h o
tent of the program, what actually goes on, believe program titles and names. W h a t a
is much harder to describe. There are often program calls its intervention is n o substi-
marked internal variations in operation from tute for gathering actual data on program
day to day and from staff member to staff implementation. Labels are not treatments.
member. When you consider a program as I suspect that overreliance on program
large and amorphous as the poverty program labels is a major source of null findings in
or the model cities program, it takes a major evaluation research. Aggregating results
effort to just describe and analyze the pro- under a label can lead to mixing effective
gram inputs. (Weiss 1972b:43) with ineffective programs that have noth-
ing in c o m m o n except their name. An
Yet, unless basic data are generated evaluation of Residential Community Cor-
about the p r o g r a m as an intervention, the rections Programs in Minnesota offers a
evaluator does not k n o w to what to attrib- case in point. The report, prepared by the
ute the outcomes observed. This is the Evaluation Unit of the Governor's Com-
classic problem of treatment specification mission on Crime Prevention and Control,
in social science research and, of course, compared recidivism rates for three
Implementation Evaluation • 209

"types" of programs: (1) halfway houses, houses varied tremendously in treatment


(2) PORT (Probationed Offenders Reha- modality, clientele, and stage of implemen-
bilitation and Training) projects, and (3) tation. The report's comparisons were
juvenile residences. The term halfway based on averages within the three types of
house referred to a "residential facility de- programs, but the averages disguised im-
signed to facilitate the transition of paroled portant variations within each type. No
adult ex-offenders returning to society "average" project existed, yet, the different
from institutional confinement." This dis- programs of like name were combined for
tinguished halfway houses from juvenile comparative purposes. Within types, the
residences, which served only juveniles. Of- report obscured individual sites that were
fenders on probation were the target of the doing excellent work as well as some of
PORT projects (GCCPC 1976:8). What we dubious quality.
have, then, are three different target One has only to read the journals that
groups, not three different treatments. publish evaluation findings to find similar
The report presented aggregated out- studies. There are comparisons between
come data for each type of community "open" schools and "traditional" schools
corrections program, thereby combining that present no data on relative openness.
the results for projects about which they There are comparisons of individual ther-
had no systematic implementation data. In apy with group therapy where no attention
effect, they compared the outcomes of is paid to the homogeneity of either cate-
three labels: halfway houses, PORT pro- gory of treatment.
jects, and juvenile residences. Nowhere in
the several hundred pages of the report was A common administrative fiction, especially
there any systematic data about the activi- in Washington, is that because some money
ties offered in these programs. People went associated with an administrative label (e.g.,
in and people came out; what happened in Head Start) has been spent at several places
between was ignored by the evaluators. and over a period of time, that the entities
The evaluation concluded that "the evi- spending the money are comparable from
dence presented in this report indicates that time to time and from place to place. Such
residential community corrections pro- assumptions can easily lead to evaluation-
grams have had little, if any, impact on the research disasters. (Edwards et al. 1975:142).
recidivism of program clients" (GCCPC
1976:289). These preliminary findings re-
sulted in a moratorium on funding of new Treatment Specification:
residential community corrections, and the An Alternative to Labeling
final report recommended maintaining
that moratorium. With no attention to the A newspaper cartoon showed several
meaningfulness of their analytical labels, federal bureaucrats assembled around a ta-
and with no treatment specifications, the ble in a conference room. The chair of the
evaluators passed judgment on the effec- group was saying, "Of course the welfare
tiveness of an $11 million program. program has a few obvious flaws . . . but if
The aggregated comparisons were es- we can just think of a catchy enough name
sentially meaningless. When I interviewed for it, it just might work!" (Dunagin 1977).
staff in a few of these community correc- Treatment specification means getting
tions projects, it became clear that halfway behind labels to state what is going to hap-
210 • FOCUSING EVALUATIONS

in the program that is expected to make a difference. For example, one theory undergirding community corrections has been that integration of criminal offenders into local communities is the best way to rehabilitate those offenders and thereby reduce recidivism. It is therefore important to gather data about the degree to which each project actually integrates offenders into the community. Halfway houses and juvenile residences can be run like small-scale prisons, completely isolated from the environment. Treatment specification tells us what to look for in each project to find out if the program's causal theory is actually being put to the test. (At this point we are not dealing with the question of how to measure the relevant independent variables in a program theory, but only attempting to specify the intended treatment in nominal terms.)

Here's an example of how treatment specification can be useful. A county Community Corrections Department in Minnesota wanted to evaluate its foster group-home program for juvenile offenders. The primary information users lacked systematic data about what the county's foster group homes were actually like. The theory undergirding the program was that juvenile offenders would be more likely to be rehabilitated if they were placed in warm, supportive, and nonauthoritarian environments where they were valued by others and could therefore learn to value themselves. The goals of the program included helping juveniles feel good about themselves and become capable of exercising independent judgment, thereby reducing subsequent criminal actions (recidivism).

The evaluation measured both outcomes and implementation with special attention to treatment environment. What kind of treatment is a youth exposed to in a group home? What are the variations in group homes? Do certain types of foster group homes attain better results, both providing positive experiences for youth and reducing recidivism?

The findings revealed that the environments of the sample of 50 group homes could be placed along a continuum from highly supportive and participatory home environments to nonsupportive and authoritarian ones. Homes were about evenly distributed along the continua of support versus nonsupport and participatory versus authoritarian patterns; that is, about half the juveniles experienced homes with measurably different climates. Juveniles from supportive-participatory group homes showed significantly lower recidivism rates than juveniles from nonsupportive-authoritarian ones (r = .33, p < .01). Variations in type of group-home environment were also correlated significantly with other outcome variables (Patton, Guthrie, et al. 1977).

In terms of treatment specification, these data demonstrated two things: (1) in about half of the county's group homes, juveniles were not experiencing the kind of treatment that the program design called for; and (2) outcomes varied directly with the nature and degree of program implementation. Clearly it would make no sense to conceptualize these 50 group homes as a homogeneous treatment. We found homes that were run like prisons and homes in which juveniles were physically abused. We also found homes where young offenders were loved and treated as members of the family. Aggregating recidivism data from all 50 homes into a single average rate would disguise important environmental variations. By specifying the desired treatment and measuring implementation compliance, the program's theory could be examined in terms of both feasibility and effectiveness.
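To make the logic of that analysis concrete, here is a minimal sketch in Python using invented numbers rather than the actual group-home data. The variable names and values are hypothetical; the point is simply that implementation must be measured before it can be related to outcomes.

# Hypothetical illustration only; invented data, not the county group-home study.
# Each record pairs a home's treatment-environment rating (1 = nonsupportive-
# authoritarian, 10 = supportive-participatory) with its recidivism rate.
from math import sqrt

homes = [
    {"environment": 9, "recidivism": 0.10},
    {"environment": 8, "recidivism": 0.15},
    {"environment": 7, "recidivism": 0.20},
    {"environment": 4, "recidivism": 0.35},
    {"environment": 3, "recidivism": 0.40},
    {"environment": 2, "recidivism": 0.45},
]

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

env = [h["environment"] for h in homes]
recid = [h["recidivism"] for h in homes]
print(f"r = {pearson_r(env, recid):.2f}")
# With these invented numbers, r is strongly negative: the more supportive-
# participatory the home environment, the lower the recidivism rate.

Had the evaluation skipped the environment ratings, no such relationship could have been examined at all, which is precisely the argument for measuring implementation and not just outcomes.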

EXHIBIT 9.1
Format for Connecting Goals With Implementation Plans and Measurement

Column 1. Goals: Expected Client Outcomes
Column 2. Indicators: Outcome Data/Measurement Criteria
Column 3. How Goals Will Be Attained (Implementation Strategies)
Column 4. Data on Implementation Criteria

(Rows 1 through 4 of the matrix are blank, to be completed for each program goal.)
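By way of illustration only—a hypothetical entry, not drawn from any program discussed in this chapter—one completed row of such a matrix might read as follows, written here as a simple Python record so the four columns stay paired:

# Hypothetical example of one completed row of Exhibit 9.1 (invented content).
row = {
    "goal_expected_client_outcome": "Participants manage household budgets independently",
    "indicator_outcome_measure": "Pre/post scores on a household-management checklist",
    "how_goal_will_be_attained": "Eight weekly budgeting workshops plus two home visits",
    "data_on_implementation": "Attendance logs and home-visit records",
}

Filling in one such row per goal gives primary intended users a concrete basis for deciding which linkages between activities and outcomes the evaluation should examine.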

(For an in-depth discussion of how to measure treatment environments for different kinds of programs—mental health institutions, prisons, family environments, military units, classrooms, businesses, schools, hospitals, and factories—see Conrad and Roberts-Gray 1988; Moos 1979, 1975, 1974.)

The process of specifying the desired treatment environment began with identified evaluation users, not with a scholarly literature search. The theory tested was that held by primary decision makers. Where resources are adequate and the design can be managed, the evaluators may prevail upon intended users to include tests of those theories the evaluators believe are illuminative. But first priority goes to providing intended users with information about the degree to which their own implementation ideals and treatment specifications have actually been realized in program operations. Causal models are sometimes forced on program staff when they bear no similarity to the models on which that staff bases its program activities. The evaluators' research interests are secondary to the information needs of primary intended information users in utilization-focused evaluation.

Connecting Goals and Implementation

In complex programs with multiple goals, it can be useful to engage staff in an exercise that links activities to outcomes and specifies measures for each. Exhibit 9.1 offers a matrix to guide this exercise. Once completed, the matrix can be used to focus the evaluation and decide what information would be most useful for program improvement and decision making.

Implementation Overview

This chapter has reviewed five evaluation approaches to implementation: (1) effort evaluation, (2) ongoing program monitoring, (3) process evaluation, (4) component evaluation, and (5) treatment specification.

Depending on the nature of the issues involved and the information needed, any one, two, or all five approaches might be employed. The point is that without information about actual program operations, decision makers are limited in interpreting performance data for program improvement. These different evaluations answer different questions and focus on different aspects of program implementation. The key is to match the type(s) of evaluation to the information needs of specific stakeholders and primary intended users. One of the decision makers we interviewed in our utilization study was emphatic on this point:

Different types of evaluations are appropriate and useful at different times. . . . We tend to talk about evaluation as if it's a single thing. The word evaluation should not be used generically. It's harmful. We ought to stop talking about evaluation as if it's a single homogenous thing. [DM111:29]

Implementation is one possible focus for an evaluation. Not all designs will include a lot of implementation data. Other information may be more important, relevant, and useful to inform pending decisions. What is crucial is that during the process of framing the evaluation, the issue of implementation analysis is raised. Evaluators have a responsibility in their active-reactive-adaptive interactions with stakeholders to explore options with intended users to decide jointly what will be useful in the particular circumstances at hand.

Sometimes what primary users need and want varies from the evaluator's initial expectations.

Former Ambassador to China Winston Lord was once driving in the Chinese countryside with his wife. They stopped at an ancient Buddhist temple, where the senior monk greeted them enthusiastically. "Would you do this temple a great honor and favor for our future visitors, to guide and instruct them? Would you write something for us in English?"

Ambassador Lord felt quite flattered because he knew that, traditionally, only emperors and great poets were invited to write for the temple. The monk returned shortly carrying two wooden plaques and said: "To guide and instruct future English visitors, would you write on this plaque the word 'Ladies' and on this plaque the word 'Gentlemen'?"

May the writings of evaluators be as useful.



MENU 9.1

Sample Implementation Evaluation Questions

Feasibility and Compliance Issues


1. What was originally proposed and intended for implementation?
2. What needs assessment or situation analysis informed program design?
3. What was the program's expected model?
4. What theory and assumptions undergirded the proposed model, if any?
5. Who has a stake in the program being implemented as proposed and originally
designed?
6. What resources were anticipated for full implementation?
7. What staff competencies and roles were anticipated?
8. What were the original intended time lines for implementation?
9. What aspects of implementation, if any, involve meeting legal mandates?
10. What potential threats to implementation were anticipated during design?

Formative Evaluation Questions


1. What are the program's key characteristics as perceived by various stakeholders,
for example, participants, staff, administrators, funders? How similar or different
are those perceptions? What's the basis of differences?
2. What are the characteristics of program participants and how do those compare
to the intended target population for the program?
3. How do actual resources, staff competencies and experiences, and time lines
compare to what was expected?
4. What's working as expected? What's not working as expected? What challenges
and barriers have emerged? How has staff responded to those challenges and
barriers?
5. What assumptions have proved true? What assumptions are problematic?
6. What do participants actually do in the program? What are their primary activities
(in detail)? What do they experience?
7. What do participants like and dislike? What are their perceptions of what's
working and not working? Do they know what they're supposed to accomplish
as participants? Do they "buy into" the program's goals and intended outcomes?

(continued)

MENU 9.1 Continued

8. How well are staff functioning together? What are their perceptions about
what's working and not working? Do they know what outcomes they're aiming
for? Do they "buy into" the program's goals and intended outcomes? What are
their perceptions of participants? of administrators? of their own roles and
effectiveness?
9. What has changed from the original design and why? On what basis are adapta-
tions from the original design being made? Who needs to "approve" such changes?
10. What monitoring system has been established to assess implementation on an
ongoing basis and how is it being used?

Summative Implementation Questions


1. As the program has been implemented, what model has emerged? That is, can the
program be modeled as an intervention or treatment with clear connections
between inputs, activities, and outcomes?
2. To what extent and in what ways was the original implementation design feasible?
What was not feasible? Why? Were deviations from the original design great
enough that what was actually implemented constitutes a different model, treat-
ment, or intervention from what was originally proposed? In other words, has the
feasibility and viability of the original design actually been tested in practice, or
was something else implemented?
3. How stable and standardized has the implementation become both over time and,
if applicable, across different sites?
4. To what extent is the program amenable to implementation elsewhere? What
aspects of implementation were likely situational? What aspects are likely gener-
alizable?
5. What are the start-up and continuing costs of implementation?
6. Has implementation proved sufficiently effective and consistent that the program
merits continuation?

Lessons Learned Implementation Questions


1. What has been learned about implementation of this specific program that might
inform similar efforts elsewhere?
2. What has been learned about implementation in general that would contribute to
scholarly and policy research on implementation?
NOTE: For a larger menu of over 300 implementation evaluation questions, see King et al. 1987:129-41.
The Program's Theory of Action
Conceptualizing Causal Linkages

All the World's a Stage for Theory

In Tony Kushner's Pulitzer Prize-winning play, Angels in America, Part Two opens in the
Hall of Deputies, the Kremlin, where Aleksii Antedilluvianovich Prelapsarianov, the World's
Oldest Living Bolshevik, speaks with sudden, violent passion, grieving a world without
theory:

How are we to proceed without Theory? What System of Thought have these Reformers to present to this mad swirling planetary disorganization, to the Inevident Welter of fact, event, phenomenon, calamity? Do they have, as we did, a beautiful Theory, as bold, as Grand, as comprehensive a construct . . . ? You can't imagine, when we first read the Classic Texts, when in the dark vexed night of our ignorance and terror the seed-words sprouted and shoved incomprehension aside, when the incredible bloody vegetable struggled up and through into Red Blooming gave us Praxis, True Praxis, True Theory married to Actual Life. . . . You who live in this Sour Little Age cannot imagine the grandeur of the prospect we gazed upon: like standing atop the highest peak in the mighty Caucasus, and viewing in one all-knowing glance the mountainous, granite order of creation. You cannot imagine it. I weep for you.
And what have you to offer now, children of this Theory? What have you to offer in its place? Market Incentives? American Cheeseburgers? Watered-down Bukharinite stopgap makeshift Capitalism! NEPmen! Pygmy children of a gigantic race!
Change? Yes, we must change, only show me the Theory, and I will be at the barricades, show me the book of the next Beautiful Theory, and I promise you these blind eyes will see again, just to read it, to devour that text. Show me the words that will reorder the world, or else keep silent.

—Kushner 1994


Mountaintop Inferences

That evil is half-cured whose cause we know.

—Shakespeare

Causal inferences flash as lightning bolts in stormy controversies. While philosophers of science serve as meteorologists for such storms—describing, categorizing, predicting, and warning—policymakers seek to navigate away from the storms to safe harbors of reasonableness. When studying causality as a graduate student, I marveled at the multitude of mathematical and logical proofs necessary to demonstrate that the world is a complex place (e.g., Nagel 1961; Bunge 1959). In lieu of rhetoric on the topic, I offer a simple Sufi story to introduce this chapter's discussion of the relationship between means and ends, informed and undergirded by theory.

The incomparable Mulla Nasrudin was visited by a would-be disciple. The man, after many vicissitudes, arrived at the hut on the mountain where the Mulla (teacher) was sitting. Knowing that every single action of the illuminated Sufi was meaningful, the newcomer asked Nasrudin why he was blowing on his hands. "To warm myself in the cold, of course," Nasrudin replied.
Shortly afterward, Nasrudin poured out two bowls of soup, and blew on his own. "Why are you doing that, Master?" asked the disciple. "To cool it, of course," said the teacher.
At that point, the disciple left Nasrudin, unable to trust any longer a man who used the same process to cause different effects—heat and cold.

—Adapted from Shah 1964:79-80

Reflections on Causality in Evaluation

In some cases, different programs use divergent processes to arrive at the same outcome; in others, various programs use similar means to achieve different outcomes. Sometimes, competing treatments with the same goal operate side by side in a single program. Sorting out causal linkages challenges evaluators both theoretically and methodologically.

Stated quite simply, the causal question in evaluation is this: Did the implemented program lead to the desired outcomes? However, in the previous chapters, it has become clear that delineating either program implementation or outcomes can lead us into conceptual and empirical labyrinths unto themselves. Now we must consider how to find openings where they connect

to each other. To what extent and in what ways do the processes, activities, and treatments of a program cause or affect the behaviors, attitudes, skills, knowledge, and feelings of targeted participants? Such questions are complex enough in small, local programs, but imagine for a moment the complexity of attributing effects to causes in evaluating an entire multilayered, multisite initiative to integrate human services (Knapp 1996:25-26; Marquart and Konrad 1996).

One need know little about research to appreciate the elusiveness of definitive, pound-your-fist-on-the-table conclusions about causality. Our aim is more modest: reasonable estimations of the likelihood that particular activities have contributed in concrete ways to observed effects—emphasis on the word reasonable. Not definitive conclusions. Not absolute proof. Evaluation offers reasonable estimations of probabilities and likelihood, enough to provide useful guidance in an uncertain world (Blalock 1964). Policymakers and program decision makers, I find, typically understand and appreciate this. Hard-core academics and scientists often don't. As always, the question of primary intended users is . . . primary.

The Theory Option in Evaluation:
Constructing a Means-Ends Hierarchy

Causation. The relation between mosquitos and mosquito bites.


—Michael Scriven (1991b:77)

To venture into the arena of causality is to undertake the task of theory construction. This chapter suggests some simple conceptual approaches to theory construction aimed at elucidating and testing the theory upon which a program is based. A theory links means and ends. The construction of a means-ends hierarchy for a program constitutes a comprehensive description of the program's model. For example, Suchman (1967) recommended building a chain of objectives by trichotomizing objectives into immediate, intermediate, and ultimate goals. The linkages between these levels make up a continuous series of actions wherein immediate objectives (focused on implementation) logically precede intermediate goals (short-term outcomes) and therefore must be accomplished before higher-level goals (long-term impacts). Any given objective in the chain is the outcome of the successful attainment of the preceding objective and, in turn, is a precondition to attainment of the next higher objective.

Immediate goals refer to the results of the specific act with which one is momentarily concerned, such as the formation of an obesity club; the intermediate goals push ahead toward the accomplishment of the specific act, such as the actual reduction in weight of club members; the ultimate goal then examines the effect of achieving the intermediate goal upon the health status of the members, such as reduction in the incidence of heart disease. (Suchman 1967:51-52)
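Recast as a simple data structure—a purely illustrative sketch using the obesity-club chain from the quotation above, not an implementation drawn from this book—the hierarchy is just an ordered list in which each adjacent pair forms one hypothesized cause-and-effect linkage:

# Illustrative only: Suchman's obesity-club chain of objectives, ordered from
# immediate (implementation) to ultimate (long-term impact).
chain = [
    "obesity club formed and meeting regularly",      # immediate objective
    "club members actually reduce weight",            # intermediate goal
    "members' incidence of heart disease declines",   # ultimate goal
]

def linkage(i):
    """Return the hypothesized cause-effect pair at position i in the chain."""
    return chain[i], chain[i + 1]

cause, effect = linkage(0)  # a formative evaluation might focus only on this link
print(f"Does '{cause}' lead to '{effect}'?")

Any single linkage can be selected as the focus of an evaluation without measuring the whole chain, which is the choice taken up in the discussion that follows.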

The means-ends hierarchy for a program often has many more than three links. In Chapter 7, I presented the mission statement, goals, and objectives of the Minnesota Comprehensive Epilepsy Program. This three-tier division—mission, goals, and objectives—was useful to get an overview of the program as an initial step in identifying what evaluation information might be most useful. Once that initial focus was determined, a more detailed, multitiered chain of objectives could be constructed. For example, the epilepsy program had educational, research, treatment, and administrative goals. Once the research goal was selected by decision makers as the evaluation priority, a more thorough means-ends hierarchy was constructed. Exhibit 10.1 illustrates the difference between the initial three-tier conceptualization and the more refined multitier chain of objectives developed later. To have constructed such a detailed, multitier chain of objectives for all seven epilepsy goals would have taken a great deal of time and effort. By using the simple, three-tier approach initially, it was possible to then focus on those goal areas in which conceptualizing a full chain of objectives (or means-ends hierarchy) was worth the time and effort.

The full chain of objectives that links inputs to activities, activities to immediate outputs, immediate outputs to intermediate outcomes, and intermediate outcomes to ultimate goals constitutes a program's theory. Any particular paired linkage in the theory displays an action and reaction: a hypothesized cause and effect. As one constructs a hierarchical/sequential model, it becomes clear that there is only a relative distinction between ends and means: "Any end or goal can be seen as a means to another goal, [and] one is free to enter the 'hierarchy of means and ends' at any point" (Perrow 1968:307). In utilization-focused evaluation, the decision about where to enter the means-ends hierarchy for a particular evaluation is made on the basis of what information would be most useful to the primary intended evaluation users. In other words, a formative evaluation might focus on the connection between inputs and activities (an implementation evaluation) and not devote resources to measuring outcomes higher up in the hierarchy until implementation was ensured. Elucidating the entire hierarchy does not incur an obligation to evaluate every linkage in the hierarchy. The means-ends hierarchy displays a series of choices for more focused evaluations while also establishing a context for such narrow efforts.

Suchman (1967:55) used the example of a health education campaign to show how a means-ends hierarchy can be stated in terms of a series of measures or evaluation findings. Rather than linking a series of objectives, in Exhibit 10.2, he displayed the theoretical hierarchy as a series of evaluative measurements.

How theory-driven an evaluation should be is a matter of debate, as is the question of what sources to draw on in theory construction (Bickman 1990; Chen and Rossi 1989). Evaluators who gather purely descriptive data about implementation or outcomes without connecting the two in some framework risk being attacked as atheoretical technicians. Yet, a program must have achieved a certain level of maturity to make the added effort involved in theory-driven evaluation fruitful. At times, all that decision makers need and want is descriptive data for monitoring, fine-tuning, or improving program operations. However, attention to program theory can yield important insights and, in recent years, thanks especially to Chen's (1990) advocacy of theory-driven evaluation and the

EXHIBIT 10.1
Initial and Refined Epilepsy Program Means-Ends Theory

Initial Conceptualization of Epilepsy Program

Program Mission: To improve the lives of people with epilepsy through research
Program Goal: To publish high-quality, scholarly research on epilepsy
Program Objective: To conduct research on neurological, pharmacological, epidemiological,
and social psychological aspects of epilepsy

Refined Conceptualization of Epilepsy Chain of Objectives

1. People with epilepsy lead healthy, productive lives


2. Provide better medical treatment for people with epilepsy
3. Increase physicians' knowledge of better medical treatment for epileptics
4. Disseminate findings to medical practitioners
5. Publish findings in scholarly journals
6. Produce high-quality research findings on epilepsy
7. Establish a program of high-quality research on epilepsy
8. Assemble necessary resources (personnel, finances, facilities) to establish a research
program
9. Identify and generate research designs to close knowledge gaps
10. Identify major gaps in knowledge concerning causes and treatment of epilepsy
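A chain of objectives like the one in Exhibit 10.1 can also be written down as a simple data structure so that each adjacent pair of objectives, that is, each hypothesized cause-and-effect linkage, can be listed and weighed as a possible evaluation focus. The sketch below is a hypothetical illustration in Python, not part of the original exhibit; the objective statements are condensed from Exhibit 10.1.

```python
# Hypothetical sketch: a condensed version of the epilepsy chain of objectives
# (Exhibit 10.1) represented so that each adjacent pair of objectives -- a
# hypothesized cause-effect linkage -- can be listed as a candidate focus.

from typing import List, Tuple

# Ordered from ultimate end (first) to most immediate means (last),
# condensed from Exhibit 10.1.
chain_of_objectives: List[str] = [
    "People with epilepsy lead healthy, productive lives",
    "Provide better medical treatment for people with epilepsy",
    "Increase physicians' knowledge of better medical treatment",
    "Disseminate findings to medical practitioners",
    "Publish findings in scholarly journals",
    "Produce high-quality research findings on epilepsy",
    "Establish a program of high-quality research on epilepsy",
    "Assemble necessary resources (personnel, finances, facilities)",
    "Identify and generate research designs to close knowledge gaps",
    "Identify major gaps in knowledge about causes and treatment of epilepsy",
]

def linkages(chain: List[str]) -> List[Tuple[str, str]]:
    """Each (means, end) pair is a validity assumption that could be evaluated."""
    ordered = list(reversed(chain))  # start from the most immediate objective
    return list(zip(ordered, ordered[1:]))

for means, end in linkages(chain_of_objectives):
    print(f"IF   {means}")
    print(f"THEN {end}\n")
```

Listing the linkages this way makes it easier for intended users to point to the one or two connections they most need tested, rather than feeling obliged to evaluate the whole chain.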

work of Connell et al. (1995) on using theories of change to frame evaluations of community initiatives, evaluators have been challenged to take a more active role in looking for opportunities to design evaluations on a solid foundation of theory.

Three Approaches to Program Theory

Three major approaches to program theory development for evaluation are:

1. The deductive approach—drawing on scholarly theories from the academic literature
2. The inductive approach—doing fieldwork on a program to generate grounded theory
3. The user-focused approach—working with intended users to extract and specify their implicit theory of action

The deductive approach draws on dominant theoretical traditions in specific scholarly disciplines to construct models of the relationship between program treatments and outcomes. For example, an evaluation of whether a graduate school teaches students to think critically could be based on the theoretical perspective of a phenomenography of adult critical reflection, as articulated by the Distin-

EXHIBIT 10.2
Theoretical Hierarchy of Evaluation Measures
for a Health Education Campaign

Reduction in morbidity and mortality
Proportion of people in the target population who meet prescribed standards of behavior
Number whose behaviors change
Number whose opinions change
Number who learn the facts
Number who read it
Number of people who receive the literature
Amount of literature distributed
Number of pieces of literature available for distribution
Pretest literature by readability criteria

SOURCE: Adapted from Suchman 1967:55.
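Read from the bottom up, a hierarchy of evaluative measurements like Exhibit 10.2 behaves like a funnel: each level depends on the one below it, and the ratio between adjacent levels shows where the chain weakens. The sketch below simply illustrates that computation; the counts are invented for the example and are not data from Suchman or from this text.

```python
# Hypothetical sketch: adjacent levels of a measurement hierarchy read as a
# funnel. The counts below are invented solely to illustrate the computation.

levels = [
    ("Pieces of literature available", 60000),
    ("Literature distributed", 50000),
    ("People who receive the literature", 20000),
    ("People who read it", 8000),
    ("People who learn the facts", 4000),
    ("People whose opinions change", 1500),
    ("People whose behaviors change", 600),
]

# Compare each level with the one above it in the hierarchy.
for (lower, n_lower), (higher, n_higher) in zip(levels, levels[1:]):
    rate = n_higher / n_lower
    print(f"{lower} -> {higher}: {rate:.1%} carried to the next level")
```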

guished Professor of Education Stephen Brookfield (1994), an approach that emphasizes the visceral and emotional dimensions of critical thought as opposed to purely intellectual, cognitive, and skills emphases. Illustrations of the deductive approach to evaluation are chronicled in Rossi and Freeman (1993) and Boruch, McSweeny, and Soderstrom (1978). However, the temptation in the deductive approach is to make the study more research than evaluation, that is, to let the literature review and theory testing take over the evaluation. Testing social science theories may be a by-product of an evaluation in which the primary purpose is knowledge generation (see Chapter 4), but the primary focus in this chapter is on testing practitioner theories about why they do what they do and what they think results from what they do. Utilization-focused evaluation involves primary in-

tended users in specifying the program's theory and in deciding how much attention to give to testing the theory generated, including how much to draw on social science theory as a framework for the evaluation (Patton 1989).

The inductive approach involves the evaluator in doing fieldwork to generate theory. Staying with the example of evaluating whether graduate students learn to think critically, the inductive approach would involve assessing student work, observing students in class, and interviewing students and professors to determine what model of education undergirds efforts to impart critical thinking skills. Such an effort could be done as a study unto itself, for example, as part of an early evaluability assessment process, or it could be done in conjunction with a deductive effort based on a literature review. The product of the inductive approach, and therefore a major product of the evaluation, would be an empirically derived theoretical model of the relationship between program activities and outcomes framed in terms of important contextual factors.

User-Focused Theory of Action Approach

In the user-focused approach, the evaluator's task is to facilitate intended users, including program personnel, in articulating their operating theory. Continuing with the critical thinking example, this would mean bringing together students and professors to make explicit their educational assumptions and generate a model that could then be tested as part of the evaluation. In the purely inductive approach above, by way of contrast, the evaluator builds the theory from observations and fieldwork rather than from discussion and group facilitation with those involved.

What makes the user-focused approach challenging is that practitioners are seldom aware of their theory of action. The notion that people in programs operate on the basis of theories of action derives from the work of organizational development scholars Chris Argyris and Donald Schön (1978, 1974). They studied the connection between theory and practice as a means of increasing professional effectiveness:

We begin with the proposition that people hold theories of action about how to produce consequences they intend. Such theories are theories about human effectiveness. By effectiveness we mean the degree to which people produce their intended consequences in ways that make it likely that they will continue to produce intended consequences. Theories of action, therefore, are theories about effectiveness, and because they contain propositions that are falsifiable, they are also theories about truth. Truth in this case means truth about how to behave effectively. (Argyris 1982:83)

The phrase theories of action refers specifically to how to produce desired results in contrast to theories in general, which explain why some phenomenon of interest occurs. Deductive and inductive approaches to theory make use of programs as manifestations of some larger phenomenon of interest while theories of action are quite specific to a particular program or organization. Argyris and Schön (1978) distinguish two kinds of theories of action: (1) espoused theories—what people say or believe is their theory; and (2) theories-in-use—the bases on which people actually act. They drew on a great body of research showing the following:

People do not always behave congruently with their beliefs, values, and attitudes (all part of espoused theories). . . . Although people do not behave congruently with their espoused theories, they do behave congruently with their theories-in-use, and they are unaware of this fact. (Argyris 1982:85)

In this conundrum of dissonance between belief and practice lies a golden opportunity for reality testing: the heart of evaluation.

The user-focused theory of action approach can involve quite a bit of work, since few front-line practitioners in programs are schooled to think systematically in terms of theoretical constructs and relationships. Moreover, the idea of making their assumptions explicit and then testing them can be frightening. The user-focused evaluator, as facilitator of this process, must do at least five things:

1. Make the process of theory articulation understandable.
2. Help participants be comfortable with the process intellectually and emotionally.
3. Provide direction for how to articulate espoused theories that participants believe undergird their actions.
4. Facilitate a commitment to test espoused theories in the awareness that actual theories-in-use, as they emerge, may be substantially different from espoused theories.
5. Keep the focus on doing all this to make the evaluation useful.

The causal model to be tested in the user-focused evaluation is the causal model upon which program activities are based, not a model extracted from academic sources or fieldwork. First priority goes to providing primary stakeholders with information about the degree to which their own implementation ideals and treatment specifications actually achieve desired outcomes through program operations. The evaluator's own theories and academic traditions can be helpful in discovering and clarifying the program's theories of action, but testing intended users' and decision makers' theories of programmatic action is primary; the evaluator's scholarly interests are secondary.

The importance of understanding the program's theory of action as perceived by key stakeholders is explained in part by basic insights from the sociology of knowledge and work on the social construction of reality (Holzner and Marx 1979; Berger and Luckman 1967; Schutz 1967). This work is built on the observation of W. I. Thomas that what is perceived as real is real in its consequences. In this case, espoused theories are what practitioners perceive to be real. Those espoused theories, often implicit and only espoused when asked for, have real consequences for what practitioners do. Elucidating the theory of action held by primary users can help them be more deliberative about what they do and more willing to put their beliefs and assumptions to an empirical test through evaluation. In short, the user-focused approach challenges decision makers, program staff, funders, and other users to engage in reality testing, that is, to test whether what they believe to be true (their espoused theory of action) is what actually occurs (theory-in-use).

A Reality-Testing Example

Let me offer a simple example of user-focused, theory-of-action reality testing. A State Department of Energy allocated conservation funds through 10 regional districts. An evaluation was commissioned by

the department to assess the impact of local involvement in priority setting. State and regional officials articulated the following equitable and uniform model of decision making as their espoused theory of action:

1. State officials establish funding targets for each district based on needs assessments and available funds.
2. District advisory groups develop proposals to meet the state targets with broad citizen input.
3. The State approves the budgets based on the merit of the proposals within the guidelines, rules, and targets provided.
4. Expected result: Approved funds equal original targets.

In short, the espoused theory of action was that decisions are made equitably based on explicit procedures, guidelines, and rules. The data showed this to be the case in 6 of the 10 districts. In the other 4 districts, however, proposals from the districts exceeded the assigned target amounts by 30% to 55%; that is, a district assigned a target of $100 million submitted proposals for $140 million (despite a "rule" that said proposals could not exceed targets). Moreover, the final, approved budgets exceeded the original targets by 20% to 40%. The district with a target of $100 million and proposals for $140 million received $120 million. Four of the districts, then, were not engaged in a by-the-book equitable process; rather, their process was negotiated, personal, and political. Needless to say, when these data were presented, the six districts that followed the guidelines and played the funding game by what they thought were uniform rules—the districts whose proposals equaled their assigned targets—were outraged. Testing the espoused theory of uniformity and fairness revealed that the reality (theory-in-use) in four districts did not match the espoused theory in ways that had significant consequences for all concerned.

This is a simple, commonsense example of a user-focused approach to articulating and testing a program's theory of action. Nothing elegant. No academic trappings. The espoused theory of action is a straightforward articulation of what is supposed to happen in the process that is intended to achieve desired outcomes. The linkages between processes and outcomes are made explicit. Evaluative data then reveal the theory-in-use, that is, what actually happens. Program staff, other intended users, and evaluators can learn a great deal from engaging in this collaborative process (e.g., Layzer 1996).
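The reality test in this example reduces to a simple comparison for each district: do proposals and approved budgets stay within the assigned target, as the espoused theory of action says they should? The sketch below is a hypothetical illustration of that comparison; the district figures are invented stand-ins patterned on the example above (a $100 million target, $140 million in proposals, $120 million approved), not the evaluation's actual data.

```python
# Hypothetical sketch: checking each district's actual figures against the
# espoused decision rule ("approved funds equal original targets"). The
# figures are invented stand-ins patterned on the example in the text.

districts = {
    # district: (target, proposals, approved), in millions of dollars
    "District 1": (100, 100, 100),   # plays by the rules
    "District 2": (100, 140, 120),   # proposals and approval exceed the target
    "District 3": (80, 120, 105),    # proposals exceed the target by 50%
}

def matches_espoused_theory(target: float, proposals: float, approved: float) -> bool:
    """Espoused theory of action: proposals stay within the target and
    approved funds equal the original target."""
    return proposals <= target and approved == target

for name, (target, proposals, approved) in districts.items():
    if matches_espoused_theory(target, proposals, approved):
        print(f"{name}: theory-in-use matches the espoused theory")
    else:
        overage = (proposals - target) / target
        print(f"{name}: theory-in-use diverges "
              f"(proposals exceed target by {overage:.0%}, "
              f"approved ${approved}M vs. target ${target}M)")
```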
A Menu of Theory-Based Approaches

Each of the three approaches to program theory—deductive, inductive, and user-focused—has advantages and disadvantages. These are reviewed in Menu 10.1, drawing on the work of Lipsey and Pollard (1989) and Chen (1989). The strategic calculations a utilization-focused evaluator must make include determining how useful it will be to spend time and effort elucidating a theory of action (or more than one where different perspectives exist); how to keep theory generation from becoming esoteric and overly academic; how formal to be in the process; and what combinations of the three approaches, or relative emphasis, should be attempted. Factors to consider in making these calculations will be clearer after some more examples, which follow, but the focus here is on the user-focused, theory-of-action approach.
MENU 10.1
Advantages and Disadvantages of Three Approaches to Program Theory: Deductive, Inductive, and User-Focused (drawing on Lipsey and Pollard 1989; Chen 1989)

Getting at Assumptions and Causal Connections

Identifying Critical Validity Assumptions

The purpose of thoroughly delineating a program's theory of action is to assist practitioners in making explicit their assumptions about the linkages between inputs, activities, immediate outputs, intermediate outcomes, and ultimate goals. Suchman (1967) called beliefs about cause-effect relationships the program's validity assumptions. For example, many education programs are built on the validity assumptions that (1) new information leads to attitude change and (2) attitude change affects behavior. These assumptions are testable. Does new knowledge change attitudes? Do changed attitudes lead to changed behaviors?

As validity assumptions are articulated in a means-ends hierarchy, the evaluator can work with intended users to focus the evaluation on those critical linkages where information is most needed at that particular point in the life of the program. It is seldom possible or useful to test all the validity assumptions or evaluate all the means-ends linkages in a program's theory of action. The question is one of how freely such validity assumptions are made and how much is at stake in testing the validity of critical assumptions (Suchman 1967:43). In a utilization-focused evaluation, the evaluator works with the primary intended users to identify the critical validity assumptions where reduction of uncertainty about causal linkages could make the most difference.

The evaluator's beliefs about the validity of assumptions are less important than what staff and decision makers believe. An evaluator can have greater impact by helping program staff and decision makers empirically test their causal hypotheses than by telling them such causal hypotheses are nonsense. Not only does the wheel have to be re-created from time to time, its efficacy has to be restudied and reevaluated to demonstrate its usefulness. Likewise, the evaluator's certain belief that square wheels are less efficacious than round ones may have little impact on those who believe that square wheels are effective. The utilization-focused evaluator's task is to delineate the belief in the square wheel and then assist the believers in designing an evaluation that will permit them to test for themselves their own perceptions and hypotheses.

This does not mean that the evaluator is passive. In the active-reactive-adaptive process of negotiating the evaluation's focus and design, the evaluation facilitator can suggest alternative assumptions and theories to test, but first priority goes to evaluation of validity assumptions held by primary intended users.

Filling in the Conceptual Gaps

Helping stakeholders identify conceptual gaps in their theory of action is another task for the user-focused evaluation facilitator. The difference between identifying validity assumptions and filling in conceptual gaps can be illustrated as follows. Rutman (1977) has argued that the idea of using prison guards as counselors to inmates ought never have been evaluated (Ward, Kassebaum, and Wilner 1971) because, on the face of it, the idea is nonsense. Why would anyone ever believe that such a program could work? But clearly, whether they should have or not, many people did believe that the program would work. The evaluator's task is to fill in the conceptual gaps in this theory of action so that critical evaluative information needs

can be identified. For example, are there initial selection processes and training programs for guards? Are guards supposed to be changed during such training? The first critical evaluation issue may be whether prison guards can be trained to exhibit desired counselor attitudes and behaviors. Whether prison guards can learn and practice human relations skills can be evaluated without ever implementing a full-blown program.

Filling in the gaps in the program's theory of action goes to the heart of the implementation question. What series of activities must take place before there is reason even to hope that impact will result? If activities and objectives lower in the means-ends hierarchy will not or cannot be implemented, then evaluation of ultimate outcomes is problematic.

There are only two ways one can move up the scale of objectives in an evaluation: (a) by proving the intervening assumptions through research, that is, changing an assumption to a fact, or (b) by assuming their validity without full research proof. When the former is possible, we can then interpret our success in meeting a lower-level objective as automatic progress toward a higher one. . . .

When an assumption cannot be proved . . . we go forward at our peril. To a great extent, the ultimate worth of evaluation for public service programs will depend upon research proof of the validity of assumptions involved in the establishment of key objectives. (Suchman 1967:57)

The National Clean Air Act and its amendments in the mid-1970s provide a good example of legislation in which policy and planning activity focused on initial objectives and ultimate goals but failed to delineate crucial intervening objectives. The ultimate goal was cleaner air; the target of the legislation was a handful of engines that each auto manufacturer tested before going to mass production. Authorization for mass production was given if these prototypes operated under carefully controlled conditions for 50,000 miles. Cars that failed pollution tests as they left the assembly line were not withheld from dealers. Cars on the road were not inspected to make sure that pollution control equipment was still in place and functioning properly. Prototypes were tested for 50,000 miles, but most cars are eventually used for 100,000 miles, with pollution in older cars being much worse than that in new ones. In short, there are many intervening steps between testing prototype automobiles for pollution control compliance and improving air quality. As Bruce Ackerman (1977) predicted,

Over a period of time, the manufacturers will build cleaner and cleaner prototypes. Billions of dollars will be spent on the assembly line to build devices that look like these prototypes. But until Congress, the EPA, and the states require regular inspections of all cars on the road, very little will come of all this glittering machinery.

Indeed, we could save billions if we contented ourselves with dirtier prototypes, but insisted on cleaner cars. . . . Congressmen themselves woefully exaggerate the importance of their votes for cleaner prototypes. They simply have no idea of the distance between prototype and reality. They somehow imagine that the hard job is technological innovation and that the easy job is human implementation. (p. 4)

Delineating an espoused theory of action involves identifying critical assump-

tions, conceptual gaps, and information gaps. The conceptual gaps are filled by logic, discussion, and policy analysis. The information gaps are filled by evaluation research.

Using the Theory of Action to Focus the Evaluation: The New School Case

Once an espoused theory of action is delineated, the issue of evaluation focus remains. This involves more than mechanically evaluating lower-order validity assumptions and then moving up the hierarchy. Not all linkages in the hierarchy are amenable to testing; different validity assumptions require different resources for evaluation; data-gathering strategies vary for different objectives. In a summative evaluation, the focus will be on outcomes attainment and causal attribution. For formative evaluation, the most important factor is determining what information would be most useful at a particular point in time. This means selecting what Murphy (1976) calls targets of opportunity in which additional information could make a difference to the direction of incremental, problem-oriented, program decision making:

In selecting problems for analysis, targets of opportunity need to be identified, with political considerations specifically built into final choices. Planning activity in a certain area might be opportune because of expiring legislation, a hot political issue, a breakdown in standard operation procedures, or new research findings. At any time, certain policies are more susceptible to change than others. (p. 98)

Targets of opportunity are those evaluation questions about which primary information users care the most and most need evaluative information for decision making. Having information about and answers to those select questions can make a difference in what is done in the program. An example from an evaluation of the New School of Behavioral Studies in Education, University of North Dakota, illustrates this.

The New School of Behavioral Studies in Education was established as a result of a statewide study of education conducted between 1965 and 1967. The New School was to provide leadership in educational innovations with an emphasis on individualized instruction, better teacher-pupil relationships, an interdisciplinary approach, and better use of a wide range of learning resources (Statewide Study 1967:11-15). In 1970, the New School had gained national recognition when Charles Silberman described the North Dakota Experiment as a program that was resolving the "crisis in the classroom" in favor of open education.

The New School established a master's degree, teaching-intern program in which interns replaced teachers without degrees so that the latter could return to the university to complete their baccalaureates. The cooperating school districts released those teachers without degrees who volunteered to return to college and accepted the master's degree interns in their place. Over four years, the New School placed 293 interns in 48 school districts and 75 elementary schools, both public and parochial. The school districts that cooperated with the New School in the intern program contained nearly one third of the state's elementary school children.

The Dean of the New School formed a task force of teachers, professors, students,

parents, and administrators to evaluate the program. In working with that task force, I constructed the theory of action shown in Exhibit 10.3. The objectives stated in the first column are a far cry from being clear, specific, and measurable, but they were quite adequate for discussions aimed at focusing the evaluation question. The second column lists validity assumptions underlying each linkage in the theory of action. The third column shows the measures that could be used to evaluate objectives at any level in the hierarchy. Ultimate objectives are not inherently more difficult to operationalize. Operationalization and measurement are separate issues to be determined after the focus of the evaluation has been decided.

When the Evaluation Task Force discussed Exhibit 10.3, members decided they already had sufficient contact with the summer program to assess the degree to which immediate objectives were being met. They also felt they had sufficient experience to be comfortable with the validity assumption linking objectives six and seven. With regard to the ultimate objectives, the task force members said that they needed no further data at that time in order to document the outcomes of open education (objectives one and two), nor could they do much with information about the growth of the open education movement (objective three). However, a number of critical uncertainties surfaced at the level of intermediate objectives. Once students left the summer program for the one-year internships, program staff were unable to carefully and regularly monitor intern classrooms. They didn't know what variations existed in the openness of the classrooms, nor did they have reliable information about how local parents and administrators were reacting to intern classrooms. These were issues about which information was wanted and needed. Indeed, for a variety of personal, political, and scholarly reasons, these issues made quite good evaluation targets of opportunity. The evaluation therefore focused on three questions: (1) To what extent are summer trainees conducting open classrooms during the regular year? (2) What factors are related to variations in openness? (3) What is the relationship between variations in classroom openness and parent/administrator reactions to intern classrooms?

At the outset, nothing precluded evaluation at any of the seven levels in the hierarchy of objectives. There was serious discussion of all levels and alternative foci. In terms of the educational literature, the issue of the outcomes of open education could be considered most important; in terms of university operations, the summer program would have been the appropriate focus; but in terms of the information needs of the primary decision makers and primary intended users on the task force, evaluation of the intermediate objectives had the highest potential for generating useful, formative information.

In order to obtain the resources necessary to conduct this evaluation, Vito Perrone, dean of the New School, had to make unusual demands on the U.S. Office of Education (OE). The outcomes of the New School teaching program were supposed to be evaluated as part of a national OE study. Perrone argued that the national study, as designed, would be useless to the New School. He talked the OE people into allowing him to spend the New School's portion of the national evaluation money on a study designed and conducted locally. The subsequent evaluation was entirely the creation of the local task force described above, and it produced instruments and data that became an integral part of the North Dakota program (see Pederson 1977).

The national study produced large volumes of numbers (with blanks entered on the lines for North Dakota) and, as far as I can tell, was of no particular use to anyone.

Developing a Theory of Action as Process Use

Thus far, this discussion of theory of action has been aimed at demonstrating the value of this conceptual strategy as a way of focusing evaluation questions and identifying the information needs of primary stakeholders. At times, helping program staff or decision makers to articulate their programmatic theory of action is an end in itself. Evaluators are called on, not only to gather data, but also to assist in program design. Knowing how to turn a vague discussion of the presumed linkages between program activities and expected outcomes into a formal written theory of action can be an important service to a program. This is an example of using the evaluation process to improve a program, as discussed in Chapter 5.

Evaluation use takes many forms, not only use of findings. The work of Palumbo, Musheno, and Maynard-Moody (1985) on the Community Corrections Act of Oregon nicely illustrates the use of evaluation to (1) conceptualize a major piece of statewide legislation from vague policies into a formal implementation-outcomes hierarchy; (2) design practical, programmatic linkages; and (3) construct a viable, streetwise theory of action.

On many occasions, then, the evaluation data collection effort may include discovering and formalizing a program's theory of action. In such cases, rather than being a means of facilitating the process of focusing evaluation questions, the theory of action can be the primary focus of analysis in the evaluation. This means moving beyond discussing the theory of action to gathering data on it. Such was the case in an evaluation of a multifaceted home nursing program for the elderly. Facilitating articulation of the program's theory of action helped staff sort out which of the many things they did were really central to the outcomes they wanted. As a member of an evaluation task force for farming systems research, I worked with colleagues to identify the critical elements of "a farming systems approach" and place those elements in a hierarchy that constituted a developmental theory of action. In these and many other cases, my primary contributions were program design and conceptualization skills that combined stakeholder discussions with observations of the program to develop a theory of action. Once developed, the theory of action served to focus future program development efforts as well as evaluation questions.

Targets of Opportunity Over Time: Theory as a Road Map

Evaluation can make an ongoing contribution to program improvement as program staff and other primary stakeholders learn to use evaluation concepts to shape and test program ideas. This ongoing, developmental role for evaluation is particularly important for internal evaluators to cultivate. The theory of action can be a road map to plan different evaluation efforts over time.

Unlike most external evaluators, who encounter a program at a particular point in time, make their contribution, and leave, perhaps never to have contact with the program again, internal evaluators are there for the long haul. They need to be particularly sensitive to how evaluation can serve different needs over time, including
EXHIBIT 10.3
New School Theory of Action: Hierarchy of Objectives, Validity Assumptions, and Evaluation Measures

both program design and accountability functions. In this way internal evaluators help build an institutional memory for a program or organization, a memory made up of lessons learned, ideas cultivated, and skills developed over time. This means that internal evaluators need to understand and take into consideration the "social learning" (Stone 1985) that comes from evaluation within organizations over time.

A theory of action is at least partially temporal in conceptualization because it progresses from immediate objectives to ultimate goals. Part of the test of a theory of action is the temporal logic of the hierarchy. In causal language, it is impossible for an effect or outcome to precede its cause. It is important, however, that temporal logic not become rigid. Once a program is in operation, the relationships between links in the causal hierarchy are likely to be recursive rather than unidirectional. The implementation and attainment of higher-level objectives interact with the implementation and attainment of lower-order objectives through feedback mechanisms, interactive configurations, and cybernetic systems. Program components may be conceptually distinct in the formal version of a theory of action, but in practice these analytically distinct components, links, and stages are highly interdependent and dynamically interrelated. In short, the cause-effect relationships may be mutual, multidirectional, and multilateral. For example, open classrooms affect the opinions and actions of parents, but parent reactions also affect the degree of openness of classrooms; classroom climate and school curriculum affect student achievement, but variations in student achievement also affect school climate and curriculum. Once again, the means-ends distinction proves to be somewhat arbitrary and simplified, but there is no avoiding such simplification: "Put simply, the basic dilemma faced in all sciences is that of how much to oversimplify reality" (Blalock 1964:8). The challenge is to construct simplifications that pass the dual tests of usefulness and accuracy.

Theory Informing Practice, Practice Informing Theory

Comparing Theories of Action

Much evaluation involves comparing different programs to determine which is more effective or efficient. Evaluations can be designed to compare the effectiveness of two or more programs with the same goal, but if those goals do not bear the same importance in the two programs' theories of action, the comparisons may be misleading. Before undertaking a comparative evaluation, it is useful to compare programmatic theories of action in order to understand the extent to which apparently identical or similarly labeled programs are in fact comparable.

Programs with different goals cannot be fairly compared to each other on a unidimensional basis. Teacher centers established to support staff development and resource support for school teachers provide an example. The U.S. Office of Education proposed that teacher centers be evaluated according to a single set of universal outcomes. But evaluator Sharon Feiman (1977) found that teacher centers throughout the country varied substantially in both program activities and goals.

Feiman described three types of teacher centers: behavioral, humanistic, and developmental. Exhibit 10.4 summarizes the variations among these types of centers. Different teacher centers were trying to accomplish different outcomes. Compari-

EXHIBIT 10.4
Variations in Types of Teacher Centers

1. Behavioral centers
   Primary process of affecting teachers: Curriculum specialists directly and formally instruct administrators and teachers.
   Primary outcomes of the process: Adoption of comprehensive curriculum systems, methods, and packages by teachers.

2. Humanistic centers
   Primary process of affecting teachers: Informal, nondirected teacher exploration; "teachers select their own treatment."
   Primary outcomes of the process: Teachers feel supported and important; they pick up concrete and practical ideas and materials for immediate use in their classrooms.

3. Developmental centers
   Primary process of affecting teachers: Advisers establish a warm, interpersonal, and directive relationship with teachers, working with them over time.
   Primary outcomes of the process: Teachers' thinking about what they do and why they do it changes over time; teacher personal development.
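Because each type of center pursues a different primary outcome, any single universal indicator can fit at most one of the three theories of action. The sketch below illustrates that mismatch; the center types and outcomes are paraphrased from Exhibit 10.4, and the universal indicator is invented for the example.

```python
# Hypothetical sketch: the three teacher-center types (Exhibit 10.4) pursue
# different primary outcomes, so a single universal outcome indicator fits at
# most one theory of action. The indicator below is invented for illustration.

primary_outcomes = {
    "behavioral":    "adoption of comprehensive curriculum systems, methods, and packages",
    "humanistic":    "teachers feel supported and pick up practical ideas for immediate use",
    "developmental": "change over time in teachers' thinking about what they do and why",
}

universal_indicator = "number of curriculum packages adopted per teacher"  # invented

for center_type, outcome in primary_outcomes.items():
    fit = "matches" if center_type == "behavioral" else "misses"
    print(f"{center_type:13} primary outcome: {outcome}")
    print(f"{'':13} universal indicator '{universal_indicator}' {fit} this theory of action\n")
```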

sons to determine which one was most effective became problematic because they were trying to do different things. Evaluation could help determine the extent to which outcomes have been attained for each specific program, but empirical data could not determine which outcome was most desirable. That is a values question. An evaluation facilitator can help users clarify their value premises, but because the three teacher-center models were different, evaluation criteria for effectiveness varied for each type. In effect, three quite different theories of teacher development were operating in quite different educational environments. Attention to divergent theories of action helped avoid inappropriate comparisons and reframed the evaluation question from "Which model is best?" to "What are the strengths and weaknesses of each approach, and which approach is most effective for what kinds of educational environments?" Very different evaluation questions!

Matching a Theory of Action With Levels of Evidence

Claude Bennett (1982, 1979) has conceptualized a relationship between the "chain of events" in a program and the "levels of evidence" needed for evaluation. Although his work was aimed specifically at evaluation of cooperative extension programs (agriculture, home economics, and 4-H/youth), his ideas are generally applicable to any program. Exhibit 10.5 depicts a general adaptation of Bennett's model.

The model suggests a typical chain of program events:

1. Inputs (resources) must be assembled to get the program started.
2. Activities are undertaken with available resources.
3. Program participants (clients, students, beneficiaries) engage in program activities.
4. Participants react to what they experience.

5. As a result of what they experience, changes in knowledge, attitudes, and skills occur (if the program is effective).
6. Behavior and practice changes follow knowledge and attitude change.
7. Overall impacts result, both intended and unintended.

This model explicitly and deliberately places highest value on attaining ultimate social and economic goals (e.g., increased agricultural production, increased health, and a higher quality of community life). Actual adoption of recommended practices and specific changes in client behaviors are necessary to achieve ultimate goals and are valued over knowledge, attitude, and skill changes. People may learn about some new agricultural technique (knowledge change), believe it's a good idea (attitude change), and know how to apply it (skill change)—but the higher-level criterion is whether they actually begin using the new technique (i.e., change their agricultural practices). Participant reactions (satisfaction, likes, and dislikes) are lower still on the hierarchy. All of these are outcomes, but they are not equally valued outcomes. The bottom part of the hierarchy identifies the means necessary for accomplishing higher-level ends; namely, in descending order, (3) getting people to participate, (2) providing program activities, and (1) organizing basic resources and inputs to get started.

Utilization-Focused Evaluation Theory of Action

Interestingly, this same hierarchy can be applied to evaluating evaluations. Exhibit 10.6 shows a hierarchy of evaluation accountability. In utilization-focused evaluation, the purpose is to improve programs and increase the quality of decisions made. To accomplish this ultimate end, a chain of events must unfold.

1. Resources must be devoted to the evaluation, including stakeholder time and financial inputs.
2. Working with intended users, important evaluation issues are identified and questions focused; based on those issues and questions, the evaluation is designed and data are collected.
3. Key stakeholders and primary users are involved throughout the process.
4. Intended users react to their involvement (hopefully in positive ways).
5. The evaluation process and findings provide knowledge and new understandings.
6. Intended users interpret results, generate and adopt recommendations, and use evaluation results.
7. The program improves and decisions are made.

Each step in this chain can be evaluated. Exhibit 10.6 shows the evaluation question that corresponds to each level in the utilization-focused theory of action hierarchy.

Logical Framework

The Logical Framework Approach (Sartorius 1996a, 1991) offers a format for connecting levels of impact with evidence. Used widely by international development agencies as a comprehensive map in designing projects, the framework begins by requiring specification of the overall goal and purposes of the project. Short-term outputs are linked logically to those purposes, and activities are identified that are expected to produce the outputs. (The language of this model can be confusing because what the
EXHIBIT 10.5
A General Adaptation of Bennett's Model: Chain of Program Events and Levels of Evidence
EXHIBIT 10.6
Utilization-Focused Evaluation Theory of Action: Hierarchy of Evaluation Accountability and Corresponding Utilization Questions

logical framework calls a goal is what other models more commonly call mission; and purposes are similar to objectives or outcomes; outputs are short-term, end-of-project deliverables.) For every goal, purpose, output, and activity, the framework requires specification of objectively verifiable indicators, means of verification (types of data), and important assumptions about the linkage between activities and outputs, outputs to purposes, and purposes to goals. A software program called PC/LogFRAME supports completing the logical framework (Sartorius 1996b).
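To make the structure concrete, the sketch below lays out one row per level of a logical framework. The field names follow the description above (a narrative statement plus objectively verifiable indicators, means of verification, and assumptions linking each level to the next); the clean-water project content is invented purely for illustration and is not drawn from the text or from the logical framework literature.

```python
# Hypothetical sketch of a logical framework ("logframe"): one row per level,
# each with objectively verifiable indicators, means of verification, and
# assumptions about the linkage to the next level up. The clean-water project
# content is invented for illustration only.

from dataclasses import dataclass
from typing import List

@dataclass
class LogFrameRow:
    level: str                        # "goal", "purpose", "output", or "activity"
    narrative: str                    # what is to be achieved or done
    indicators: List[str]             # objectively verifiable indicators
    means_of_verification: List[str]  # types of data used to verify the indicators
    assumptions: List[str]            # about the linkage to the next level up

logframe = [
    LogFrameRow("goal", "Improved community health",
                ["Reduction in waterborne illness"], ["Clinic records"], []),
    LogFrameRow("purpose", "Households use safe drinking water",
                ["Percentage of households using treated water"], ["Household survey"],
                ["Using safe water leads to better health"]),
    LogFrameRow("output", "Functioning village wells",
                ["Number of wells passing water-quality tests"], ["Inspection reports"],
                ["Wells that are built are maintained and used"]),
    LogFrameRow("activity", "Drill and equip wells",
                ["Wells drilled on schedule and within budget"], ["Project records"],
                ["Drilling produces usable wells"]),
]

for row in logframe:
    print(f"{row.level.upper():8} {row.narrative}")
    print(f"         indicators: {', '.join(row.indicators)}; "
          f"verification: {', '.join(row.means_of_verification)}")
```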

Causal Theorizing in Perspective

Our least deed, like the young of the land crab, wends its way to the sea of cause and effect as soon as born, and makes a drop there to eternity.

—Thoreau (Journal, March 14, 1838)

While causal linkages may never be established with certainty, the delineation of assumed causal relationships in a chain of hierarchical objectives can be a useful exercise in the process of focusing an evaluation. It is not appropriate to construct a detailed theory of program action for every evaluation situation, but it is important to consider the option. Therefore, the skills of a utilization-focused evaluation facilitator include being able to help intended users construct a means-ends hierarchy, specify validity assumptions, link means to ends, and lay out the temporal sequence of a hierarchy of objectives.

Attention to theoretical issues can provide useful information to stakeholders when their theories are formulated and reality-tested through the evaluation process. Theory construction is also a mechanism by which evaluators can link particular program evaluation questions to larger social scientific issues for the purpose of contributing to scientific knowledge through empirical generalizations. But in a utilization-focused approach to evaluation research, the initial theoretical formulations originate with primary stakeholders and intended users; scholarly interests are adapted to the evaluation needs of relevant decision makers, not vice versa.

Theory-driven evaluations can seduce researchers away from answering straightforward formative questions or determining the merit or worth of a program into the ethereal world of academic theorizing. In this regard, Scriven (1991b) asserts that theory testing is "a luxury for the evaluator." He considers it "a gross though frequent blunder to suppose that 'one needs a theory of learning to evaluate teaching' " (p. 360). One does not need to know anything at all about electronics, he observes, to evaluate computers.

On the other hand, a theory can be the key that unlocks the door to effective action. How much to engage stakeholders and intended users in articulating their theories of action is a matter for negotiation. Helping practitioners test their espoused theories and discover real theories-in-use can be a powerful learning experience, both indi-

vidually and organizationally. At a simpler level, without constructing a fully specified theory, the evaluator may pose basic causal questions about the relationship between program activities and outcomes. Even for these more modest questions (more modest in the sense that they don't involve testing a fully specified theory), evaluation data can seldom provide more than an approximation of the likelihood of causal connections. It is important to interpret results about causal linkages with prudence and care. In that regard, consider the wisdom of this Buddhist story.

One day an old man approached Zen Master Hyakujo. The old man said, "I am not
a human being. In ancient times I lived on this mountain. A student of the Way asked
me if the enlightened were still affected by causality. I replied saying that they were not
affected. Because of that, I was degraded to lead the life of a wild fox for five hundred
years. I now request you to answer one thing for me. Are the enlightened still affected
by causality?"
Master Hyakujo replied, "They are not deluded by causality."
At that the old man was enlightened.
—Adapted from Hoffman 1975

Causal evaluation questions can be enlightening; they can also lead to delusions. Unfor-
tunately, there is no clear way of telling the difference. So among the many perils evaluators
face, we can add that of being turned into a wild fox for five hundred years!

Note
1. Reprinted from Angels in America, Part Two:
Perestroika by Tony Kushner. Copyright 1992
and 1994 by the author. Published by Theatre
Communications Group. Used by permission.
Appropriate Methods

Blowhard Evaluation

This is the story of three little pigs who built three little houses for protection from the
BIG BAD WOLF.

The first pig worked without a plan, building the simplest and easiest structure
possible with whatever materials happened to be lying around, mostly straw and sticks.
When the BIG BAD WOLF appeared, he had scarcely to huff and puff to blow the
house down, whereupon the first pig ran for shelter and protection to the second pig's
house.
The second pig's house was prefabricated in a most rigorous fashion with highly
reliable materials. Architects and engineers had applied the latest techniques and most
valid methods to the design and construction of these standardized, prefabricated
models. The second pig had a high degree of confidence that his house could withstand
any attack.
The BIG BAD WOLF followed the first pig to the house of the second pig and
commanded, "Come out! Come out! Or by the hair on my chinny-chin-chin, I'll huff
and I'll puff and I'll blow your house down."
The second pig laughed a scornful reply: "Huff and puff all you want. You'll find no
weaknesses in this house, for it was designed by experts using the latest and best scientific
methods guaranteed not to fall apart under the most strenuous huffing and puffing."
So the BIG BAD WOLF huffed and puffed, and he huffed and puffed some more, but
the structure was solid, and gave not an inch.
In catching his breath for a final huffing and puffing, the BIG BAD WOLF noticed
that the house, although strong and well built, was simply sitting on top of the ground.
It had been purchased and set down on the local site with no attention to establishing
a firm connecting foundation that would anchor the house in its setting. Different
settings require very different site preparation with appropriately matched foundations,
but the prefabricated kit came with no instructions about how to prepare a local
foundation. Understanding all this in an instant, the sly wolf ceased his huffing and
puffing. Instead, he confidently reached down, got a strong hold on the underside of the
house, lifted, and tipped it over. The second pig was shocked to find himself uncovered
and vulnerable. He would have been easy prey for the BIG BAD WOLF had not the first
pig, being more wary and therefore more alert, dashed out from under the house, pulling
his flabbergasted brother with him. Together they sprinted to the house of the third pig,
crying "wee wee wee" all the way there.
The house of the third pig was the source of some controversy in the local pig
community. Unlike any other house, it was constructed of a hodgepodge of local
materials and a few things borrowed from elsewhere. It incorporated some of the ideas
seen in the prefabricated houses designed by experts, but those ideas had been altered
to fit local conditions and the special interests and needs of the third pig. The house was
built on a strong foundation, well anchored in its setting and carefully adapted to the
specific conditions of the spot on which the house was built. Although the house was
sometimes the object of ridicule because it was unique and different, it was also the
object of envy and praise, for it was evident to all that it fit quite beautifully and
remarkably in that precise location.
The BIG BAD WOLF approached the house of the third pig confidently. He huffed
and puffed his best huffs and puffs. The house gave a little under these strenuous forces,
but it did not break. Flexibility was part of its design, so it could sway and give under
adverse and changed conditions without breaking and falling apart. Being firmly
anchored in a solid foundation, it would not tip over. The BIG BAD WOLF soon knew
he would have no pork chops for dinner that night.
Following the defeat of the BIG BAD WOLF, the third pig found his two brother pigs
suddenly very interested in how to build houses uniquely adapted to and firmly
grounded in a specific location with a structure able to withstand the onslaughts of the
most persistent blowhards. They opened a consulting firm to help other pigs. The firm
was called "Wee wee wee, all the way home."

—From Halcolm's "Evaluation Fairy Tales"


Evaluations Worth Using
Utilization-Focused Methods Decisions

They say there was method to his madness. Perhaps so. It is easier to select a
method for madness than a single best method for evaluation, though attempting
the latter is an excellent way of achieving the former.

—Halcolm

The three pigs story that precedes this chapter and introduces this part of the book on Appropriate Methods can be interpreted as an evaluation parable. The first pig built a house that was the equivalent of what is disparagingly called a "quick and dirty evaluation." They are low-budget efforts that give the outward appearance of evaluation, but their value and utility are fleeting. They simply do not stand up under scrutiny. The second pig replicated a high-quality design that met uniform standards of excellence as specified by distant experts. Textbook designs have the advantage of elegance and sophistication, but they don't travel well. Prefabricated structures brought in from far away are vulnerable to unanticipated local conditions. Beware the evaluator who offers essentially the same design for every situation.

The third pig, then, exemplifies the utilization-focused evaluator, one who designs an evaluation to fit a specific set of circumstances, needs, and interests. The third pig demonstrated situational adaptability and responsiveness, a strategic stance introduced in Chapter 6. In this chapter, we'll examine how situational responsiveness affects methods decisions.

Methods to Support Intended Uses, Chosen by Intended Users

In utilization-focused evaluation, methods decisions, like decisions about focus and priority issues, are guided and informed by our evaluation goal: intended use by intended users. Attaining this goal is enhanced by having intended users actively involved in methods decisions, an assertion I shall substantiate in depth throughout this chapter. It remains, however, a controversial assertion, evidence about its desirability and effectiveness notwithstanding. The source of the controversy, I'm convinced, is territorial.

I've had some success persuading colleagues and students that use can be enhanced by actively involving intended users in decisions about the evaluation's purpose, scope, and focus to ensure relevance and buy-in. In other words, they can accept playing a consultative and collaborative role during the conceptual phase of the evaluation. Where we often part company is in the role to be played by intended users in making measurement and design decisions. "The evaluator is nothing," they argue, "if not an expert in methods and statistics. Clearly social scientists ought to be left with full responsibility for operationalizing program goals and determining data collection procedures." Edwards and Guttentag (1975) articulated the classic position, one that I find still holds sway today: "The decision makers' values determine on what variables data should be gathered. The researcher then decides how to collect the data" (p. 456).

Utilization-focused evaluation takes a different path.

Beyond Technical Expertise

The common perception of methods decisions among nonresearchers is that such decisions are primarily technical in nature. Sample size, for example, is determined by a mathematical formula. The evaluation methodologist enters the values of certain variables, makes calculations, and out pops the right sample size to achieve the desired level of statistical robustness, significance, power, validity, reliability, generalizability, and so on—all technical terms that dazzle, impress, and intimidate practitioners and nonresearchers. Evaluation researchers have a vested interest in maintaining this technical image of scientific expertise, for it gives us prestige, inspires respect, and, not incidentally, it leads nonresearchers to defer to us, essentially giving us the power to make crucial methods decisions and then interpret the meaning of the resulting data. It is not in our interest, from the perspective of maintaining prestige and power, to reveal to intended users that methods decisions are far from purely technical. But, contrary to public perception, evaluators know that methods decisions are never purely technical. Never. Ways of measuring complex phenomena involve simplifications that are inherently somewhat arbitrary, are always constrained by limited resources and time, inevitably involve competing and conflicting priorities, and rest on a foundation of values preferences that are typically resolved by pragmatic considerations, disciplinary biases, and measurement traditions.
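To make the nonresearchers' image concrete: the formula most often invoked for surveys is the sample size needed to estimate a proportion. The sketch below (in Python, my own illustration rather than anything prescribed in this chapter) shows how mechanical the calculation looks, and also that its inputs (confidence level, margin of error, expected proportion) are themselves judgment calls rather than technical givens.

    import math

    def sample_size_for_proportion(margin_of_error, z=1.96, expected_p=0.5):
        # Common large-population formula: n = z^2 * p * (1 - p) / e^2
        # z = 1.96 corresponds to 95% confidence; p = 0.5 is the worst case.
        return math.ceil(z ** 2 * expected_p * (1 - expected_p) / margin_of_error ** 2)

    print(sample_size_for_proportion(0.05))  # 385 respondents for plus-or-minus 5 points
    print(sample_size_for_proportion(0.03))  # 1,068 respondents for plus-or-minus 3 points

Out pops a number, but only after someone has decided how much error is tolerable and how confident the users need to be, which is precisely where intended users belong in the conversation.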
The only reason to debunk the myth that methods and measurement decisions are primarily technical is if one wants to enhance use. For we know that use is enhanced when practitioners, decision makers, and other users fully understand the strengths and weaknesses of evaluation data, and that such understanding is increased by being involved in making methods decisions. We know that use is enhanced when intended users participate in making sure that, when trade-offs are considered, as they inevitably are because of limited resources and time, the path chosen is informed by relevance. We know that use is enhanced when users buy into the design and find it credible and valid within the scope of its intended purposes as determined by them. And we know that when evaluation findings are presented, the substance is less likely to be undercut by debates about methods if users have been involved in those debates prior to data collection.

EXHIBIT 11.1
Reasons Primary Users Should
Be Involved in Methods Decisions

1. Intended use affects methods choices. Intended users can and should judge the utility of various design
options and kinds of data.

2. Limited time and resources necessitate trade-offs—more of this, less of that. Primary users have the
greatest stake in such decisions since findings are affected.

3. Methods decisions are never purely technical. Practical considerations constrain technical alternatives.
Everything from how to classify participants to how to aggregate data has utility implications that deserve
users' consideration.

4. No design is perfect. Intended users need to know the strengths and weaknesses of an evaluation to
exercise informed judgment.

5. Different users may have different criteria for judging methodological quality. These should be made
explicit and negotiated during methods discussions.

6. Credibility of the evidence and the perceived validity of the overall evaluation are key factors affecting
use. These are matters of subjective user judgment that should inform methods decisions.

7. Intended users learn about and become more knowledgeable and sophisticated about methods and using
data by being involved in methods decisions. This benefits both the current and future evaluations.

8. Methods debates should take place before data collection, as much as possible, so that findings are not
undercut by bringing up concerns that should have been addressed during design. Methods debates
among intended users after findings are reported distract from using evaluation results.

As in all other aspects of the evaluation, then, the utilization-focused evaluator advises intended users about options; points out the consequences of various choices; offers creative possibilities; engages with users actively, reactively, and adaptively to consider alternatives; and facilitates their methods decisions. At the stage of choosing methods, the evaluator remains a technical adviser, consultant, and teacher. The primary intended users remain decision makers about the evaluation. Exhibit 11.1 summarizes reasons why primary intended users should be involved in methods decisions. In the pages that follow, I'll elaborate on these rationales, explore the implications of this approach, and provide examples. Let's begin with an example.

The Million Man March
On October 16, 1995, some number of African American men marched on Washington, D.C., as a call to action. The number of men in the march mattered a great deal to both its organizers and critics. Disputes about the number subsequently led to major lawsuits against the National Park Service, which provides the government's official estimates of demonstrations on the Capitol Mall. For weeks after the march, newspaper commentators, television journalists, policymakers, activists, academics, and pundits debated the number. The size of the march overshadowed its substance and intended message. Varying estimates of the number of marchers led to charges and countercharges of racism and bigotry. Could this controversy have been anticipated, avoided, or at least tempered? Let's consider how the evaluation was conducted and then how a utilization-focused approach would have been different.

First, let's examine what made this march a focus for evaluation. The organizer of the march, Nation of Islam leader Louis Farrakhan, was a controversial figure often accused of being anti-Semitic and fomenting hatred against whites. Some Black congressmen and the leadership of the National Association for the Advancement of Colored People (NAACP) refused to join the march. Many other Black leaders worked to make it a success. From the moment the march was announced, through the months leading up to it, debate about the legitimacy and purpose of the march received high-visibility media coverage. As the day of the march approached, the central question became: How many will show up?

Why was the number so important? Because the target number became the name of the march: The Million Man March. The goal was unusually clear, specific, and measurable. The march's leaders staked their prestige on attaining that number. The march's detractors hoped for failure. The number came to symbolize the unity and political mobilization of African American men.

In time for the evening news on the day of the march, the National Park Service released its estimate: 400,000. This ranked the march as one of the largest in the history of the United States, but the number was far short of the 1 million goal. March advocates reacted with disbelief and anger. March critics gloated at Farrakhan's "failure." Who made the estimate? A white man, a career technician, in the National Park Service. He used the method he always used, a sample count from photographs. Leaders of the march immediately denounced the official number as racist. The debate was on. A week later, independent researchers at Boston University, using different counting methods, estimated the number at 800,000—double the National Park Service estimate. Others came in with other estimates. The leaders of the march continued to insist that more than a million participated. The significance of this historically important event remains clouded by rancorous debate over the seemingly simplest of all evaluation questions: How many people participated in the "program"?

Suppose, now, for the sake of illustration, that the responsible National Park official—a white male, remember—had taken a utilization-focused approach. As the visibility of the march increased leading up to it, and as its potential historical significance became apparent, he could have identified and convened a group of primary stakeholders: one or more representatives of the march's organizers, representatives of other national Black organizations, academics with expertise in crowd estimates, and perhaps police officials from other cities who had experience estimating the size of large crowds. A couple of respected newsprint and television journalists could have been added to the group. Indeed, and this is surely a radical proposal, a professional evaluator might have been asked to facilitate the group's work.

Once such a group was assembled, consider the challenging nontechnical decisions that have to be made to figure out the size of the march. These questions are in addition to technical questions of aerial photography sampling and computer programs designed to count heads in a crowd (a simplified sketch of such a count follows the list below). To answer these questions requires some combination of common sense, political savvy, appreciation of different perspectives, and pragmatism. Here, then, are some questions that would occur to me if I had been asked to facilitate such a discussion:

1. Who gets counted? It's the million man march aimed at Black men. Do women count? Do children count? Do whites count?

2. Do spectators and onlookers get counted as well as marchers?

3. When during the daylong event will counts be made? Is there one particular time that counts the most; for example, Farrakhan's speech? (His speech was three hours long, so when or how often during his speech?)

4. Should the final number account for people who came and went over the course of the day, or only people present at some single point in time?

5. What geographical boundary gets included in the count? What are the boundaries of the Capitol Mall for purposes of sampling?

6. Sympathy and support marches are scheduled to take place in other cities. Do their numbers count in the one million total?

7. Should we report a single number, such as 1 million, or communicate the variability of any such count by reporting a range, for example 900,000 to 1.1 million?

8. Who are the most credible people to actually engage in or supervise the actual analysis?

9. What reviews should the analysis undergo, by whom, before being released officially?

10. Who do we say determined the counting methods and under whose name, or combination of named sponsors, should the result be publicized?
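For readers who want to see what the technical side of such a count involves, here is a deliberately simplified sketch (my own illustration, not the Park Service's or Boston University's actual procedure) of estimating crowd size by sampling grid cells from an aerial photograph and reporting a range, as question 7 suggests. All numbers are hypothetical.

    import random
    import statistics

    random.seed(1)  # reproducible illustration

    def estimate_crowd(cell_counts, total_cells, n_boot=1000):
        # Point estimate: average heads per sampled cell times the number of
        # equal-area cells covering whatever boundary the group agrees on.
        point = statistics.mean(cell_counts) * total_cells
        # Simple bootstrap resampling to express the count as a range.
        boots = sorted(
            statistics.mean(random.choices(cell_counts, k=len(cell_counts))) * total_cells
            for _ in range(n_boot)
        )
        return point, (boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot)])

    # Hypothetical head counts from 50 sampled cells out of 4,000 cells total
    counts = [random.randint(80, 160) for _ in range(50)]
    print(estimate_crowd(counts, 4000))

The arithmetic is the easy part; the stakeholder questions above (boundaries, timing, who counts) determine what goes into the cell counts and the total number of cells in the first place.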
I certainly don't assert that convening a group of primary stakeholders to negotiate answers to these questions would have ended all controversy, but I do believe it could have tempered the rancorous tone of the debate, diffused the racial overtones of the counting process, and permitted more focus on the substantive societal issues raised by the march, issues about family values, community involvement, social responsibility, economic opportunity, and justice. The evaluation task force, once convened to decide how to count from one to one million, might even have decided to prepare methods of following up the march to determine its longer term impacts on Black men, families, and communities—evaluation questions overshadowed by the controversy about the number of participants.

Parallel Evaluation Decisions

I like the Million Man March example because it shows how complex a simple question like "how many" can become. Parallel complexities can be found in any program evaluation. For example, in most programs the dropout rate is an important indicator of how participants are reacting to a program. But when has someone dropped out? This typically turns out to involve some arbitrary cutoff. School districts vary widely on how they define, count, and report dropouts, as do chemical dependency, adult literacy, parent education, and all kinds of other programs.

No less vague and difficult are concepts like in the program and finished the program. Many programs lack clear beginning and ending points. For example, a job training program aimed at chronically unemployed minority men has a monthlong assessment process, including testing for drug use and observing a potential participant's persistence in staying with the process. During this time, the participant, with staff support and coaching, develops a plan. The participant is on probation until he or she completes enough of the program to show seriousness and commitment, but the program is highly individualized so different people are involved in the early assessment and probation processes over very different time periods. There is no clear criterion for when a person has begun probation or completed probation and officially entered the program. Yet, that decision, in aggregate, will determine the denominator for dropout and completion rates and will be the numerator for the program's "acceptance" rate. Making sure that such categories are meaningful and valid, so that the numbers are credible and useful, involves far more than statistics. Careful thought must be given, with primary intended users, to how the numbers and reported rates will be calculated and used.
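The arithmetic consequences of that definitional choice are easy to demonstrate. A hypothetical sketch (the records and the two entry rules are invented for illustration, not drawn from any actual program):

    # Each record: (completed_probation, finished_program) for one participant
    participants = [(False, False)] * 3 + [(True, False)] * 2 + [(True, True)] * 5

    def dropout_rate(records, entry_rule):
        entered = [r for r in records if entry_rule(r)]
        dropped = [r for r in entered if not r[1]]  # entered but did not finish
        return len(dropped) / len(entered)

    # Definition A: anyone who started the assessment counts as "in the program"
    print(dropout_rate(participants, lambda r: True))   # 5 of 10 -> 0.50
    # Definition B: only those who completed probation count as having entered
    print(dropout_rate(participants, lambda r: r[0]))   # 2 of 7  -> about 0.29

Same program, same ten people, and the reported dropout rate ranges from about 29% to 50% depending on a cutoff no statistics text can settle, which is why the choice belongs to primary intended users.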
Nor are these kinds of categorical decisions only a problem when measuring human behavior. The Minnesota Department of Transportation categorizes road projects as preservation, replacement, and new or expansion. How dollars are allocated and distributed in these three categories to regions throughout the state has enormous implications. Now, consider the Lake Street Bridge, which connects Minneapolis and Saint Paul. Old and in danger of being condemned, the bridge was torn down and a new one built. The old bridge had only two lanes and no decorative flourishes. The new bridge has four lanes and attractive design features. Should this project be categorized as replacement or expansion? (In a time of economic optimism and expanding resources, such as the 1960s, new and expansion projects were favored. In a time of downsizing and reduced resources, like the 1990s, replacement projects are more politically viable.) Perhaps, you might argue, the Lake Street Bridge illustrates the need for a new category: Part replacement/part expansion. But no replacements are pure replacements when new materials are used and updated codes or standards are followed. And few expansions are done without replacing something. How much mix, then, would have to occur for a project to fall into the new, combined part replacement/part expansion category? A doctoral degree in research and statistics provides no more guidance in answering this question than thoughtful consideration of how the data will be used, grounded in common sense and pragmatism—a decision that should be made by intended users with intended uses in mind. Such inherently arbitrary measurement decisions determine what data will emerge in findings.

Methods and Measurement Options

There can be no acting or doing of any kind, till it be recognized that there is a thing to be done; the thing once recognized, doing in a thousand shapes becomes possible.

—Thomas Carlyle, philosopher and historian (1795-1881)

Mail questionnaires, telephone interviews, or personal face-to-face interviews? Individual interviews or focus groups? Even-numbered or odd-numbered scales on survey items? Opinion, knowledge, and/or behavioral questions? All closed questions or some open-ended? If some open-ended, how many? Norm-referenced or criterion-referenced tests? Develop our own instruments or adopt measures already available? Experimental design, quasi-experimental design, or case studies? Participant observation or spectator observation? A few in-depth observations or many shorter observations? Single or multiple observers? Standardized or individualized protocols? Fixed or emergent design? Follow up after two weeks, three months, six months, or a year? Follow up everyone or a sample? What kind of sample: simple random, stratified, and/or purposeful? What size sample? Should interviewers have the same characteristics as program participants: gender? age? race? What comparisons to make: past performance? intended goals? hoped-for goals? other programs? I won't list a thousand such options a la Thomas Carlyle, but I've no doubt it could be done. I would certainly never try the patience of primary stakeholders with a thousand options, but I do expect to work with them to consider the strengths and weaknesses of major design and measurement possibilities.

The primary focus in making evaluation methods decisions should be on getting the best possible data to adequately answer primary users' evaluation questions given available resources and time. The emphasis is on appropriateness and credibility—measures, samples, and comparisons that are appropriate and credible to address key evaluation issues. The Joint Committee's (1994) evaluation standards provide the following guidance:

Utility Standard on Information Scope and Selection

Information collected should be broadly selected to address pertinent questions about the program and be responsive to the needs and interests of clients and other specified stakeholders (U3).

In my judgment, the best way to ensure pertinence and responsiveness is through direct interaction with evaluation clients and primary stakeholders, facilitating their making decisions to represent their needs and interests.

Assuring Methodological Quality and Excellence

I am easily satisfied with the very best.

—Winston Churchill, British prime minister during World War II (1874-1965)

One of the myths believed by nonresearchers is that researchers have agreed among themselves about what constitutes methodological quality and excellence. This belief can make practitioners and other nonacademic stakeholders understandably reluctant to engage in methods discussions. In fact, researchers disagree with each other vehemently about what constitutes good research and, with a little training and help, I find that nonresearchers can grasp the basic issues involved and make informed choices.

To increase the confidence of nonresearchers that they can and should contribute to methods discussions—for example, to consider the merits of telephone interviews versus face-to-face interviews or mail questionnaires—I'll often share the results of research on how researchers rate research quality. In a seminal study for the National Science Foundation, McTavish et al. (1975) used eminent social scientists to judge and rate the research quality of 126 federal studies. They found "important and meaningful differences between raters in their professional judgments about a project's methodology" (p. 63). Eva Baker, Director of the UCLA Center for the Study of Evaluation and former editor of Educational Evaluation and Policy Analysis (EEPA), established a strong system of peer review for EEPA, requiring three independent reviewers for every article. Eva has told me that in several years as editor, she has never published an article on which all three reviewers agreed the article was good! I edited the peer-reviewed Journal of Extension for three years and had the same experience. Robert Donmoyer (1996), new features editor of Educational Researcher, reported that "peer reviewers' recommendations often conflict and their advice is frequently contradictory. . . . There is little consensus about what research and scholarship are and what research reporting and scholarly discourse should look like" (p. 19).

This kind of inside look at the world of research can be shocking to people who think that there surely must be consensus regarding what constitutes "good" research. The real picture is more chaotic and warlike, what Donmoyer (1996) portrays as "a diverse array of voices speaking from quite different, often contradictory perspectives and value commitments" (p. 19). Perspectives and value commitments? Not just rules and formulas? Perspectives and value commitments imply stakes, which leads to stakeholders, which leads to involving stakeholders to represent their stakes, even in methods decisions, or should we say, especially in methods decisions, since those decisions determine what findings will be available for interpretation and use.

The evidence of dissensus about research standards and criteria for judging quality will not surprise those inside science who understand that a major thrust of methodological training in graduate school is learning how to pick apart and attack any study. There are no perfect studies. And there cannot be, for there is no agreement on what constitutes perfection.

This has important implications for methods decisions in evaluation. There are no universal and absolute standards for judging methods. The consensus that has emerged within evaluation, as articulated by the Joint Committee on Standards (1994) and the American Evaluation Association's Guiding Principles (Shadish et al. 1995), is that evaluations are to be judged on the basis of appropriateness, utility, practicality, accuracy, propriety, credibility, and relevance. These criteria are necessarily situational and context bound. One cannot judge the adequacy of methods used in a specific evaluation without knowing the purpose of the evaluation, the intended uses of the findings, the resources available, and the trade-offs negotiated. Judgments about validity and reliability, for example, are necessarily and appropriately relative rather than absolute in that the rigor and quality of an evaluation's design and measurement depend on the purpose and intended use of the evaluation. The Accuracy Standards of the Joint Committee on Standards (1994) make it clear that validity and reliability of an evaluation depend on the intended use(s) of the evaluation.

Valid Information: The information-gathering procedures should be chosen or developed and then implemented so that they will assure that the interpretation arrived at is valid for the intended use. (A5; emphasis added)

Reliable Information: The information-gathering procedures should be chosen or developed and then implemented so that they will assure that the information obtained is sufficiently reliable for the intended use. (A6; emphasis added)

The Art of Making Methods Decisions

Lee J. Cronbach (1982), an evaluation pioneer and author of several major books on measurement and evaluation, observed that designing an evaluation is as much art as science: "Developing an evaluation is an exercise of the dramatic imagination" (p. 239). This metaphor, this perspective, can help free practitioners and other primary users who are nonresearchers to feel they have something important to contribute. It can also, hopefully, open the evaluator to hearing their contributions and facilitating their "dramatic imaginations."

The art of evaluation involves creating a design that is appropriate for a specific situation and particular action or policymaking context. In art there is no single, ideal standard. Beauty is in the eye of the beholder, and the evaluation beholders include decision makers, policymakers, program managers, practitioners, participants, and the general public. Thus, for Cronbach (1982), any given design is necessarily an interplay of resources, possibilities, creativity, and personal judgments by the people involved. "There is no single best plan for an evaluation, not even for an inquiry into a particular program, at a particular time, with a particular budget" (p. 231).
Hard Versus Soft Data

The next chapter will explore in depth the "paradigms debate" involving quantitative/experimental methods versus qualitative/naturalistic approaches. This is sometimes framed as "hard data" versus "soft data." At this point it suffices to say that the issue is not hard versus soft, but relevant and appropriate versus irrelevant and inappropriate. Participants in the Stanford Evaluation Consortium (Cronbach et al. 1980) observed that "merit lies not in form of inquiry but in relevance of information" (p. 7). My experience with stakeholders suggests that they would rather have soft data about an important question than hard data about an issue of minor relevance. Obviously, the ideal is hard data about important questions, whatever hard data may mean in a particular context. But, in the real world of trade-offs and negotiations, the evaluation researcher too often determines what is evaluated according to his or her own expertise or preference in what to measure, rather than by deciding first what intended users determine is worth evaluating and then doing the best he or she can with methods. Relevance and utility are the driving forces in utilization-focused evaluation; methods are employed in the service of relevance and use, not as their master.

One implication of this perspective—that quality and excellence are situational, that design combines the scientific and artistic—is that it is futile to attempt to design studies that are immune from methodological criticism. There simply is no such immunity. Intended users who participate in making methods decisions should be prepared to be criticized regardless of what choices they make. Especially futile is the desire, often articulated by nonresearchers, to conduct an evaluation that will be accepted by and respected within the academic community. As we demonstrated above, in discussing peer review research, the academic community does not speak with one voice. Any particular academics whose blessings are particularly important for evaluation use should be invited to participate in the evaluation design task force and become, explicitly, intended users. Making no pretense of pleasing the entire scientific community (an impossibility), utilization-focused evaluation strives to attain the more modest and attainable goal of pleasing primary intended users. This does not mean that utilization-focused evaluations are less rigorous. It means the criteria for judging rigor must be articulated for each evaluation.

Credibility and Use

Credibility affects use. Credibility is a complex notion that includes the perceived accuracy, fairness, and believability of the evaluation and the evaluator. In the Joint Committee's (1994) standard on Evaluator Credibility, evaluators are admonished to be "both trustworthy and competent" so that findings achieve "maximum credibility and acceptance" (p. U2). Report clarity, full and frank disclosure of data strengths and weaknesses, balanced reporting, defensible information sources, valid and reliable measurement, justified conclusions, and impartial reporting are all specific standards aimed at credibility as a foundation for use. The American Evaluation Association's Guiding Principles (Shadish et al. 1995) likewise emphasize systematic inquiry, competence, and honesty and integrity to ensure credibility and utility.

For information to be useful and to merit use, it should be as accurate and believable as possible. Limitations on the degree of accuracy should be stated clearly. Research by Weiss and Bucuvalas (1980) found that decision makers apply both truth tests (whether data are believable and accurate) and utility tests (whether data are relevant) in deciding how seriously to weigh findings. Decision makers want highly accurate and trustworthy data. This means they want data that are valid and reliable. But in the politically charged environment of evaluation, these traditional scientific concepts have taken on some new and broader meanings.

Overall Evaluation Validity

The government ministries are very keen on amassing statistics. They collect them, raise them to the nth power, take the cube root, and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first place from the village watchman, who just puts down what he damn well pleases.

—Sir Josiah Stamp, 1911, English economist (1880-1941)

House (1980:249) has suggested that validity means "worthiness of being recognized." For the typical evaluation this means being "true, credible, and right" (p. 250). Different approaches to evaluation establish validity in different ways. The important part of House's contribution from the point of view of utilization-focused evaluation is that he applies the notion of validity to the entire evaluation, not just the data. An evaluation is perceived as valid in a global sense that includes the overall approach used, the stance of the evaluator, the nature of the process, the design, data gathering, and the way in which results are reported. Both the evaluation and the evaluator must be perceived as trustworthy for the evaluation to have high validity.

Alkin et al. (1979) studied use and found that "for evaluations to have impact, users must believe what evaluators have to say" (p. 245). The believability of an evaluation depends on much more than the perceived scientific validity of the data and findings. Believability depends on the users' perceptions of and experiences with the program being evaluated, users' prior knowledge and prejudices, the perceived adequacy of evaluation procedures, and the users' trust in the evaluator (Alkin et al. 1979:245-47). Trust, believability, and credibility are the underpinnings of overall evaluation validity.

It is important to understand how overall evaluation validity differs from the usual, more narrow conception of validity in scientific research. Validity is usually focused entirely on data collection procedures, design, and technical analysis, that is, whether measures were valid or whether the design allows drawing inferences about causality (internal design validity).

A measure is scientifically valid to the extent that it captures or measures the concept (or thing) it is intended to measure. For example, asking if an IQ test really measures native intelligence (rather than education and socioeconomic advantage) is a validity question. Validity is often difficult to establish, particularly for new instruments. Over time, scientists develop some consensus about the relative validity of the often-used instruments, such as major norm-referenced standardized educational tests. Rossi, Freeman, and Wright (1979) discuss three common criteria for validity of quantitative instruments.

1. Consistency with usage: A valid measurement of a concept must be consistent with past work that used that concept. Hence, a measure of adoption must not be in contradiction to the usual ways in which that term had been used in previous evaluations of interventions.

2. Consistency with alternative measures: A valid measure must be consistent with alternative measures that have been used effectively by other evaluators. Thus, a measure must produce roughly the same results as other measures that have been proposed, or, if different, have sound conceptual reasons for being different.

3. Internal consistency: A valid measure must be internally consistent. That is, if several questions are used to measure adoption, the answers to those questions should be related to each other as if they were alternative measures of the same thing. (pp. 170-71)
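The third criterion, internal consistency, is the one evaluation clients most often meet under the name Cronbach's alpha. A minimal sketch of the calculation, with invented item scores (any general-purpose statistics package reports this routinely):

    from statistics import pvariance

    def cronbach_alpha(item_scores):
        # item_scores: one list of respondent scores per item, all on the same scale
        k = len(item_scores)
        totals = [sum(resp) for resp in zip(*item_scores)]  # total score per respondent
        return k / (k - 1) * (1 - sum(pvariance(i) for i in item_scores) / pvariance(totals))

    # Three hypothetical 5-point items answered by six respondents
    items = [[4, 5, 3, 2, 4, 5],
             [4, 4, 3, 2, 5, 5],
             [5, 5, 2, 3, 4, 4]]
    print(round(cronbach_alpha(items), 2))  # about 0.88

Values near 1 mean respondents who score high on one item tend to score high on the others, which is taken as evidence that the items are measuring the same underlying thing.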
Qualitative data collection (e.g., such techniques as participant observation and in-depth, open-ended interviewing) poses different validity challenges. In qualitative methods, validity hinges to a greater extent on the skill, competence, and rigor of the researcher because the observer or interviewer is the instrument.

Since as often as not the naturalistic inquirer is himself the instrument, changes resulting from fatigue, shifts in knowledge, and co-optation, as well as variations resulting from differences in training, skill, and experience among different "instruments," easily occur. But this loss in rigor is more than offset by flexibility, insight, and ability to build on tacit knowledge that is the peculiar province of the human instrument. (Guba and Lincoln 1981:113)

Validity concerns also arise in using official statistics such as health or crime statistics. Joe Hudson (1977) has cautioned about the care that must be taken in using crime statistics because of validity problems:

First, officially collected information used as measures of program outcomes are, by their very nature, indirect measures of behavior. For example, we have no practical or direct way of measuring the actual extent to which graduates of correctional programs commit new crimes. Second, the measurements provided are commonly open to serious problems. For example, the number of crimes known to authorities in most situations is only a fraction of the number of crimes committed, although that fraction varies from crime to crime. . . . The growing willingness of victims of sexual assault to report their crimes to the police and actively cooperate in prosecution is an example of the manner in which public attitudes can affect officially recorded rates of crime.

Of the various criteria used to measure recidivism, that of arrest appears to be especially problematic. Recidivism rates based on arrest do not tell us whether those arrested have, in fact, returned to criminal behavior but only that they are presumed to have done so. . . .

The widespread discretion exercised by the police to arrest is a further source of invalidity. For example, it is probably reasonable to expect that the number of individuals arrested for a particular type of crime within a jurisdiction is to some extent a direct reflection of changing police policies and not totally the function of changing patterns of law-violating behavior. In addition to the power of deciding when to arrest, police also have discretionary authority to determine which of a number of crimes an individual will be arrested for in a particular situation. Thus, if policy emphasis is placed upon combating burglary, this may affect decisions as to whether an arrestee is to be arrested for burglary, simple larceny, or criminal damage to property. In short, the discretion of the police to control both the number and types of arrests raises serious validity problems in evaluations which attempt to use this measure of program outcome. (pp. 88-89)

In summary, then, validity problems, along with the trustworthiness of the evaluator, affect the overall credibility of the evaluation, and this is true for all kinds of data collection—quantitative measures, questionnaires, qualitative observations, government statistics, and social indicators. The precise nature of the validity problem varies from situation to situation, but evaluators must always be concerned about the extent to which the data collected are credible and actually measure what is supposed to be measured; they must also make sure that intended users understand validity issues. In addition, a validity issue of special, though not unique, concern to utilization-focused evaluators is face validity.

Believable and Understandable Data

Face Validity in Utilization-Focused Measurement

Face validity concerns "the extent to which an instrument looks as if it measures what it is intended to measure" (Nunnally 1970:149). An instrument has face validity if stakeholders can look at the items and understand what is being measured. Face validity, however, is generally held in low regard by measurement experts. Predictive validity, concurrent validity, construct validity—these technical approaches are much preferred by psychometricians. Nunnally (1970) considers face validity to have occasional public relations value when data are gathered for the general public: "Less logical is the reluctance of some administrators in applied settings, e.g., industry, to permit the use of predictor instruments which lack face validity" (p. 149). Yet, from a utilization perspective, it is perfectly logical for decision makers to want to understand and believe in data they are expected to use. Nunnally disagrees: "Although one could make a case for the involvement of face validity in the measurement of constructs, to do so would probably serve only to confuse the issues" (p. 150). It is little wonder that evaluators, many of whom cut their measurement teeth on Nunnally's textbooks, have little sympathy for the face validity needs of stakeholders. Nor is it surprising that such evaluators complain that their findings are not used. Consider the following case.
The board of directors of a major industrial firm decided to decentralize organizational decision making in hopes of raising worker morale. The president of the company hired an organizational consultant to monitor and evaluate the decentralization program and its effects. From the literature on the sociology of organizations, the evaluator selected a set of research instruments designed to measure decentralization, worker autonomy, communication patterns, and worker satisfaction. The scales had been used by sociologists to measure organizational change in a number of different settings, and the factorial composition of the scales had been validated. The instruments had high predictive and construct validity, but low face validity. The evaluator found no statistically significant changes between pretest and posttest so, when he met with the board of directors, he dutifully reported that the decentralization program had failed and that worker morale remained low. The president of the company had a considerable stake in the success of the program; he did not have a stake in the evaluation data. He did what decision makers frequently do in such cases—he attacked the data.

President: How can you be so sure that the program failed?


Evaluator: We collected data using the best instruments available. I won't go
into all the technical details of factor analysis and Cronbach's alpha.
Let me just say that these scales have been shown to be highly valid
and reliable. Take this 10-item scale on individual autonomy. The
best predictor item in this particular scale asks respondents: (a) "Do
you take coffee breaks on a fixed schedule?" or (b) "Do you go to
get coffee whenever you want to?"
President: [visibly reddening and speaking in an angry tone] Am I to under-
stand that your entire evaluation is based on some kind of question-
naire that asks people how often they get coffee, that you never
personally talked to any workers or managers, that you never even
visited our operations? Am I to understand that we paid you
$20,000 to find out how people get their coffee?
Evaluator: Well, there's a lot more to it than that, you see . . .
President: That's it! We don't have time for this nonsense. Our lawyers will be
in touch with you about whether we want to press fraud and
malpractice charges!

Clearly the president was predisposed to dismiss any negative findings. But suppose the evaluator had reviewed the instrument and survey design with the president before gathering data. Suppose he had explained what the items were supposed to indicate and then asked,

Now, if we survey employees with these items measuring these factors, will they tell you what you want to know? Does this make sense to you? Are you prepared to act on this kind of data? Would you believe the results if they came out negative?

Such an exchange might not have made a difference. It's not easy to get busy executives to look carefully at instruments in advance, nor do evaluators want to waste time explaining their trade. Many decision makers are just as happy not being bothered with technical decisions. After all, that's why they hired an evaluator in the first place, to design and conduct the evaluation! But the costs of such attitudes to use can be high. Utilization-focused evaluators check out the face validity of instruments before data are collected. Subsequent data analysis, interpretation and use are all facilitated by attention to face validity—making sure users understand and believe in the data.
Useful Designs

Face validity criteria can also be applied to design questions. Do intended users understand the design? Does it make sense to them? Do they appreciate the implications of comparing Program A with Program B? Do they know why the design includes, or does not include, a control group? Is the sample size sufficiently large to be believable? You can be sure that decision makers will have opinions about these issues when results are presented, particularly if findings turn out negative. By asking these questions before data collection, potential credibility problems can be identified and dealt with, and users' insights can help shape the design to increase its relevance. Consider the following case.

At an evaluation workshop I conducted, the marketing director for a major retail merchandising company attended to find out how to get more mileage out of his marketing research department. He told this story.

Two years earlier he had spent a considerable sum researching the potential for new products for his company's local retail distribution chain. A carefully selected representative sample of 285 respondents had been interviewed in the Minneapolis-Saint Paul greater metropolitan area. The results indicated one promising new line of products for which there appeared to be growing demand. He took this finding to the board of directors with a recommendation that the company make a major capital investment in the new product line. The board, controlled by the views of its aging chairman, vetoed the recommendation. The reason: "If you had presented us with opinions from at least a thousand people, we might be able to move on this item. But we can't make a major capital commitment on the basis of a couple of hundred interviews."

The marketing director tactfully tried to explain that increased sample size would have made only a marginal reduction in possible sampling error. The chairperson remained unconvinced, the findings of an expensive research project were ignored, and the company missed out on a major opportunity. A year later, the item they rejected had become a fast-selling new product for a rival company.
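The director's point about diminishing returns is easy to show with the standard margin-of-error calculation. A simplified sketch assuming simple random sampling and a worst-case 50/50 split, not the firm's actual analysis:

    import math

    def margin_of_error(n, p=0.5, z=1.96):
        # Half-width of a 95% confidence interval for an estimated proportion
        return z * math.sqrt(p * (1 - p) / n)

    print(round(margin_of_error(285), 3))   # about 0.058, or plus-or-minus 5.8 points
    print(round(margin_of_error(1000), 3))  # about 0.031, or plus-or-minus 3.1 points

More than tripling the interviews buys roughly three percentage points of added precision; whether that is worth the cost is exactly the kind of trade-off a board can weigh before, rather than after, the data are collected.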
It is easy to laugh at the board's mistake, but the marketing director was not laughing. He wanted to know what to do. I suggested that next time, he check out the research design with the board before collecting data, going to them and saying,

Our statistical analysis shows that a sample of 285 respondents in the Twin Cities area will give us an accurate picture of market potential. Here are the reasons they recommend this sample size. . . . Does that make sense to you? If we come in with a new product recommendation based on 285 respondents, will you believe the data?

If the board responds positively, the potential for use will have been enhanced, though not guaranteed. If the board says the sample is too small, then the survey might as well include more respondents—or be canceled. There is little point in implementing a design that is known in advance to lack credibility.

Reliability and Error
Reliability has to do with consistency. A measure is reliable to the extent that essentially the same results can be reproduced repeatedly, as long as the situation does not change. For example, in measuring the height of an adult, one should get the same results from one month to the next. Measuring attitudes and behavior is more complex because one must determine whether measured change means the attitude has changed or the data collection is unreliable. Inconsistent data collection procedures, for example, asking interview questions in different sequence to different respondents, can change results and introduce errors. Nonresearchers will often have unrealistic expectations about evaluation instruments, expecting no errors. For many reasons, all data collection is subject to some measurement error. Henry Dyer, a former president of the highly respected Educational Testing Service (ETS), tells of trying to explain to a government official that test scores, even on the most reliable tests, have enough measurement error that they must be used with understanding of their limitations. The high-ranking official responded that test makers should "get on the ball" and start producing tests that "are 100% reliable under all conditions." Dyer's (1973) reflections on this conversation are relevant to an understanding of error in all kinds of measures. He asked,

How does one get across the shocking truth that 100% reliability in a test is a fiction that, in the nature of the case, is unrealizable? How does one convey the notion that the test-reliability problem is not one of reducing measurement error to absolute zero, but of minimizing it as far as practicable and doing one's best to estimate whatever amount of error remains, so that one may act cautiously and wisely in a world where all knowledge is approximate and not even death and taxes are any longer certain? (p. 87)
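Dyer's advice, to estimate the remaining error rather than pretend it away, is what the standard error of measurement does for a test score. A minimal sketch with invented numbers:

    import math

    def score_band(observed, sd, reliability, z=1.96):
        # Standard error of measurement: sem = sd * sqrt(1 - reliability)
        sem = sd * math.sqrt(1 - reliability)
        return observed - z * sem, observed + z * sem

    # Hypothetical test: standard deviation 15, reliability .91, observed score 100
    print(score_band(100, 15, 0.91))  # roughly (91.2, 108.8)

Even with a reliability of .91 (high by most standards), an individual's observed score of 100 could plausibly reflect a true score anywhere in a band almost 18 points wide, which is why single scores call for cautious interpretation.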
Sources of error are many. For example, consider sources of error in an individual test score. Poor health on the day of the test can affect the score. Whether the student had breakfast can make a difference. Noise in the classroom, a sudden fire drill, whether or not the teacher or a stranger gives the test, a broken pencil, and any number of similar disturbances can change a test score. The mental state of the child—depression, boredom, elation, a conflict at home, a fight with another student, anxiety about the test, a low self-concept—can affect how well the student performs. Simple mechanical errors such as marking the wrong box on the test sheet by accident, inadvertently skipping a question, or missing a word while reading are common problems for all of us. Students who have trouble reading will perform poorly on reading tests, but they are also likely to perform poorly on social studies, science, and math tests.

Some children perform better on tests because they have been taught how to take written tests. Some children are simply better test takers than other children because of their background or personality or because of how seriously they treat the idea of the test. Some schools make children sit all day taking test after test, sometimes for an entire week. Other schools give the test for only a half-day or two hours at a time to minimize fatigue and boredom. Some children like to take tests; some don't. Some teachers help children with difficult words, or even read the tests along with the children; others don't. Some schools devote their curriculum to teaching students what is on the tests. Others place little emphasis on test taking and paper-and-pencil skills, thus giving students less experience in the rigor and tricks of test taking.

All these sources of error—and I have scarcely scratched the surface of possibilities—can seriously affect an individual score. Moreover, they have virtually nothing to do with how good the test is, how carefully it was prepared, or how valid its content is for a given child or group. Intrinsic to the nature of testing, these errors are always present to some extent and are largely uncontrollable. They are the reason that statisticians can never develop a test that is 100% reliable.

The errors are more or less serious depending on how a test is used. When looking at test scores for large groups, we can expect that, because of such errors, some students will perform above their true level and other students will perform below their true score. For most groups, statisticians believe that these errors cancel each other. The larger the group tested, the more likely this is to be true.

Different evaluation instruments are subject to different kinds of errors. Whether the evaluation includes data from tests, questionnaires, management information systems, government statistics, or whatever—the analysis should include attention to potential sources of error, and, where possible, calculate and report the degree of error. The point is that evaluators need not be defensive about errors. Rather, they need to explain the nature of errors, help intended users decide what level of precision is needed, consider the costs and benefits of undertaking procedures to reduce error (for instance, a larger sample size), and help users to understand the implications for interpreting findings. Primary intended users can be helpful in identifying potential sources of error. In my experience, their overall confidence in their ability to correctly and appropriately use evaluation data is increased when there has been a frank and full discussion of both the data's strengths and weaknesses. In this way, evaluators are helping to make evaluation clients more knowledgeable so they will understand what Dyer's government official did not: The challenge is not reducing measurement error to absolute zero, but rather minimizing it as far as practicable and doing one's best to estimate whatever amount of error remains, so that one may act cautiously and wisely in a world where all knowledge is approximate and not even death and taxes are any longer certain.

Trade-Offs

Different evaluation purposes affect how much error can be tolerated. A summative evaluation to inform a major decision that will affect the future of a program, perhaps touching the lives of thousands of people and involving allocations of millions of dollars, will necessarily and appropriately involve considerable attention to and resources for minimizing error. In contrast, a small-scale, fairly informal, formative evaluation aimed at stimulating staff to think about what they're doing will raise fewer concerns about error. There is a lot of territory between these extremes. How precise and robust findings need to be, given available resources, are matters for discussion and negotiation. The next two sections look at additional concerns that commonly involve negotiation and trade-offs: (1) breadth versus depth and (2) the relative generalizability of findings.

Breadth Versus Depth

Deciding how much data to gather involves trade-offs between depth and breadth. Getting more data usually takes longer and costs more, but getting less data usually reduces confidence in the findings.
Studying a narrow question or very specific problem in great depth may produce clear results but leave other important issues and problems unexamined. On the other hand, gathering information on a large variety of issues and problems may leave the evaluation unfocused and result in knowing a little about a lot of things, but not knowing a lot about anything.

During methods deliberations, some boundaries must be set on data collection. Should all parts of the program be studied or only certain parts? Should all participants be studied or only some subset of clients? Should the evaluator aim at describing all program processes and outcomes or only certain priority areas?

In my experience, determining priorities is challenging. Once a group of primary stakeholders gets turned on to learning evaluative information, they want to know everything. The evaluator's role is to help them move from a rather extensive list of potential questions to a much shorter list of realistic questions and finally to a focused list of essential and necessary questions. This process moves from divergence to convergence, from generating many possibilities (divergence) to focusing on a few worthwhile priorities (convergence).

This applies to framing overall evaluation questions as well as to narrowing items in a particular instrument, such as a survey or interview. Many questions are interesting, but which are crucial? These end up being choices not between good and bad, but among alternatives, all of which have merit.

Internal and External Validity in Design

Trade-offs between internal and external validity have become a matter of debate in evaluation since Campbell and Stanley (1963) asserted that "internal validity is the sine qua non" (p. 175). Internal validity in its narrowest sense refers to certainty about cause and effect. Did X cause Y? Did the program cause the observed outcomes? In a broader sense, it refers to the "trustworthiness of an inference" (Cronbach 1982:106). External validity, on the other hand, refers to the degree of confidence one has in generalizing findings beyond the situation studied.

Internal validity is increased by exercising rigorous control over a limited set of carefully defined variables. However, such rigorous controls create artificialities that limit generalizability. The highly controlled situation is less likely to be relevant to a greater variety of more naturally occurring, less controlled situations. In the narrowest sense, this is the problem of going from the laboratory into the real world. By contrast, increasing variability and sampling a greater range of experiences or situations typically reduces control and precision, thereby reducing internal validity. The ideal is high internal validity and high external validity. In reality, there are typically trade-offs involved in the relative emphasis placed on one or the other.

Cronbach's (1982) discussion of these issues for evaluation is quite comprehensive and insightful. He emphasized that "both external validity and internal validity are matters of degree and external validity does not depend directly on internal validity" (p. 170). Being able to apply findings to future decisions and new settings is often more important than establishing rigorous causal relations under rigid experimental conditions. He introduced the idea of extrapolation rather than generalization. Extrapolation involves logically and creatively thinking about what specific findings mean for other situations, rather than the statistical process of generalizing from a sample to a larger population.

He advocated that findings be interpreted in light of stakeholders' and evaluators' experiences and knowledge, and then applied/extrapolated using all available insights, including understandings about quite different situations. This focuses interpretation away from trying to determine truth in some absolute sense (a goal of basic research) to a concern with conclusions that are reasonable, justifiable, plausible, warranted, and useful.

The contrasting perspectives of Campbell and Cronbach have elucidated the trade-offs between designs that give first priority to certainty about causal inference (internal validity) versus those that better support extrapolations to new settings (external validity). These evaluation pioneers formulated fundamentally different theories of practice (Shadish et al. 1991). In working with primary stakeholders to design evaluations that are credible, the evaluator will need to consider the degree to which internal and external validity are of concern, and to emphasize each in accordance with stakeholder priorities. Choices are necessitated by the fact that no single design is likely to attain internal and external validity equally well.

Truth and Utility

Stakeholders want accurate information; they apply truth tests (Weiss and Bucuvalas 1980) in deciding how seriously to pay attention to an evaluation. They also want useful and relevant information. The ideal, then, is both truth and utility. In the real world, however, there are often choices to be made between the extent to which one maximizes truth and the degree to which data are relevant.

The simplest example of such a choice is time. The time lines for evaluation are often ridiculously short. A decision maker may need whatever information can be obtained in three months, even though the researcher insists that a year is necessary to get data of reasonable quality and accuracy. This involves a trade-off between truth and utility. Highly accurate data in a year are less useful to this decision maker than data of less precision and validity obtained in three months.

Decision makers regularly face the need to take action with limited and imperfect information. They prefer more accurate information to less accurate information, but they also prefer some information to no information. This is why research quality and rigor are "much less important to utilization than the literature might suggest" (Alkin et al. 1979:24).

The effects of methodological quality on use must be understood in the full context of a study, its political environment, the degree of uncertainty with which the decision maker is faced, and thus his or her relative need for any and all clarifying information. If information is scarce, then new information, even of dubious quality, may be somewhat helpful.

The scope and importance of an evaluation greatly affect the emphasis that will be placed on technical quality. Eleanor Chelimsky (1987a, 1987b), former President of the American Evaluation Association and founding Director of the Program Evaluation and Methodology Division of the U.S. General Accounting Office, has insisted that technical quality is paramount in policy evaluations to Congress. The technical quality of national policy research matters, not only in the short term, when findings first come out, but over the long term as policy battles unfold and evaluators are called on to explain and defend important findings (Chelimsky 1995a).

On the other hand, debates about technical quality are likely to be much more center stage in national policy evaluations than in local efforts to improve programs at the street level, where the policy rubber hits the day-to-day programming road. One evaluator in our study of the use of federal health studies linked the issue of technical quality to the nature of uncertainty in organizational decision making. He acknowledged inadequacies in the data he had collected, but he had still worked with his primary users to apply the findings, fully recognizing their problematic nature:

You have to make the leap from very limited data. I mean, that's what a decision's like. You make it from a limited data base; and, damn it, when you're trying to use quantitative data and it's inadequate, you supposedly can't make a decision. Only you're not troubled by that. You can use impressionistic stuff. Yeah, your intuition is a lot better. I get a gestalt out of this thing on every program. This may come as a great shock to you, but that is what you use to make decisions. In Chester Barnard's definition, for example, the function of the executive is to make a decision in the absence of adequate information. [EV148:11]

He went on to express some pride in the cost-benefit ratio of his evaluation, despite admitted methods inadequacies:

It was a pretty small investment on the part of the government—$47,000 bucks. In the evaluation business that's not a pile of money. The questions I had to ask were pretty narrow and the answers were equally narrow and relatively decisive, and the findings were put to use immediately and in the long term. So, can you beat that? [EV148:8]

Another evaluator expressed similar sentiments about a study that had to be completed in only three months.

There are a million things I'd do differently. We needed more time. . . . At the time, it was probably the best study we could do. . . . I'm satisfied in the sense that some people found it useful. It wasn't just kept on a shelf. People paid attention to that study and it had an impact. Now, I've done other studies that I thought were methodologically really much more elegant that were kind of ignored, just sitting on somebody's shelf.

My opinion is that this really modest study probably has had impact all out of proportion to the quality of the research. It happened to be at a certain place at a certain time, where it at least talked about some of the things that people were interested in talking about, so it got some attention. And many other studies that I know of that have been done, that I would consider of higher quality, haven't really gotten used. [EV145:34]

Technical quality (truth tests) may get less attention than researchers desire because many stakeholders are not very sophisticated about methods. Yet, they know (almost intuitively) that the methods and measurements used in any study are open to question and attack, a point emphasized earlier in this chapter. They know that researchers don't agree among themselves about technical quality. As a result, experienced decision makers apply less rigorous standards than academics and, as long as they find the evaluation effort credible and serious, they're more interested in discussing the substance of findings than in debating methods. Credibility involves more than technical quality, though that is an important contributing factor.

Credibility, and therefore utility, are affected by "the steps we take to make and explain our evaluative decisions, [and] also intellectually, in the effort we put forth to look at all sides and all stakeholders of an evaluation" (Chelimsky 1995a:219). The perception of impartiality is at least as important as methodological rigor in highly political environments.

Another factor that can reduce the weight decision makers give to technical quality is skepticism about the return on investment of large-scale, elaborately designed, carefully controlled, and expensive studies. Cohen and Weiss (1977) reviewed 20 years of policy research on race and schools, finding progressive improvement in research methods (i.e., increasingly rigorous designs and ever more sophisticated analytical techniques). Sample sizes increased, computer technology was introduced, multiple regression and path analytic techniques were employed, and more valid and reliable data-gathering instruments were developed. After reviewing the findings of studies produced with these more rigorous methods, as well as the uses made of their findings, they concluded that "these changes have led to more studies that disagree, to more qualified conclusions, more arguments, and more arcane reports and unintelligible results" (Cohen and Weiss 1977:78). In light of this finding, simple, understandable, and focused evaluations have great appeal to practitioners and action-oriented evaluation users.

In utilization-focused evaluation, attention to technical quality is tied to and balanced by concern for relevance and timeliness. As one decision maker in our federal health evaluations study put it:

You can get so busy protecting yourself against criticism that you develop such an elaborate methodology that by the time your findings come out, who cares? So, I mean, you get a balance—the validity of the data against its relevance. And that's pretty tough stuff. I mean, that's hard business. [DM111:26]

As no study is ever methodologically perfect, it is important for primary stakeholders to know firsthand what imperfections exist—and to be included in deciding which imperfections they are willing to live with in making the inevitable leaps from limited data to incremental action.

The Dynamics of Measurement and Design Decisions

Research quality and relevance are not set in stone once an evaluation proposal has been accepted. A variety of factors emerge throughout the life of an evaluation that require new decisions about methods. Actively involving intended users in making methods decisions about these issues means more than a one-point-in-time acquiescence to a research design.

In every one of the 20 federal health studies we investigated, significant methods revisions and redesigns had to be done after data collection began. While little attention has been devoted in the evaluation literature to the phenomenon of slippage between methods as originally proposed and methods as actually implemented, the problem is similar to that of program implementation, where original specifications typically differ greatly from what finally gets delivered (see Chapter 9).

McTavish et al. (1975) studied implementation of 126 research projects funded across seven federal agencies.

All 126 projects were rated by independent judges along seven descriptive methodological scales. Both original proposals and final reports were rated; the results showed substantial instability between the two. The researchers concluded,

Our primary conclusion from the Predictability Study is that the quality of final report methodology is essentially not predictable from proposal or interim report documentation. This appears to be due to a number of factors. First, research is characterized by significant change as it develops over time. Second, unanticipated events force shifts in direction. Third, the character and quality of information available early in a piece of research makes assessment of some features of methodology difficult or impossible. (pp. 62-63)

Earlier in the report, they had pointed out that

among the more salient reasons for the low predictability from early to late documentation is the basic change which occurs during the course of most research. It is, after all, a risky pursuit rather than a pre-programmed product. Initial plans usually have to be altered once the realities of data or opportunities and limitations become known. Typically, detailed plans for analysis and reporting are postponed and revised. External events also seem to have taken an expected toll in the studies we examined. . . . Both the context of research and the phenomena being researched are typically subject to great change. (p. 56)

If intended users are involved only at the stage of approving research proposals, they are likely to be surprised when they see a final report. Even interim reports bear only moderate resemblance to final reports. Thus, making decisions about methods is a continuous process that involves checking out changes with intended users as they are made. While it is impractical to have evaluator-stakeholder discussions about every minor change in methods, utilization-focused evaluators prefer to err in the direction of consultative rather than unilateral decision making, when there is a choice. Stakeholders also carry a responsibility to make sure they remain committed to the evaluation. One internal evaluator interviewed in our federal utilization study, still smarting from critiques of his evaluation as methodologically weak, offered the following advice to decision makers who commission evaluations:

Very, very often those of us who are doing evaluation studies are criticized for poor methodology, and the people who levy the criticism sometimes are the people who pay for the study. Of course, they do this more often when the study is either late or it doesn't come up with the answers that they were looking for. But I think that a large share of the blame or responsibility belongs to the project monitor, sponsor, or funder for not maintaining enough control, direct hands-on contact with the evaluation as it's ongoing.

I don't think that it's fair to blame an [evaluation] contractor for developing a poor study approach, a poor methodology, and absolve yourself, if you're the sponsor, because it's your role as a project monitor to be aware of what those people that you're paying are doing all the time, and to guide them.

We let contracts out and we keep our hands on these contractors all the time. And when we see them going down a road that we don't think is right, we pull them back and we say, "Hey, you know, we disagree."

We don't let them go down the road all the way and then say, "Hey fella, you went down the wrong road." [EV32:15]

I have found this a useful quote to share with primary stakeholders who have expressed reluctance to stay involved with the evaluation as it unfolds. Caveat emptor.

Threats to Data Quality

Evaluators have an obligation to think about, anticipate, and provide guidance about how threats to data quality will affect interpreting and using results. Threats to internal validity, for example, affect any conclusion that a program produced an observed outcome. The observed effect could be due to larger societal changes (history), as when generally increased societal awareness of the need for exercise and proper nutrition contaminates the effects of specific programs aimed at encouraging exercise and proper nutrition. Maturation is a threat to validity when it is difficult to separate the effects of a program from the effects of growing older; this is a common problem in juvenile delinquency programs, as delinquency has been shown to decline naturally with age. Reactions to gathering data can affect outcomes independent of program effects, as when students perform better on a posttest simply because they are more familiar with the test the second time; or there can be interactions between the pretest and the program when the experience of having taken a pretest increases participants' sensitivity to key aspects of a program. Losing people from a program (experimental mortality) can affect findings since those who drop out, and therefore fail to take a posttest, are likely to be different in important ways from those who stay to the end.

However, it is impossible to anticipate all potential threats to data quality. Even when faced with the reality of particular circumstances and specific evaluation problems, it is impossible to know in advance precisely how a creative design or measurement approach will affect results. For example, having program staff do client interviews in an outcomes evaluation could (1) seriously reduce the validity and reliability of the data, (2) substantially increase the validity and reliability of the data, or (3) have no measurable effect on data quality. The nature and degree of effect would depend on staff relationships with clients, how staff were assigned to clients for interviewing, the kinds of questions being asked, the training of the staff interviewers, attitudes of clients toward the program, and so on. Program staff might make better or worse interviewers than external evaluation researchers, depending on these and other factors.

An evaluator must grapple with these kinds of data quality questions for all designs. No automatic rules apply. There is no substitute for thoughtful analysis based on the specific circumstances and information needs of a particular evaluation, both initially and as the evaluation unfolds.

Threats to Utility

Whereas traditional evaluation methods texts focus primarily on threats to validity, this chapter has focused primarily on threats to utility. Threats to utility include the following:

• failure to focus on intended use by intended users
• inadequate involvement of primary intended users in making methods decisions

• focusing on unimportant issues—low relevance
• inappropriate methods and measures given stakeholder questions and information needs
• poor stakeholder understanding of the evaluation generally and findings specifically
• low user belief and trust in the evaluation process and findings
• low face validity
• failure to design the evaluation to fit the context and situation
• unbalanced data collection and reporting
• perceptions that the evaluation is unfair or that the evaluator is biased or less than impartial
• low evaluator credibility
• political naivete
• failure to keep stakeholders adequately informed and involved along the way as design alterations are necessary

We now have substantial evidence that paying attention to and working to counter these threats to utility will lead to evaluations that are worth using—and are actually used.

Designing Evaluations Worth Using: Reflections on the State of the Art

This chapter has described the challenges evaluators face in working with intended users to design evaluations worth using. My consulting brings me into contact with hundreds of evaluation colleagues and users. I know from direct observation that many evaluators are meeting these challenges with great skill, dedication, competence, and effectiveness. Much important and creative work is being done by evaluators in all kinds of difficult and demanding situations as they fulfill their commitment to do the most and best they can with the resources available, the short deadlines they face, and the intense political pressures they feel. They share a belief that doing something is better than doing nothing, so long as one is realistic and honest in assessing and presenting the limitations of what is done.

This last caveat is important. I have not attempted to delineate all possible threats to validity, reliability, and utility. This is not a design and measurement text. My purpose has been to stimulate thinking about how attention to intended use for intended users affects all aspects of evaluation practice, including methods decisions.

Pragmatism undergirds the utilitarian emphasis of utilization-focused evaluation. In designing evaluations, it is worth keeping in mind World War II General George S. Patton's Law: A good plan today is better than a perfect plan tomorrow.

Then there is Halcolm's evaluation corollary to Patton's law: Perfect designs aren't.
12
The Paradigms Debate
and a Utilitarian Synthesis

Lady, I do not make up things. That is lies. Lies is not true. But the truth could be made up if you know how. And that's the truth.

—Lily Tomlin as character "Edith Ann,"
Rolling Stone, October 24, 1974

Training

A former student sent me the following story, which she had received as an e-mail chain
letter, a matter of interest only because it suggests widespread distribution.

Once upon a time, not so very long ago, a group of statisticians (hereafter known as
quants) and a party of qualitative methodologists (quals) found themselves together on
a train traveling to the same professional meeting. The quals, all of whom had tickets,
observed that the quants had only one ticket for their whole group.
"How can you all travel on one ticket?" asked a qual.
"We have our methods," replied a quant.
Later, when the conductor came to punch tickets, all the quants slipped quickly
behind the door of the toilet. When the conductor knocked on the door, the head quant
slipped their one ticket under the door, thoroughly fooling the conductor.
On their return from the conference, the two groups again found themselves on the
same train. The qualitative researchers, having learned from the quants, had schemed


to share a single ticket. They were chagrined, therefore, to learn that, this time, the
statisticians had boarded with no tickets.
"We know how you traveled together with one ticket," revealed a qual, "hut how
can you possibly get away with no tickets?"
"We have new methods," replied a quant.
Later, when the conductor approached, all the quals crowded into the toilet. The
head statistician followed them and knocked authoritatively on the toilet door. The
quals slipped their one and only ticket under the door. The head quant took the ticket
and joined the other quants in a different toilet. The quals were subsequently discovered
without tickets, publicly humiliated, and tossed off the train at its next stop.

Methodological Respectability

This story offers a remnant of what was once a great paradigms debate about the relative merits of quantitative/experimental methods versus qualitative/naturalistic methods. That debate has run out of intellectual steam and is now relegated to comedy on the Internet. As Thomas Cook, one of evaluation's luminaries—the Cook of Cook and Campbell (1979), the bible of quasi-experimentation, and of Shadish, Cook, and Leviton (1991), the definitive work on evaluation theorists—pronounced in his keynote address to the 1995 International Evaluation Conference in Vancouver, "Qualitative researchers have won the qualitative-quantitative debate."

Won in what sense?

Won acceptance.

The validity of experimental methods and quantitative measurement, appropriately used, was never in doubt. Now, qualitative methods have ascended to a level of parallel respectability. That ascendance was not without struggle and sometimes acrimonious debate, as when Lee Sechrest, American Evaluation Association president in 1991, devoted his presidential address to alternatively defending quantitative methods and ridiculing qualitative approaches. He lamented what he perceived as a decline in the training of evaluators, especially in conducting rigorous quantitative studies. He linked this to a more general "decline of numeracy" and increase in "mathematical illiteracy" in the nation. "My opinion," he stated, "is that qualitative evaluation is proving so attractive because it is, superficially, so easy" (Sechrest 1992:4). Partly tongue in cheek, he cited as evidence of qualitative evaluators' mathematical ineptitude a proposal he had reviewed from a qualitative researcher that contained a misplaced decimal point and, as another piece of evidence, an invitation to a meeting of "qualitative research types" that asked for a February 30 reply (p. 5). He concluded, "If we want to have the maximum likelihood of our results being accepted and used, we will do well to ground them, not in theory and hermeneutics, but in the dependable rigor afforded by our best science and accompanying quantitative analyses" (p. 3).

Beyond the rancor, however, Sechrest joined other eminent researchers in acknowledging a role for qualitative methods, especially in combination with quantitative approaches.

He was preceded in this regard by distinguished methodological scholars such as Donald Campbell and Lee J. Cronbach. Ernest House (1977), describing the role of qualitative argument in evaluation, observed that "when two of the leading scholars of measurement and experimental design, Cronbach and Campbell, strongly support qualitative studies, that is strong endorsement indeed" (p. 18). In my own work, I have found increased interest in and acceptance of qualitative methods in particular and multiple methods in general.

A consensus has emerged in the profession that evaluators need to know and use a variety of methods in order to be responsive to the nuances of particular evaluation questions and the idiosyncrasies of specific stakeholder needs. As noted in the previous chapter, the issue is the appropriateness of methods for a specific evaluation purpose and question, not adherence to some absolute orthodoxy that one or the other approach is inherently preferred. The field has come to recognize that, where possible, using multiple methods—both quantitative and qualitative—can be valuable, since each has strengths and one approach can often overcome weaknesses of the other.

The problem is that this ideal of evaluators being situationally responsive, methodologically flexible, and sophisticated in using a variety of methods runs headlong into the realities of the evaluation world. Those realities include limited resources, political considerations of expediency, and the narrowness of disciplinary training available to most evaluators—training that imbues them with varying degrees of methodological prejudice. Moreover, while I believe that the paradigms debate has lost its acerbic edge among most evaluators, many users of evaluation—practitioners, policymakers, program managers, and funders—remain mired in the simplistic worldview that statistical results (hard data) are more scientific and valid than qualitative case studies (soft data). Therefore, to involve intended users in methods decisions, utilization-focused evaluators need to understand the paradigms debate and be able to facilitate choices that are appropriate to the evaluation's purpose. This will often require educating primary stakeholders about the legitimate options available, the potential advantages of multiple methods, and the strengths and weaknesses of various approaches. Toward that end, this chapter reviews the paradigms debate and then offers a utilization-focused synthesis.

The Paradigms Debate

A paradigm is a worldview built on implicit assumptions, accepted definitions, comfortable habits, values defended as truths, and beliefs projected as reality. As such, paradigms are deeply embedded in the socialization of adherents and practitioners: Paradigms tell them what is important, legitimate, and reasonable. Paradigms are also normative, telling the practitioner what to do without the necessity of long existential or epistemological consideration. But it is this aspect of paradigms that constitutes both their strength and their weakness—their strength in that it makes action possible, their weakness in that the very reason for action is hidden in the unquestioned assumptions of the paradigm.

Scientists work from models acquired through education and through subsequent exposure to the literature, often without quite knowing or needing to know what characteristics have given these models the status of community paradigms. . . . That scientists do not usually ask or debate what makes a particular problem or solution legitimate tempts us to suppose that, at least intuitively, they know the answer. But it may only indicate that neither the question nor the answer is felt to be relevant to their research. Paradigms may be prior to, more binding, and more complete than any set of rules for research that could be unequivocally abstracted from them. (Kuhn 1970:46)

Evaluation was initially dominated by the natural science paradigm of hypothetico-deductive methodology, which values quantitative measures, experimental design, and statistical analysis as the epitome of "good" science. Influenced by philosophical tenets of logical positivism, this model for evaluation came from the tradition of experimentation in agriculture, the archetype of applied research.

The most common form of agricultural-botany type evaluation is presented as an assessment of the effectiveness of an innovation by examining whether or not it has reached required standards on prespecified criteria. Students—rather like plant crops—are given pretests (the seedlings are weighed or measured) and then submitted to different experiments (treatment conditions). Subsequently, after a period of time, their attainment (growth or yield) is measured to indicate the relative efficiency of the methods (fertilizer) used. Studies of this kind are designed to yield data of one particular type, i.e., "objective" numerical data that permit statistical analyses. (Parlett and Hamilton 1976:142)

By way of contrast, the alternative to the dominant quantitative/experimental paradigm was derived from the tradition of anthropological field studies and undergirded by the philosophical tenets of phenomenology. Using the techniques of in-depth, open-ended interviewing and personal observation, the alternative paradigm relies on qualitative data, naturalistic inquiry, and detailed description derived from close contact with people in the setting under study.

In utilization-focused evaluation, neither of these paradigms is intrinsically better than the other. They represent alternatives from which the utilization-focused evaluator can choose; both contain options for primary stakeholders and information users. Issues of methodology are issues of strategy, not of morals. Yet, it is not easy to approach the selection of evaluation methods in this adaptive fashion. The paradigmatic biases in each approach are quite fundamental. Great passions have been aroused by advocates on each side. Kuhn (1970) has pointed out that this is the nature of paradigm debates:

To the extent that two scientific schools disagree about what is a problem and what is a solution, they will inevitably talk through each other when debating the relative merits of their respective paradigms. In the partially circular arguments that regularly result, each paradigm will be shown to satisfy more or less the criteria that it dictates for itself and to fall short of a few of those dictated by its opponent. . . . Since no paradigm ever solves all problems it defines, and since no two paradigms leave all the same problems unanswered, paradigm questions always involve the question: Which problem is it more significant to have solved? (pp. 109-10)

The countering positions that sparked the debate in evaluation remain relevant because much social science training is still quite narrow. Evaluators and those who commission or use evaluation will naturally be most comfortable with those methods in which they have been trained and to which they have most often been exposed. A particular way of viewing the world, based on disciplinary training and specialization, becomes so second-nature that it takes on the characteristics of a paradigm. The paradigms debate has been a prominent and persistent topic in evaluation and has generated a substantial literature, only a sample of which is referenced here (Donmoyer 1996; Moss 1996; Cook 1995; Phillips 1995; Denzin and Lincoln 1994; Guba and Lincoln 1994, 1989, 1981; Fishman 1992; Eisner 1991; House 1991; Rizo 1991; Cochran-Smith and Lytle 1990; Patton 1990, 1978, 1975a; Howe 1988; J. K. Smith 1988; Lincoln and Guba 1985; Cronbach 1982, 1975; Heilman 1980; Reichardt and Cook 1979; Rist 1977). Paradigm discussions and debates have also been a regular feature at meetings of professional evaluators worldwide.

The Quantitative/Experimental Paradigm in Its Days of Domination

Evidence of the early dominance of the quantitative/experimental (hypothetico-deductive) paradigm as the method of choice in evaluation research can be found in the meta-evaluation work of Bernstein and Freeman (1975). The purpose of their study was to assess the quality of evaluative research at the time. What is of interest to us here is the way Bernstein and Freeman defined quality. Exhibit 12.1 shows how they coded their major indicators of quality; a higher number represents higher-quality research. The highest quality rating was reserved for completely quantitative data obtained through an experimental design and analyzed with sophisticated statistical techniques. Bernstein and Freeman did not concern themselves with whether the evaluation findings were important or used, or even whether the methods and measures were appropriate to the problem under study. They judged the quality of evaluation research entirely by its conformance with the dominant, hypothetico-deductive paradigm.

Documenting the consensus that existed for how they defined evaluation quality, Bernstein and Freeman cited major texts of the time, for example, Suchman (1967), Caro (1971), and Rossi and Williams (1972). Representative of the dominant perspective at the time is that of Wholey et al. (1970): "Federal money generally should not be spent on evaluation of individual local projects unless they have been developed as field experiments, with equivalent treatment and control groups" (p. 93). The Social Science Research Council (Reicken and Boruch 1974) took a similar position, as did eminent evaluation pioneer Peter Rossi (1972) in reporting general consensus about the most desired evaluation research methods at a conference on evaluation and policy research sponsored by the American Academy of Arts and Sciences in 1969.

A cursory skimming of major educational and social science research journals would confirm the dominance of the hypothetico-deductive paradigm. In their widely used methodological primer, Campbell and Stanley (1963) called this paradigm "the only available route to cumulative progress" (p. 3). It was this belief in and commitment to the natural science model on the part of the most prominent academic researchers that made experimental designs and statistical measures dominant. As Kuhn (1970) has explained, "A paradigm governs, in the first instance, not a subject matter but rather a group of practitioners" (p. 80).
EXHIBIT 12.1
Dominant Paradigm: Operational Definition of Evaluation Quality in the 1970s

Dimension of Evaluation Quality    Coding Scheme (higher number = higher quality)

Sampling                   1 = Systematic random
                           0 = Nonrandom, cluster, or nonsystematic

Data analysis              2 = Quantitative
                           1 = Qualitative and quantitative
                           0 = Qualitative

Statistical procedures     4 = Multivariate
                           3 = Descriptive
                           2 = Ratings from qualitative data
                           1 = Narrative data only
                           0 = No systematic material

Impact Procedures Design   3 = Experimental or quasi-experimental with randomization and control groups
                           2 = Experimental or quasi-experimental without both randomization and control groups
                           1 = Longitudinal or cross-sectional without control or comparison groups
                           0 = Descriptive, narrative

SOURCE: Bernstein and Freeman 1975.

Those practitioners most committed to the dominant paradigm were found in universities, where they employed the scientific method in their own evaluation research and nurtured students in a commitment to that same methodology.

In our mid-1970s study of how federal health evaluations were used, every respondent answered methodological questions with reference to the dominant paradigm. If a particular evaluation being reviewed had departed from what were implicitly understood to be the ideals of "good science," long explanations about practical constraints were offered, usually defensively, under the assumption that since we were from a university, we would be critical of such departures. Studies were described as hard or soft along a continuum in which harder was clearly better and didn't even need explicit definition.

The problem from a utilization-focused perspective was that the very dominance of the quantitative/experimental paradigm had cut off the great majority of evaluators and stakeholders from serious consideration of any alternative paradigm or methods. The label research had come to mean the equivalent of employing the scientific method: testing hypotheses, formulated deductively, through random assignment of program participants to treatment and control groups, and measuring outcomes quantitatively. Nothing else was really worthy of serious attention by definition.

An alternative existed, however, another way of studying program processes and outcomes that began to attract a following from evaluators and practitioners who found that the dominant paradigm failed to answer—or even ask—their questions.

The importance of having an alternative is captured powerfully by the distinguished adult educator Malcolm Knowles (1989), who, in his autobiography, The Making of an Adult Educator, listed discovery of an alternative way of evaluating adult learning as one of the eight most important episodes of his life, right there alongside his marriage.

The Emergence of the Alternative Qualitative/Naturalistic Paradigm

The alternative methods paradigm was derived most directly from anthropological field methods and more generally from qualitative sociology and phenomenology. It was undergirded by the doctrine of Verstehen (understanding):

Advocates of some version of the verstehen doctrine will claim that human beings can be understood in a manner that other objects of study cannot. Humans have purposes and emotions, they make plans, construct cultures, and hold certain values, and their behavior is influenced by such values, plans, and purposes. In short, a human being lives in a world which has "meaning" to him, and, because his behavior has meaning, human actions are intelligible in ways that the behavior of nonhuman objects is not. (Strike 1972:28)

In essence, the Verstehen doctrine asserted that applied social sciences need methods different from those used in agriculture because human beings are different from plants. The alternative paradigm emphasized attention to the meaning of human behavior, the context of social interaction, and the connections between subjective states and behavior. The tradition of Verstehen places emphasis on the human capacity to know and understand others through empathic introspection and reflection based on detailed description gathered through direct observation, in-depth, open-ended interviewing, and case studies. Evaluation came to have advocates for and users of alternative methods. Robert Stake's (1975) responsive approach was one such early alternative.

Responsive evaluation is an alternative, an old alternative, based on what people do naturally to evaluate things; they observe and react. The approach is not new. But this alternative has been avoided in district, state, and federal planning documents and regulations because it is subjective and poorly suited to formal contracts. It is also capable of raising embarrassing questions. (p. 14)

Stake recommended responsive evaluation because "it is an approach that trades off some measurement precision in order to increase the usefulness of the findings to persons in and around the program" (p. 14). Stake influenced a new generation of evaluators to think about the connection between methods and use, and his book on The Art of Case Study Research (1995), published two decades later, promises to extend that influence.

Another window into alternative methods came from what Parlett and Hamilton (1976) called illuminative evaluation, an approach they developed for schools.

Illuminative evaluation takes account of the wider contexts in which educational programs function. Its primary concern is with description and interpretation rather than measurement and prediction. It stands unambiguously within the alternative anthropological paradigm. The aims of illuminative evaluation are to study the innovatory program: how it operates; how it is influenced by the various school situations in which it is applied; what those directly concerned regard as its advantages and disadvantages; and how students' intellectual tasks and academic experiences are most affected. It aims to discover and document what it is like to be participating in the scheme, whether as teacher or pupil, and, in addition, to discern and discuss the innovation's most significant features, recurring concomitants, and critical processes. In short, it seeks to address and illuminate a complex array of questions. (p. 144)

I joined the fray at about the same time when, after being thoroughly indoctrinated into the dominant paradigm as a quantitative sociologist, I became involved in evaluating an open education program whose practitioners objected to the narrow and standardized outcomes measured by standardized tests. Because they advocated an educational approach that they considered individualized, personal, humanistic, and nurturing, they wanted evaluation methods with those same characteristics. In attempting to be responsive to my intended users (open educators) and do an evaluation that was credible and useful to them, I discovered qualitative methods. That led me to write a monograph comparing alternative paradigms (Patton 1975a), reactions to which embroiled me directly and personally in the passions and flames of the great paradigms debate. At the time it was exhilarating. Looking back from today's vantage point of methodological eclecticism, the barbs traded by opposing camps would appear silly but for the fact that, in circles not yet touched by the light that eventually emerged from the debate, friction and its attendant heat still burn evaluators who encounter true believers in the old orthodoxies. It is to prepare for such encounters, and be able to rise gently above the acrimony they can inspire, that students of evaluation need to understand the dimensions and passions of the debate.

Dimensions of the Competing Paradigms

By the end of the 1970s, then, the evaluation profession had before it the broad outlines of two competing research paradigms.

Exhibit 12.2 displays the contrasting emphases of the two methodological paradigms. Beyond differences in basic philosophical assumptions about the nature of reality (e.g., singular reality versus multiple realities), in its details the paradigms debate included the relative merits of a number of dimensions, like the relative merits of being close to versus distant from program participants during an evaluation. While reviewing these dimensions will illuminate the nature of the paradigms debate, they also can be thought of as options that might be offered to intended users during methods deliberations and negotiations. We'll begin with the division about the relative merits of numbers versus narrative.

EXHIBIT 12.2
Dimensions of Competing Methodological Paradigms

Qualitative/Naturalistic Paradigm Quantitative/Experimental Paradigm

Qualitative data (narratives, descriptions) Quantitative data (numbers, statistics)


Naturalistic inquiry Experimental designs
Case studies Treatment and control groups
Inductive analysis Deductive hypothesis testing
Subjective perspective Objective perspective
Close to the program Aloof from the program
Holistic contextual portrayal Independent and dependent variables
Systems perspective focused on interdependencies Linear, sequential modeling
Dynamic, ongoing view of change Pre-post focus on change
Purposeful sampling of relevant cases Probabilistic, random sampling
Focus on uniqueness and diversity Standardized, uniform procedures
Emergent, flexible designs Fixed, controlled designs
Thematic content analysis Statistical analysis
Extrapolations Generalizations

Quantitative and Qualitative Data: Different Perspectives on the World

Quantitative measures strive for precision by focusing on things that can be counted and, when gathering data from human beings, conceptualizing predetermined categories that can be treated as ordinal or interval data and subjected to statistical analysis. The experiences of people in programs and the important variables that describe program outcomes are fit into these standardized categories to which numerical values are attached. Quantitative data come from questionnaires, tests, standardized observation instruments, and program records.

In contrast, the evaluator using a qualitative approach seeks to capture what a program experience means to participants in their own words, through interviews, and in their day-to-day program settings, through observation. Qualitative data consist of detailed descriptions of situations, events, people, interactions, and observed behaviors; direct quotations from people about their experiences, attitudes, beliefs, and thoughts; and excerpts or entire passages from documents, correspondence, records, and case histories. The data are collected as open-ended narrative without predetermined, standardized categories such as the response choices that make up typical questionnaires or tests.

Numbers are parsimonious and precise; words provide detail and nuance. Each way of turning the complexities of the world into data has strengths and weaknesses. Qualitative data offer detailed, rich description, capturing variations between cases; quantitative data facilitate comparisons because all program participants

respond to the same questions on standardized scales within predetermined response categories. Standardized tests and surveys make it possible to measure the reactions of many respondents to a limited set of questions; statistical aggregation and analysis are relatively straightforward, following established rules and procedures. By contrast, qualitative methods typically produce a wealth of detailed data about a much smaller number of people and cases; analysis can be painstaking, time-consuming, and uncertain.

Sociologist John Lofland (1971) suggested that there are four elements in collecting qualitative data. First, the qualitative evaluator must get close enough to the people and situation being studied to be able to understand the depth and details of what goes on. Second, the qualitative evaluator must aim at capturing what actually takes place and what people actually say: the perceived facts. Third, qualitative data consist of a great deal of pure description of people, activities, and interactions. Fourth, qualitative data consist of direct quotations from people, both what they speak and what they write down.

The commitment to get close, to be factual, descriptive, and quotive, constitutes a significant commitment to represent the participants in their own terms. This does not mean that one becomes an apologist for them, but rather that one faithfully depicts what goes on in their lives and what life is like for them, in such a way that one's audience is at least partially able to project themselves into the point of view of the people depicted. They can "take the role of the other" because the reporter has given them a living sense of day-to-day talk, day-to-day activities, day-to-day concerns and problems. . . .

A major methodological consequence of these commitments is that the qualitative study of people in situ is a process of discovery. It is of necessity a process of learning what is happening. Since a major part of what is happening is provided by people in their own terms, one must find out about those terms rather than impose upon them a preconceived or outsider's scheme of what they are about. It is the observer's task to find out what is fundamental or central to the people or world under observation. (p. 4)

So what is there to debate about quantitative versus qualitative when each can contribute in important ways to our understanding of programs? The debate stems from underlying connotations and deeply held values. "If you can't measure it, if you can't quantify it, it doesn't exist," is a refrain many program staff have heard from evaluators seeking "clear, specific, and measurable goals" (see Chapter 7 on the goals clarification game). "What gets measured gets done," the mantra of management by objectives and performance contracting, communicates that only what can be quantified is important. Statistical presentations tend to have more credibility, to seem more like "science," whereas qualitative narratives tend to be associated with "mere" journalism. A certain assertiveness, even machismo, often accompanies the demand that outcomes be quantified: hard data connote virility; soft data are flaccid. (Sexual innuendo works in science no less than in advertising, or so it would seem.)

Kuhn (1970), a philosopher and historian of science, observed that the values scientists hold "most deeply" concern predictions: "quantitative predictions are preferable to qualitative ones" (pp. 184-85). It's a short distance from a preference for quantitative data to the virtual exclusion of other types of data. Bernstein and Freeman (1975) even ranked evaluations that gathered both quantitative and qualitative data as lower in methodological quality than those that gathered only quantitative data. This is an example of what sociologist C. Wright Mills (1961) called "abstracted empiricism: . . . a methodological inhibition [that] seizes upon one juncture in the process of work and allows it to dominate the mind" (p. 50).

Valuing quantitative measures to the exclusion of other data limits not only what one can find out but also what one is even willing to ask. It is easy to count the words a child spells correctly, but what about that same child's ability to use those words in a meaningful way? It is easy to count the minutes a student spends reading in class, but what does reading mean to that student? Different kinds of problems require different types of data. If we only want to know the frequency of interactions between children of different races in desegregated schools, then statistics are appropriate. However, if we want to understand the meanings of interracial interactions, open-ended, in-depth interviewing will be more appropriate.

If the problems upon which one is at work are readily amenable to statistical procedures, one should always try them first. . . . No one, however, need accept such procedures, when generalized, as the only procedures available. Certainly no one need accept this model as a total canon. It is not the only empirical manner.

It is a choice made according to the requirements of our problems, not a "necessity" that follows from an epistemological dogma. (Mills 1961:73-74)

One evaluator in our federal utilization study told of struggling with this issue. He was evaluating community mental health programs and reported that statistical measures frequently failed to capture real differences among programs. For example, he found a case in which community mental health staff cooperated closely with the state hospital. On one occasion, he observed a therapist from the community mental health center accompany a seriously disturbed client on the "traumatic, fearful, anxiety-ridden trip to the state hospital." The therapist had been working with the client on an outpatient basis. After commitment to the state facility, the therapist continued to see the client weekly and assisted that person in planning toward and getting out of the state institution and back into the larger community as soon as possible. The evaluator found it very difficult to measure this aspect of the program quantitatively.

This actually becomes a qualitative aspect of how they were carrying out the mental health program, but there's a problem of measuring the impact of that qualitative change from when the sheriff used to transport the patients from that county in a locked car with a stranger in charge and the paraphernalia of the sheriff's personality and office. The qualitative difference is obvious in the possible effect on a disturbed patient, but the problem of measurement is very, very difficult. So what we get here in the report is a portrayal of some of the qualitative differences and a very limited capacity of the field at that time to measure those qualitative differences. We could describe some of them better than we could measure them. [EV5:3]

A more extended example will help illustrate the importance of seeking congruence between the phenomenon studied and the data gathered for an evaluation.

Edna Shapiro (1973) found no achievement test differences between (1) children in an enriched Follow Through program modeled along the lines of open education and (2) children in comparison schools not involved in Follow Through or other enrichment programs. When the children's responses in the test situation were compared, no differences of any consequence were found. However, when observations of the children in their classrooms were made, there were striking differences between the Follow Through and comparison classes. First, the environments were observedly different (implementation evaluation).

The Follow Through (FT) classrooms were characterized as lively, vibrant, with a diversity of curricular projects and children's products, and an atmosphere of friendly, cooperative endeavor. The non-FT classrooms were characterized as relatively uneventful, with a narrow range of curriculum, uniform activity, a great deal of seat work, and less equipment; teachers as well as children were quieter and more concerned with maintaining or submitting to discipline. (Shapiro 1973:529)

Observations also revealed that the children performed differently in the two environments on important dimensions that standardized achievement tests failed to detect. Shapiro found factors operating against the demonstration of differences, factors that called into question, for her, traditional ways of gauging the impact and effectiveness of different kinds of school experiences. The testing methodology, in fact, narrowed the nature of the questions that were being asked and predetermined nonsignificant statistical results.

I assumed that the internalized effects of different kinds of school experience could be observed and inferred only from responses in test situations, and that the observation of teaching and learning in the classroom should be considered auxiliary information, useful chiefly to document the differences in the children's group learning experiences. The rationale of the test, on the contrary, is that each child is removed from the classroom and treated equivalently, and differences in response are presumed to indicate differences in what has been taken in, made one's own, that survives the shift to a different situation.

The findings of this study, with the marked disparity between classroom responses and test responses, have led me to reevaluate this rationale. This requires reconsideration of the role of classroom data, individual test situation data, and the relation between them. If we minimize the importance of the child's behavior in the classroom because it is influenced by situational variables, do we not have to apply the same logic to the child's responses in the test situation, which is also influenced by situational variables? (Shapiro 1973:532-34; emphasis added)

Shapiro (1973) elaborated and illustrated these points at length. Her conclusion went to the heart of the problem posed by the previous dominance of a single methodological paradigm in evaluation research: "Research methodology must be suited to the particular characteristics of the situations under study. . . . An omnibus strategy will not work" (p. 543).

At first, some evaluators were willing to recognize that qualitative data might be useful at an exploratory stage to design quantitative instruments. What they denied was that qualitative data could be a legitimate basis for drawing conclusions and making judgments.
The Paradigms Debate • 277

certain processes and outcomes are more amenable to qualitative observation. It is worth remembering in this regard that one of the functions of scientific paradigms is to provide criteria for choosing problems that can be assumed to have solutions: "Changes in the standards governing permissible problems, concepts, and explanations can transform a science" (Kuhn 1970:106). It was the failure of the quantitative paradigm to answer important questions like those raised by Shapiro that gradually made serious consideration of the qualitative paradigm so crucial for evaluation research.

A consensus has emerged that both qualitative and quantitative data can contribute to all aspects of evaluative inquiries (Cook 1995; Sechrest 1992). Evaluators must be able to use a variety of tools if they are to be sophisticated and flexible in matching research methods to the nuances of particular evaluation questions and the idiosyncrasies of specific decision-maker needs. There are no logical reasons why qualitative and quantitative methods cannot be used together (Patton 1982a). Qualitative Evaluation and Research Methods (Patton 1990) describes conditions under which qualitative methods are particularly appropriate in evaluation research. Sometimes quantitative methods alone are most appropriate. But in many cases, both qualitative and quantitative methods should be used together. Where multiple methods are used, the contributions of each kind of data should be fairly assessed. In many cases, this means that evaluators working in teams will need to work hard to overcome their tendency to dismiss certain kinds of data without first considering seriously and fairly the merits of those data.

The Program Evaluation Standards (Joint Committee 1994) provide useful guidance in this regard, in that they give equal attention, weight, and credence to qualitative and quantitative analyses. Indeed, the Joint Committee was absolutely diligent and precise about this equality of treatment by formulating a standard for each type of data with identical wording except for the words quantitative and qualitative.

Standard on Analysis of Quantitative Information
Quantitative information in an evaluation should be appropriately and systematically analyzed so that evaluation questions are effectively answered. (p. A8)

Standard on Analysis of Qualitative Information
Qualitative information in an evaluation should be appropriately and systematically analyzed so that evaluation questions are effectively answered. (p. A9)

Naturalistic and Experimental Inquiry Options

The paradigms debate was in part a debate about the relative importance of causal questions in evaluation. Those evaluation researchers who believe that the most important and central function of evaluation is to measure the effects of programs on participants in order to make valid causal inferences are strong advocates of randomized experiments as "the standard against which other designs for impact evaluation are judged" (Boruch and Rindskopf 1984:121). In advocating experimental designs, evaluation researchers such as Campbell and Boruch (1975) and Lipsey (1990) have demonstrated the power and feasibility of randomized experiments for a variety of programs (Boruch et al. 1978). The concerns that
permeate these writings are concerns about increased rigor, well-controlled settings, reduction of threats to internal validity, precise estimates of program effects, and statistical power.

Naturalistic inquiry, in contrast, involves observing ongoing programs as they unfold without attempting to control or manipulate the setting, situation, people, or data. Naturalistic inquiry investigates "phenomena within and in relation to their naturally occurring context" (Willems and Raush 1969:3). The extent to which any particular investigator engages in naturalistic inquiry varies along a continuum (Guba 1978). It is certainly possible to enter a field situation and try to control what happens, just as it is possible for the experimentalist to control only the initial assignment to groups, then to watch what happens "naturally." The important distinction is between relative degrees of calculated manipulation. A naturalistic inquiry strategy is selected when the investigator wants to minimize research manipulation by studying natural field settings; experimental conditions and designs are selected when the evaluator wants to introduce a considerable amount of control and reduce variation in extraneous variables.

Guba and Lincoln (1981) identified two dimensions along which types of scientific inquiry can be described: the extent to which the scientist manipulates some phenomenon in advance in order to study it, and the extent to which constraints are placed on output measures; that is, the extent to which predetermined categories or variables are used to describe the phenomenon under study. They then define naturalistic inquiry as a "discovery-oriented" approach that minimizes investigator manipulation of the study setting and places no prior constraints on what the outcomes of the research will be. Naturalistic inquiry is thus contrasted to experimental research, in which, ideally, the investigator attempts to control conditions of the study completely by manipulating, changing, or holding constant external influences and in which a very limited set of outcome variables is measured.

Naturalistic inquiry aims at understanding actualities, social realities, and human perceptions that exist untainted by the obtrusiveness of formal measurement or preconceived questions. It is a process geared to the uncovering of many idiosyncratic but nonetheless important stories told by real people, about real events, in real and natural ways. The more general the provocation, the more these stories will reflect what respondents view as salient issues, meaningful evidence, and appropriate inferences. . . . Naturalistic inquiry attempts to present "slice-of-life" episodes documented through natural language and representing as closely as possible how people feel, what they know, and what their concerns, beliefs, perceptions, and understandings are. (Wolf and Tymitz 1976-77:6)

Where the evaluator wants to know about day-to-day life and work in program settings, naturalistic inquiry replaces the static snapshots of traditional survey research with a dynamic, process orientation. To capture dynamic processes, the naturalistic inquiry evaluator eschews the fixed comparisons of pre-post experimental designs, instead making observations periodically and systematically from beginning to end of participants' experiences.

Qualitative data can be collected in experimental designs in which participants have been randomly divided into treatment and control groups. Likewise, some quantitative data may be collected in naturalistic
inquiry approaches. Such combinations and flexibility are still rather rare, however. Experimental designs predominantly aim for statistical analyses, whereas qualitative data are the primary focus in naturalistic inquiry.

Deductive and Inductive Approaches

Another point of friction in the paradigms debate has been the relative value and feasibility of deductive and inductive research strategies. With an inductive strategy, the evaluator attempts to make sense of a program without imposing preexisting expectations on the program setting. Inductive designs begin with specific observations and build toward general patterns. Categories or dimensions of analysis emerge from open-ended observations as the evaluator comes to understand program patterns that exist in the empirical world under study. Goal-free evaluation, discussed in Chapter 7, is inductive in the sense that the evaluator enters the program with no knowledge of program goals, then observes the program and studies participants to determine the extent to which participants' needs are being met.

This contrasts with the hypothetico-deductive approach of experimental designs, which requires the specification of main variables and the statement of specific research hypotheses before data collection begins. Specifying hypotheses based on an explicit theoretical framework means that general principles provide the framework for understanding specific observations or cases, as in theory-driven evaluation (Chen 1990). The evaluator must decide in advance what variables are important and what relationships among those variables are to be tested. The classic deductive approach to evaluation involves measuring relative attainment of predetermined goals in a randomized experiment that permits precise attribution of goal attainment to identifiable program treatments.

Qualitative researchers ask questions rather than test hypotheses. Inductive designs allow the important analysis dimensions to emerge from patterns found in the cases under study without presupposing what the important dimensions will be. Theories that may emerge about what is happening in a program are grounded in direct program experience rather than imposed on the basis of predetermined, deductively derived constructs.

Evaluation can be inductive in two ways.

1. Within a particular program, induction means describing the experiences of individual participants, without pigeonholing or delimiting what those experiences will be in advance of fieldwork.

2. Between programs, inductive inquiry involves looking for unique institutional characteristics that make each setting a case unto itself. At either level, patterns across cases emerge from thematic content analysis, but the initial focus is on full understanding of individual cases, before those unique cases are combined or aggregated.

At the simplest level, closed-ended questionnaires require deductive construction while open-ended interviews depend on inductive analysis. A structured, multiple-choice question requires predetermining response categories based on some theory or preordinate criteria about what is important to measure. An open-ended interview, on the other hand, asks the respondent to describe what is meaningful and salient without being pigeonholed into standardized categories. In practice, these ap-
proaches are often combined, not only in the same study, but in the same instrument. Some evaluation questions are determined deductively while others are left sufficiently open to permit inductive analyses based on direct observations.

The paradigms debate has sharpened our understanding of the strengths and weaknesses of each strategy, and an evaluation can include elements of both as, for example, when the evaluation flows from inductive inquiry—to find out what the important questions and variables are (exploratory work)—to deductive hypothesis testing aimed at confirming exploratory findings, then back again to inductive analysis to look for rival explanations and unanticipated or unmeasured factors.

From Objectivity Versus Subjectivity to Fairness and Balance

Qualitative evaluators are accused frequently of subjectivity—a term with the power of an epithet in that it connotes the very antithesis of scientific inquiry. Objectivity has been considered the sine qua non of the scientific method. To be subjective has meant to be biased, unreliable, and nonrational. Subjectivity implies opinion rather than fact, intuition rather than logic, and impression rather than rigor. Evaluators are advised to avoid subjectivity and make their work "objective and value-free."

In the paradigms debate, the means advocated by scientists for controlling subjectivity through the scientific method were the techniques of the dominant quantitative experimental paradigm. Yet, the previous section observed that quantitative methods can work in practice to limit and even bias the kinds of questions that are asked and the nature of admissible solutions.

Michael Scriven (1972a), evaluation's long-time resident philosopher, has insisted that quantitative methods are no more synonymous with objectivity than qualitative methods are synonymous with subjectivity:

Errors like this are too simple to be explicit. They are inferred confusions in the ideological foundations of research, its interpretations, its application. . . . It is increasingly clear that the influence of ideology on methodology and of the latter on the training and behavior of researchers and on the identification and disbursement of support is staggeringly powerful. Ideology is to research what Marx suggested the economic factor was to politics and what Freud took sex to be for psychology. (p. 94)

The possibility that "ideological" preconceptions can lead to dual perspectives about a single phenomenon goes to the very heart of the contrasts between paradigms. Two scientists may look at the same thing, but because of different theoretical perspectives, assumptions, or ideology-based methodologies, they may literally not see the same thing (Petrie 1972:48). Indeed, Kuhn (1970) has pointed out,

Something like a paradigm is prerequisite to perception itself. What a man sees depends both upon what he looks at and also upon what his previous visual-conceptual experience has taught him to see. In the absence of such training there can only be, in William James's phrase, "a bloomin' buzzin' confusion." (p. 113)

A child's parable, the story of Han and the Dragon, illustrates this point at another level of simplicity. Han, a small boy,
lived in a city threatened by wild horsemen from the north. The Mandarin ruler and his advisers decided that only the Great Cloud Dragon could save the city, so they prayed for the Dragon's intervention. As he prayed, the Mandarin envisioned a dragon that looked like a proud lord—a Mandarin. The captain of the army imagined and prayed to a dragon that looked like a warrior. The merchant thought that a dragon would appear rich and splendid, as he was. The chief workman was convinced that a dragon would be tough and strong. The wise man conceived of the dragon as "the wisest of all creatures," which meant it must look like a wise man. In the midst of the crisis, a small fat man with long beard and bald head arrived and announced that he was the Great Cloud Dragon. The Mandarin and his advisers ridiculed the old man and dismissed him rudely. Only because of Han's kindness did the old man save the city, transforming himself into a magnificent dragon the color of sunset shining through rain, scales scattering the light, claws and teeth glittering like diamonds, beautiful and frightening at the same time, and most important, beyond any possibility of preconception because the dragon was beyond prior human experience. But, only Han saw the dragon, because only he was open to seeing it.

Qualitative researchers prefer to describe themselves as open rather than subjective. They enter a setting without prejudgment, including no preconceived hypotheses to test.

Scriven (1991a) has defined objectivity as being "unbiased or unprejudiced," literally, not having "prejudged." This definition

misleads people into thinking that anyone who comes into a discussion with strong views about an issue can't be unprejudiced. The key question is whether the views are justified. The fact that we all have strong views about the sexual abuse of small children and the importance of education does not show prejudice, only rationality. (p. 248)

The debate about objectivity versus subjectivity includes different assumptions about whether it is possible for us to view the complexities of the real world without somehow filtering and simplifying those complexities. The qualitative assumption is that, at even the most basic level of sensory data, we are always dealing with perceptions, not "facts" in some absolute sense. "The very categories of things which comprise the 'facts' are theory dependent" (Petrie 1972:49) or, in this case, paradigm dependent. It was this recognition that led the distinguished qualitative sociologist Howard Becker (1970) to argue that "the question is not whether we should take sides, since we inevitably will, but rather whose side we are on" (p. 15).

The paradigms offer different perspectives on the nature of "human reality" and thus have different conceptions of the role of research in predicting human reality. The quantitative/experimental paradigm conceives of science as the search for truth about a singular reality, thus the importance of objectivity. The qualitative/naturalistic paradigm searches for perspective and understanding in a world of multiple "realities," thus the inevitability of subjectivity. Although the possibility of attaining objectivity and truth in any absolute sense has become an untenable position in evaluation, the negative connotations associated with the term subjectivity make it an unacceptable alternative. There is a solution. As a utilization-focused evaluator, being practical in orientation, I prefer to replace
the traditional scientific search for objective truth with a search for useful and balanced information. For the classic mandate to be objective, I substitute the mandate to be fair and conscientious in taking account of multiple perspectives, multiple interests, and multiple realities. In this regard, Egon Guba (1981) has suggested that evaluators could learn from investigative journalists.

Journalism in general and investigative journalism in particular are moving away from the criterion of objectivity to an emergent criterion usually labeled "fairness." . . . Objectivity assumes a single reality to which the story or evaluation must be isomorphic; it is in this sense a one-perspective criterion. It assumes that an agent can deal with an object (or another person) in a nonreactive and noninteractive way. It is an absolute criterion.

Journalists are coming to feel that objectivity in that sense is unattainable. . . . Enter "fairness" as a substitute criterion. In contrast to objectivity, fairness has these features:

• It assumes multiple realities or truths—hence a test of fairness is whether or not "both" sides of the case are presented, and there may even be multiple sides.

• It is adversarial rather than one-perspective in nature. Rather than trying to hew the line with the truth, as the objective reporter does, the fair reporter seeks to present each side of the case in the manner of an advocate—as, for example, attorneys do in making a case in court. The presumption is that the public, like a jury, is more likely to reach an equitable decision after having heard each side presented with as much vigor and commitment as possible.

• It is assumed that the subject's reaction to the reporter and interaction between them heavily determines what the reporter perceives. Hence one test of fairness is the length to which the reporter will go to test his own biases and rule them out.

• It is a relative criterion that is measured by balance rather than by isomorphism to enduring truth.

Clearly, evaluators have a great deal to learn from this development. (pp. 76-77)

The Program Evaluation Standards reflect this change in emphasis:

Propriety Standard on Complete and Fair Assessment: The evaluation should be complete and fair in its examination and recording of strengths and weaknesses of the program being evaluated, so that strengths can be built upon and problem areas addressed. (Joint Committee 1994:P5)

Accuracy Standard on Impartial Reporting: Reporting procedures should guard against distortion caused by personal feelings and biases of any party to the evaluation, so that evaluation reports fairly reflect the evaluation findings. (Joint Committee 1994:A11)

Words such as fairness, neutrality, and impartiality carry less baggage than objectivity and subjectivity. To stay out of arguments about objectivity, I talk with intended users about balance, fairness, and being explicit about what perspectives, values, and priorities have shaped the evaluation, both the design and findings. Others choose to use the term objective because of its political power. At the national policy level, 1995 American Evaluation Association President Eleanor Chelimsky (1995a) recommended thus:
Although all of us realize that we can never be entirely objective, that is hardly an excuse for skewed samples, or grandiloquent conclusions or generalizations that go beyond the evaluator's data, or for any of 101 indications to a careful reader that a particular result is more desired than documented. There are, in fact, a great many things that we can do to foster objectivity and its appearance, not just technically, in the steps we take to make and explain our evaluative decisions, but also intellectually, in the effort we put forth to look at all sides and all stakeholders of an evaluation. (p. 219)

The Continuum of Distance From Versus Closeness to the Program

Here are the opposing paradigm positions: Too much closeness may compromise objectivity. Too much distance may diminish insight and understanding. Quantitative researchers depend on distance to guarantee neutrality and academic integrity. Scholarly comportment connotes calm and detached analysis without personal involvement or emotion. The qualitative paradigm, in contrast, assumes that without empathy and sympathetic introspection derived from direct experience, one cannot fully understand a program. Understanding comes from trying to put oneself in the other person's shoes, thereby discerning how others think, act, and feel. Qualitative methodologist John Lofland (1971) has explained that methodologically this means getting close to the people being studied through attention to details, by being where they are over a period of time, and through development of closeness in the social sense of intimacy and confidentiality. "The commitment to get close, to be factual, descriptive, and quotive, constitutes a significant commitment to represent the participants in their own terms" (p. 4).

The desire for closeness derives from the assumption that the inner states of people are important and can be known. From this flows a concern with meaning, mental states, and worldview. Attention to inner perspectives does not mean administering attitude surveys. "The inner perspective assumes that understanding can only be achieved by actively participating in the life of the observed and gaining insight by means of introspection" (Bruyn 1966:226). For evaluators, this can even mean undertaking observation by being a program participant, where possible and appropriate.

In order to capture the participants "in their own terms" one must learn their analytic ordering of the world, their categories for rendering explicable and coherent the flux of raw reality. That, indeed, is the first principle of qualitative analysis. (Lofland 1971:7; emphasis in original)

In the Shapiro study of Follow Through open classrooms, her presence in classrooms over an extended period of time and her closeness to the children allowed her to see things that were not captured by standardized tests. She could see what they were learning. She could feel their tension in the testing situation and their spontaneity in the more natural classroom setting. Had she worked solely with data collected by others or only at a distance, she would never have discovered the crucial differences she uncovered between Follow Through and non-Follow Through classrooms—differences that allowed her to evaluate the innovative program in a meaningful and relevant way.
In a similar vein, one evaluator in our study of the utilization of federal health evaluations expressed frustration at trying to make sense out of data from over 80 projects when site visit funds were cut out of the evaluation: "There's no way to understand something that's just data, you know. You have to go look" [EV111:3]. Lofland (1971) concluded likewise,

In everyday life, statistical sociologists, like everyone else, assume that they do not know or understand very well people they do not see or associate with very much. They assume that knowing and understanding other people require that one see them reasonably often and in a variety of situations relative to a variety of issues. Moreover, statistical sociologists, like other people, assume that in order to know or understand others, one is well-advised to give some conscious attention to that effort in face-to-face contacts. They assume, too, that the internal world of sociology—or any other social world—is not understandable unless one has been part of it in a face-to-face fashion for quite a period of time. How utterly paradoxical, then, for these same persons to turn around and make, by implication, precisely the opposite claim about people they have never encountered face-to-face—those people appearing as numbers in their tables and as correlations in their matrices! (p. 3)

It is instructive to remember that many major contributions to our understanding of the world have come from scientists' personal experiences—Piaget's closeness to his children, Freud's proximity to and empathy with his patients, Darwin's closeness to nature, and even Newton's intimate encounter with an apple.

On the other hand, closeness is not the only way to understand human behavior. For certain questions and for situations involving large groups, distance is inevitable. But, where possible, face-to-face interaction can deepen insight, especially in program evaluation. This returns us to the recurrent theme of matching evaluation methods to intended use by intended users.

Of Variables and Wholes

The quantitative/experimental paradigm operationalizes independent and dependent variables, then measures their relationships statistically. Outcomes must be identified and measured as specific variables. Treatments and programs must also be conceptualized as discrete, independent variables. Program participants are also described along standardized, quantified dimensions. Sometimes a program's goals are measured directly, for example, student achievement test scores, recidivism statistics for a group of juvenile delinquents, or sobriety rates for participants in chemical dependency treatment programs. Evaluation measures can also be indicators of a larger construct, for example, "community well-being" as a general construct measured by indicators such as crime rates, fetal deaths, divorce, unemployment, suicide, and poverty (Brock, Schwaller, and Smith 1985).

Adherents of the qualitative paradigm argue that the variables-based approach (1) oversimplifies the interconnected complexities of real-world experiences, (2) misses major factors of importance that are not easily quantified, and (3) fails to capture a sense of the program and its impacts as a "whole." The qualitative/naturalistic paradigm strives to be holistic in orientation. It assumes that the whole is greater than the sum of its parts; that the parts cannot be understood without a sense of the whole; and that a description and
understanding of a program's context is essential to an understanding of program processes and outcomes. This, of course, follows the wisdom of the fable about the blind children and the elephant. As long as each felt only a part—a fan-like ear, the rope-like tail, a tree-like leg, the snake-like trunk—they could not make sense of the whole elephant. The qualitative, systems-oriented paradigm goes even further. Unless they could see the elephant at home in the African wilderness, they would not understand the elephant's ears, legs, trunk, and skin in relation to how the elephant has evolved in the context of its ecological niche.

Philosopher and educator John Dewey (1956a) advocated a holistic approach to both teaching and research, if one was to reach into and understand the world of the child.

The child's life is an integral, a total one. He passes quickly and readily from one topic to another, as from one spot to another, but is not conscious of transition or break. There is no conscious isolation, hardly conscious distinction. The things that occupy him are held together by the unity of the personal and social interests which his life carries along. . . . [His] universe is fluid and fluent; its contents dissolve and re-form with amazing rapidity. But after all, it is the child's own world. It has the unity and completeness of his own life. (pp. 5-6)

Again, Shapiro's (1973) work in evaluating innovative Follow Through classrooms is instructive. She found that test results could not be interpreted without understanding the larger cultural and institutional context in which the individual child was situated. Deutscher (1970) adds that despite our personal experiences as living, working human beings, we have focused in our research on parts to the virtual exclusion of wholes:

We knew that human behavior was rarely if ever directly influenced or explained by an isolated variable; we knew that it was impossible to assume that any set of such variables was additive (with or without weighting); we knew that the complex mathematics of the interaction among any set of variables was incomprehensible to us. In effect, although we knew they did not exist, we defined them into being. (p. 33)

Although most scientists would view this radical critique of variable analysis as too extreme, I find that teachers and practitioners often voice the same criticisms. Innovative teachers complain that experimental results lack relevance for them because they have to deal with the whole in their classrooms; they can't manipulate just a couple of factors in isolation from everything else going on. The reaction of many program staff to scientific research is like the reaction of Copernicus to the astronomers of his day. "With them," he observed,

it is as though an artist were to gather the hands, feet, head, and other members for his images from diverse models, each part excellently drawn, but not related to a single body, and since they in no way match each other, the result would be monster rather than man. (quoted in Kuhn 1970:83)

How many program staff have complained of the evaluation research monster?

Yet, it is no simple task to undertake holistic evaluation, to search for the Gestalt in programs. The challenge for the participant observer is "to seek the essence of the life of the observed, to sum up, to
find a central unifying principle" (Bruyn 1966:316).

The advantages of using variables and indicators are parsimony, precision, and ease of analysis. Where key program elements can be quantified with validity, reliability, and credibility, and where necessary statistical assumptions can be met (e.g., linearity, normality, and independence of measurement), statistical portrayals can be quite powerful and succinct. The advantage of qualitative portrayals of holistic settings and impacts is that attention can be given to nuance, setting, interdependencies, complexities, idiosyncrasies, and context. In combination, the two approaches can be powerful and comprehensive; they can also be contradictory and divisive.

Two Views of Change

The paradigms debate is in part about how best to understand and study change. The quantitative/experimental paradigm typically involves gathering data at two points in time, pretest and posttest, then comparing the treatment group to the control group statistically. Ideally, participants are assigned to treatment and control groups randomly, or, less ideally, are matched on critical background variables. Such designs assume an identifiable, coherent, and consistent treatment. Moreover, they assume that, once introduced, the treatment remains relatively constant and unchanging. In some designs, time series data are gathered at several predetermined points rather than just at pretest and posttest. The purpose of these designs is to determine the extent to which the program (treatment) accounts for measurable changes in participants in order to make a summative decision about the value and effectiveness of the program in producing desired change (Lipsey 1990; Boruch and Rindskopf 1984; Mark and Cook 1984).

In contrast, the qualitative/naturalistic paradigm conceives of programs as dynamic and ever developing, with "treatments" changing in subtle but important ways as staff learn, as clients move in and out, and as conditions of delivery are altered. Qualitative/naturalistic evaluators seek to describe these dynamic program processes and understand their holistic effects on participants. Thus, part of the paradigms debate has been about the relative utility, desirability, and possibility of understanding programs from these quite different perspectives for different purposes.

The quantitative/experimental/summative approach is most relevant for fairly established programs with stable, consistent, and identifiable treatments and clearly quantifiable outcomes, in which a major decision is to be made about the effectiveness of one treatment in comparison to another (or no) treatment.

The qualitative/naturalistic/formative approach is especially appropriate for developing, innovating, or changing programs in which the focus is improving the program, facilitating more effective implementation, and exploring a variety of effects on participants. This can be particularly important early in the life of a program or at major points of transition. As an innovation or program change is implemented, it frequently unfolds in a manner quite different from what was planned or conceptualized in a proposal. Once in operation, innovative programs are often changed as practitioners learn what works and what does not, and as they experiment, grow, and change their priorities.

Changing developmental programs can frustrate evaluators whose design approach depends on specifiable unchanging treatments to relate to specifiable predeter-
mined outcomes. Evaluators have been known to do everything in their power to stop program adaptation and improvement so as to maintain the rigor of their research design (see Parlett and Hamilton 1976). The deleterious effect this may have on the program itself, discouraging as it does new developments and redefinitions in midstream, is considered a small sacrifice made in pursuit of higher-level scientific knowledge. But there is a distinct possibility that such artificial evaluation constraints will contaminate the program treatment by affecting staff morale and participant response.

Were some science of planning and policy or program development so highly evolved that initial proposals were perfect, one might be able to sympathize with these evaluators' desire to keep the initial program implementation intact. In the real world, however, people and unforeseen circumstances shape programs, and initial implementations are modified in ways that are rarely trivial.

Under conditions in which programs are subject to change and redirection, the naturalistic evaluation paradigm replaces the static underpinnings of the experimental paradigm with a dynamic orientation. A dynamic evaluation is not tied to a single treatment or to predetermined outcomes but, rather, focuses on the actual operations of a program over a period of time, taking as a given the complexity of a changing reality and variations in participants' experiences over the course of program participation.

Again, the issue is one of matching the evaluation design to the program, of meshing evaluation methods with decision-maker information needs. The point of contrasting fixed experimental designs with dynamic process designs in the paradigms debate was to release evaluators "from unwitting captivity to a format of inquiry that is taken for granted as the naturally proper way in which to conduct scientific inquiry" (Blumer 1969:47).

Nowhere is this unwitting captivity better illustrated than in those agencies that insist, in the name of science, that all evaluations must employ experimental designs. Two examples will illustrate this problem. In Minnesota, the Governor's Commission on Crime Prevention and Control required experimental evaluation designs of all funded projects. A small Native American alternative school was granted funds to run an innovative crime prevention project with parents and students. The program was highly flexible; participation was irregular and based on self-selection. The program was designed to be sensitive to Native American culture and values. It would have been a perfect situation for formative responsive evaluation. Instead, program staff were forced to create the illusion of an experimental pretest and posttest design. The evaluation design interfered with the program, alienated staff, wasted resources, and collected worthless information, unrelated to evolving program operations, under the guise of maintaining scientific consistency. The evaluators refused to alter or adapt the design and data collection in the face of a program dramatically different from the preconceptions on which they had based the design.

The second example is quite similar but concerns the Minnesota Department of Education. The state monitor for an innovative arts program in a free school for at-risk students insisted on quantitative, standardized test measures collected in pretest and posttest situations; a control group was also required. The arts program was being tried out in a free school as an attempt to integrate art and basic skills. Students were self-selected and participation
was irregular; the program had multiple goals, all of them vague; even the target population was fuzzy; and the treatment depended on who was in attendance on a given day. The free school was a highly fluid environment for which nothing close to a reasonable control or comparison group existed. The teaching approach was highly individualized, with students designing much of their program of study. Both staff and students resented the imposition of rigid, standardized criteria that gave the appearance of a structure that was not there. Yet, the Department of Education insisted on a static, hypothetico-deductive evaluation approach because "it's departmental evaluation policy."

On the other hand, the direction of the design error is not always the imposition of overly rigid experimental formats. Campbell and Boruch (1975) have shown that many evaluations suffer from an underutilization of more rigid designs. They have made a strong case for randomized assignment to treatments by demonstrating six ways in which quasi-experimental evaluations in compensatory education tend to underestimate effects.

Matching methods to programs and decision-maker needs is a creative process that emerges from a thorough knowledge of the organizational dynamics and information uncertainties of a particular context. Regulations to the effect that all evaluations must be of a certain type serve neither the cause of increased scientific knowledge nor that of greater program effectiveness.

Alternative Sampling Logics

The quantitative paradigm employs random samples sufficient in size to permit valid generalizations and appropriate tests of statistical significance. Qualitative inquiry involves small "purposeful samples" of information-rich cases (Patton 1990:169-86). Differences in logic and assumptions between these sampling strategies illuminate paradigm differences.

When the evaluation or policy question is aimed at generalizations, some form of random, probabilistic sampling is the design of choice. A needs assessment, for example, aimed at determining how many residents in a county have some particular problem would suggest the need for a random sample of county residents.

Case studies, on the other hand, become particularly useful when intended users need to understand a problem, situation, or program in great depth, and they can identify cases rich in needed information—rich in the sense that a great deal can be learned from a few exemplars of the phenomenon of interest. For example, much can be learned about how to improve a program by studying dropouts or select successes. Such case studies can provide detailed understanding of what is going on and solid grounds for making improvements.

The best-selling management book In Search of Excellence (Peters and Waterman 1982) studied 50 corporations with outstanding reputations for excellence to learn lessons about what these exemplars were doing right. The problem with this approach is yielding to the temptation to inappropriately generalize case study findings to the entire population, as when management consultants generalized the lessons from In Search of Excellence to all of corporate America—indeed, to all organizations of all kinds in the world! It is precisely such overgeneralizations that have led advocates of randomized, probabilistic sampling to be suspicious of case studies and purposeful sampling.
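The contrast between these two sampling logics can be sketched in a few lines of code. The sketch below is purely illustrative and not drawn from the evaluation literature: it assumes a hypothetical list of participant records, and the field names and the cutoffs used to flag "dropouts" and standout "successes" are invented for the example.

```python
import random

# Hypothetical participant records; the fields and values are invented
# purely to illustrate the two sampling logics.
participants = [
    {"id": i,
     "sessions_attended": random.randint(0, 40),
     "outcome_score": random.gauss(50, 15)}
    for i in range(500)
]

# Probabilistic logic: a simple random sample supports generalization,
# for example, estimating how prevalent a problem is among all participants.
survey_sample = random.sample(participants, 50)

# Purposeful logic: deliberately select information-rich cases for
# in-depth study -- here, dropouts and the highest-scoring successes.
dropouts = [p for p in participants if p["sessions_attended"] < 5]
top_successes = sorted(participants,
                       key=lambda p: p["outcome_score"],
                       reverse=True)[:10]
case_study_sample = dropouts[:10] + top_successes

print(f"Random sample for generalization: n = {len(survey_sample)}")
print(f"Purposeful sample of information-rich cases: n = {len(case_study_sample)}")
```

The random sample is defensible precisely because every participant had an equal chance of selection; the purposeful sample is defensible for the opposite reason—the cases are chosen because they are atypical and therefore information rich, supporting depth of understanding rather than statistical generalization.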

On the other hand, qualitative methodologists are suspicious of generalizations based on statistical inference at a single point in time. Findings based on samples, however large, are often stripped of their context when generalizations are made across time and space. Cronbach (1975) has observed that generalizations decay over time; that is, they have a half-life much like radioactive materials. Guba and Lincoln (1981) were particularly critical of the dependence on generalizations in quantitative methods because, they ask, "What can a generalization be except an assertion that is context free? . . . [Yet] It is virtually impossible to imagine any human behavior that is not heavily mediated by the context in which it occurs" (p. 62; emphasis in original).

Cronbach and colleagues in the Stanford Evaluation Consortium (1980) offered a middle ground in the paradigms debate with regard to the problem of generalizability and the relevance of evaluations. They criticized experimental designs that were so focused on controlling cause and effect that the results were largely irrelevant beyond the experimental situation. On the other hand, they were equally concerned that entirely idiosyncratic case studies yield little of use beyond the case study setting. They suggested, instead, that designs balance depth and breadth, realism and control, so as to permit reasonable extrapolation (pp. 231-35). Unlike the usual meaning of the term generalization, an extrapolation connotes that one has gone beyond the narrow confines of the data to think about other applications of the findings. Extrapolations are modest speculations on the likely applicability of findings to other situations under similar, but not identical, conditions. Extrapolations are logical, thoughtful, and problem-oriented rather than purely empirical, statistical, and probabilistic. Evaluation users often expect evaluators to thoughtfully extrapolate from their findings in the sense of pointing out lessons learned and potential applications to future efforts.

Designs that combine probabilistic and purposeful sampling have the advantage of extrapolations supported by quantitative and qualitative data. Larger samples of statistically meaningful data can address questions of incidence and prevalence (generalizations), while case studies add depth and detail to make interpretations more meaningful and grounded. Such designs can also introduce a balance between concerns about individualization and standardization, the distinction in the next section.

Standardization or Diversity: Different Emphases

The quantitative paradigm requires the variety of human experience to be captured along standardized scales. Individuals and groups are described as exhibiting more or less of some trait (self-esteem, satisfaction, competence, knowledge), but everyone is rated or ranked on a limited set of predetermined dimensions. Statistical analyses of these dimensions present central tendencies (averages and deviations from those averages). Critics of standardized instrumentation and measurement are concerned that such an approach only captures quantitative differences, thereby missing significant qualitative differences and important idiosyncrasies. Critics of statistics are fond of telling about the person who drowned in a creek with an average depth of six inches; what was needed was some in-depth information about the six-foot pool in the middle of the creek.

The qualitative paradigm pays particular attention to uniqueness, whether this be
an individual's uniqueness or the uniqueness of program, community, home, or other unit of analysis. When comparing programs, the qualitative evaluator begins by trying to capture the unique, holistic character of each program with special attention to context and setting. Patterns across individuals or programs are sought only after the uniqueness of each case has been described.

For program staff in innovative programs aimed at individualizing treatments, the central issue is how to identify and deal with individual differences among participants. Where the emphasis is on individualization of teaching or on meeting the needs of individual clients in social action programs, an evaluation strategy of case studies is needed that focuses on the individual, one that is sensitive both to unique characteristics in people and programs and to similarities among people and commonalities across treatments. Case studies can and do accumulate. Anthropologists have built up an invaluable wealth of case study data that includes both idiosyncratic information and patterns of culture.

Using both quantitative and qualitative approaches can permit the evaluator to address questions about quantitative differences on standardized variables and qualitative differences reflecting individual and program uniquenesses. The more a program aims at individualized outcomes, the greater the appropriateness of qualitative methods. The more a program emphasizes common outcomes for all participants, the greater the appropriateness of standardized measures of performance and change.

Whither the Evaluation Methods Paradigms Debate?

The Debate Has Withered

Evaluation is much too important to be left to the methodologists.

—Halcolm

The history of the paradigms debate parallels the history of evaluation. The earliest evaluations focused largely on quantitative measurement of clear, specific goals and objectives. With the widespread social and educational experimentation of the 1960s and early 1970s, evaluation designs were aimed at comparing the effectiveness of different programs and treatments through rigorous controls and experiments. This was the period when the quantitative/experimental paradigm dominated, as represented by the Bernstein and Freeman (1975) critique of evaluation quality and such popular texts as Campbell and Stanley (1963), Weiss (1972b), Suchman (1972), Rutman (1977), and the first edition of Evaluation (Rossi et al. 1979).

By the middle 1970s, the paradigms debate was becoming a major focus of evaluation discussions and writings (Cronbach 1975; Patton 1975a; Parlett and Hamilton 1972). By the late 1970s, the alternative qualitative/naturalistic paradigm had been fully articulated (Guba 1978; Patton 1978; Stake 1978). During this period, concern
about finding ways to increase use became predominant in evaluation (see Chapter 1), and evaluators began discussing standards. A period of pragmatism and dialogue followed, during which calls for and experiences with multiple methods and a synthesis of paradigms became more common (House 1980; Reichardt and Cook 1979; Rist 1977).

The advice of Cronbach et al. (1980), in their important book on reform of program evaluation, was widely taken to heart:

The evaluator will be wise not to declare allegiance to either a quantitative-scientific-summative methodology or a qualitative-naturalistic-descriptive methodology. (p. 7)

Signs of detente and pragmatism now abound. Methodological tolerance, flexibility, eclecticism, and concern for appropriateness rather than orthodoxy now characterize the practice, literature, and discussions of evaluation. Ten developments seem to me to explain the withering of the methodological paradigms debate.

1. Evaluation has emerged as a genuinely interdisciplinary and multimethod field of professional practice. Evaluation began as a specialization within separate social science and educational disciplines. The methods expertise of evaluators was closely tied to the methodological focus of their discipline of origin. In recent years, however, courses and programs have emerged for training evaluators that focus attention on evaluation as an interdisciplinary, practical, professional, and problem-solving effort (Altschuld and Engle 1994). This has permitted more balanced training and a more balanced approach to methods, which emphasizes methodological appropriateness rather than disciplinary orthodoxy.

2. The utilization crisis focused attention on the need for methodological flexibility. When the utilization crisis emerged in the 1960s, there were two major kinds of recommendations for increasing evaluation use. One kind focused on upgrading methodological rigor as a way of increasing the accuracy, reliability, and validity of evaluation data, and thereby increasing use. The second set of recommendations focused on evaluation processes: increasing attention to stakeholder needs, acting with greater political savvy, championing findings among intended users, and matching methods to questions. Methodological rigor alone has not proven an effective strategy for increasing use. Direct attention to issues of use, as in utilization-focused evaluation, has proven effective.

3. The articulation of professional standards by evaluation associations has emphasized methodological appropriateness rather than paradigm orthodoxy. The Program Evaluation Standards (Joint Committee 1994, 1981), the Guiding Principles of the American Evaluation Association (Shadish et al. 1995), and the earlier standards of the Evaluation Research Society (1980) before it merged into the American Evaluation Association all emphasize accuracy and systematic data collection within a context that takes into account varying evaluation purposes, stakeholders, and uses—and, therefore, varying methods. The standards and AEA guiding principles provide a basis other than methodological rigor for judging the excellence of evaluations. This has made it possible to employ a variety of methods, including qualitative ones, and still do an evaluation judged of high quality.

4. The accumulation of practical evaluation experience during the last two dec-
ades has reduced paradigms polarization. The practical experience of evaluators in attempting to work with programs to improve their effectiveness has led evaluators to become pragmatic in their approaches to methods issues, and in that pragmatism has emerged a commitment to do what works rather than a commitment to methodological rigor as an end in itself.

5. The strengths and weaknesses of both quantitative/experimental methods and qualitative/naturalistic methods are now better understood. In the original debate, quantitative methodologists tended to attack some of the worst examples of qualitative evaluations while the qualitative evaluators tended to hold up for critique the worst examples of quantitative/experimental approaches. With the accumulation of experience and confidence, exemplars of both qualitative and quantitative approaches have emerged with corresponding analyses of the strengths and weaknesses of each. This has permitted more balance and a better understanding of the situations for which various methods are most appropriate as well as grounded experience in how to combine methods.

6. A broader conceptualization of evaluation, and of evaluator training, has directed attention to the relation of methods to other aspects of evaluation, such as use, and has therefore reduced the intensity of the methods debate as a topic unto itself. Early evaluation texts defined evaluation narrowly as the application of social science methods to measure goal attainment. More recent definitions of evaluation, including the one in this book (Chapter 2), emphasize providing useful information for program improvement and decision making. This broader conceptualization has directed attention to the political nature of evaluation, the need to integrate evaluation into program processes, working with stakeholders throughout the evaluation process, and laying a solid foundation for the use of evaluation. While high quality and appropriate methods remain important, methods decisions are now framed in a broader context of use which, I believe, has reduced the intensity of the paradigms debate, a debate that often went on in absolute terms—context-free.

7. Advances in methodological sophistication and diversity within both paradigms have strengthened diverse applications to evaluation problems. The proliferation of books and journals in evaluation, including but not limited to methods contributions, has converted the field into a rich mosaic that cannot be reduced to quantitative versus qualitative in primary orientation. This is especially true of qualitative methods, which had more catching up to do and in which a great deal of important work has been published addressing questions of validity, reliability, and systematic analysis (Coffey and Atkinson 1996; Maxwell 1996; Stake 1995; Denzin and Lincoln 1994; Miles and Huberman 1994; Patton 1990; Williams 1986). The paradigms debate, in part, increased the amount of qualitative work being done, created additional opportunities for training in qualitative methods, and brought attention by methodologists to problems of increasing the quality of qualitative data. As the quality of qualitative methods has increased, as training in qualitative methods has improved (e.g., Levine et al. 1980), and as claims about qualitative methods have become more balanced, the attacks on qualitative methods have become less strident. Moreover, the upshot of all the developmental work in qualitative
methods is that "today there is as much evaluation has increased methodological


variation among qualitative researchers as tolerance. Early in this chapter, I noted that
there is between qualitatively and quantita- when eminent measurement and methods
tively oriented scholars" (Donmoyer, scholars such as Donald Campbell and Lee
1996:21). The same can be said of de- J. Cronbach, their commitment to rigor
velopments in quantitative-experimental never being in doubt (see Shadish et al.
methods, as methodologists have focused 1991 for their pioneering contributions to
on fine-tuning and adapting social science evaluation theory and practice), began pub-
methods to evaluation and public policy licly recognizing the contributions that
situations (e.g., Davidson 1996; Folz 1996; qualitative methods could make, the ac-
Yates 1996; Fink 1995; Conrad 1994; Cor- ceptability of qualitative/naturalistic ap-
dray 1993; Sechrest and Scott 1993; Lipsey proaches was greatly enhanced. Another
1990; Trochim 1986; Boruch and Wothke important endorsement of multiple meth-
1985; and the extensive Sage series on ods has come from the Program Evaluation
Quantitative Applications in the Social Sci- and Methodology Division of the United
ences). Lipsey (1988), whose quantitative States General Accounting Office (GAO),
credentials are impeccable, epitomized the which arguably did the most important
emergent commitment to matching meth- and influential evaluation work at the na-
ods to problems and situations when he tional level (until it was disbanded in
concluded: 1996). Under the leadership of Assistant
Comptroller General and former American
Much less evaluation research in the quanti- Evaluation Association President (1995)
tative-comparative mode should be done. Eleanor Chelimsky, GAO published a series
Though it is difficult to ignore the attractive- of methods manuals, including Quantita-
ness of assessing treatment effects via formal tive Data Analysis (GAO 1992d), Case
measurement and controlled design, it is in- Study Evaluations (GAO 1990a), Pro-
creasingly clear that doing research of this spective Evaluation Methods (GAO 1990b),
sort well is quite difficult and should be and The Evaluation Synthesis (GAO 1992c).
undertaken only under methodologically fa- The GAO manual on Designing Evalu-
vorable circumstances, and only then with ations (1991) puts the paradigms debate to
extensive prior pilot-testing regarding mea- rest as it describes what constitutes a strong
sures, treatment theory, and so forth. The evaluation. Strength is not judged by adher-
field of evaluation research and the individ- ence to a particular paradigm. It is deter-
ual treatments evaluated would generally be mined by use and technical adequacy, what-
better served by a thorough descriptive, per- ever the method, within the context of
haps qualitative, study as a basis for forming purpose, time, and resources.
better concepts about treatment, or a good
management information system that pro- Strong evaluations employ methods of
vides feedback for program improvement, or analysis that are appropriate to the question,
a variety of other approaches rather than by support the answer with evidence, document
a superficially impressive but largely invalid the assumptions, procedures, and modes of
experimental study, (pp. 22-23) analysis, and rule out the competing evi-
dence. Strong studies pose questions clearly,
8. Support for methodological eclecti- address them appropriately, and draw infer-
cism from major figures and institutions in ences commensurate with the power of the
294 • APPROPRIATE M E T H O D S

design and the availability, validity, and reli- 10. There is increased advocacy of and
ability of the data. Strength should not be experience in combining qualitative and
equated with complexity. Nor should strength quantitative approaches. The volume of
be equated with the degree of statistical ma- New Directions for Program Evaluation on
nipulation of data. Neither infatuation with "The Qualitative-Quantitative Debate:
complexity nor statistical incantation makes N e w Perspectives" (Reichardt and Rallis
an evaluation stronger. 1994a) included these themes: "blended
The strength of an evaluation is not de- approaches," "integrating the qualitative
fined by a particular method. Longitudinal, and quantitative," "possibilities for integra-
experimental, quasi-experimental, before- tion," "qualitative plus quantitative" and
and-after, and case study evaluations can be "working together" (Datta 1994; Hedrick
either strong or weak. . . . That is, the 1994; House 1994; Reichardt and Rallis
strength of an evaluation has to be judged 1994c; Smith 1994). As evaluators have
within the context of the question, the time worked to focus evaluation questions and
and cost constraints, the design, the technical gather useful information, they have begun
adequacy of the data collection and analysis, using multiple methods and a variety of
and the presentation of the findings. A strong data sources to elucidate evaluation ques-
study is technically adequate and useful—in tions (e.g., M a r k and Shotland 1987). Ini-
short, it is high in quality. (GAO 1991:15-16) tial efforts at merging quantitative and
qualitative perspectives often proved diffi-
9. Evaluation professional societies cult. For example, Kidder and Fine (1987)
have supported exchanges of views and found that qualitative methods may not
high-quality professional practice in an en- triangulate easily with quantitative meth-
vironment of tolerance and eclecticism. ods because qualitative questions and de-
T h e evaluation professional societies and signs can change during the study so that
journals serve a variety of people from the two kinds of data end up addressing
different disciplines w h o operate in differ- different issues. Synthesizing qualitative
ent kinds of organizations at different lev- and quantitative data has often proved
els, in and out of the public sector, and in challenging, and when doubts have been
and out of universities. This diversity, and raised or conflicts emerged, it was often the
opportunities to exchange views and per- qualitative data that bore the larger burden
spectives, have contributed to the emergent of proof. An excellent article by M . G.
pragmatism, eclecticism, and tolerance in Trend (1978) described the difficulties of
the field. A good example is the volume of getting fair consideration of qualitative
New Directions for Program Evaluation on data in a major study.
"The Qualitative-Quantitative Debate: N e w The 1980 meetings of the Society of
Perspectives" (Reichardt and Rallis 1994a). Applied Anthropology in Denver included
T h e tone of the eight distinguished contri- a symposium on the problems encountered
butions in that volume is captured by such by anthropologists participating in teams in
phrases as "peaceful coexistence," "each which both quantitative and qualitative
tradition can learn from the other," "com- data were being collected. The problems
promise solution," "important shared char- they shared were stark evidence that quali-
acteristics," and "a call for a new partner- tative methods were typically perceived as
ship" (Datta 1994; Reichardt and Rallis exploratory and secondary when used in
1994b, 1994c; Rossi 1994; Yin 1994). conjunction with quantitative/experimen-
The Paradigms Debate • 295

tal approaches. When qualitative data sup- nesses of each kind of data. Moreover,
ported quantitative findings, that was icing some intended users whom a utilization-fo-
on the cake. When qualitative data con- cused evaluator encounters may hold
flicted with quantitative data, the qualita- strong views about the value of certain
tive data were often dismissed or ignored. kinds of methods. Understanding the evo-
Despite these difficulties, there have lution of the paradigms debate in evalu-
now emerged positive examples in which ation should help evaluators work through
qualitative and quantitative data have been the biases of primary stakeholders.
used together. Fetterman (1984, 1980) has
had considerable success in reporting and Withered But Not Dead
integrating both kinds of data. He used
qualitative data to understand quantitative The trends and factors just reviewed
findings and quantitative data to broaden suggest that the paradigms debate has with-
qualitative interpretations. Maxwell, Bashook, ered substantially. The focus has shifted
and Sandlow (1985) demonstrated how an to methodological appropriateness rather
ethnographic approach can be combined than orthodoxy, methodological creativity
with an experimental design within a single- rather than rigid adherence to a paradigm,
study framework. Another area of integra- and methodological flexibility rather than
tion has emerged in evaluations that in- conformity to a narrow set of rules. How-
clude a large number of case sites in a ever, paradigm discussions have not disap-
large-scale study; Firestone and Herriott peared, and are not likely to. What has
(1984) have demonstrated how quantita- changed, I believe, is that those discussions
tive logic can contribute to the interpreta- are now primarily about philosophy rather
tion of qualitative data as the number of than methods. The connection between
sites in a study grows. The theoretical basis philosophical paradigms and methods has
for combining qualitative and quantitative been broken. Philosophical paradigm de-
methods has been well articulated (House bates concern the nature of reality (e.g.,
1994; Patton 1982a; Reichardt and Cook Bednarz 1985): Is it singular or multiple?
1979). Sechrest (1992), while attacking Is there even such a thing as truth? Is the
those few whom he perceived advocated world knowable in any absolute sense? Is
qualitative methods to the exclusion of all knowledge relative to time and place?
quantitative approaches, offered high- These are interesting and important philo-
quality examples where both had been sophical questions, but, I find, they have
integrated. little bearing on the practicalities of design-
Thus, there are positive signs that ing a useful evaluation with specific in-
evaluators have become much more sophis- tended users.
ticated about the complexities of methodo- Let's examine the pragmatic implica-
logical choices and combinations. How- tions of logically incompatible philosophi-
ever, although concrete examples of cal views of the world. Guba and Lincoln
methods integration are increasing, the evi- (1981) have argued that the scientific and
dence also suggests that integrating quali- naturalistic paradigms contain incompat-
tative and quantitative methods continues ible assumptions about the inquirer/subject
to be a difficult task requiring great sensi- relationship and the nature of truth. For
tivity and respect for the strengths of each example, the scientific paradigm assumes
approach and recognition of the weak- that reality is "singular, convergent, and
296 • APPROPRIATE METHODS

fragmentable," while the naturalistic para- information that is needed by each group:
digm holds a view of reality that is "multi- test scores? interviews? observations? The
ple, divergent, and inter-related" (Guba design and measures must be negotiated.
and Lincoln 1981:57). These opposite as- Multiple methods and multiple measures
sumptions are not about methods alterna- will give each group some of what they
tives; they are fundamental assumptions want. The naturalistic paradigm educators
about the nature of reality. An evaluator will want to be sure that test scores are
can conduct interviews and observations interpreted within a larger context of class-
under either set of assumptions, and the room activities, observations, and out-
data will stand on their own. comes. The scientific paradigm educators
I disagree, then, that philosophical as- will likely use interview and observational
sumptions necessarily require allegiance by data to explain and justify test score inter-
evaluators to one paradigm or the other. pretations. My experience suggests that
Pragmatism can overcome seemingly logi- both groups can agree on an evaluation
cal contradictions. I believe that the flex- design that includes multiple types of data
ible, responsive evaluator can shift back and and that each group will ultimately pay
forth between paradigms within a single attention to and use "the other group's
evaluation setting. In so doing, such a flex- data." In short, a particular group of people
ible and open evaluator can view the same can arrive at agreement on an evaluation
data from the perspective of each paradigm design that includes both qualitative and
and can help adherents of either paradigm quantitative data without resolving ulti-
interpret data in more than one way. mate paradigmatic issues. Such agreement
This kind of flexibility begins at the is not likely, however, if the evaluator be-
design stage. Consider the following situ- gins with the premise that the paradigms
ation. An evaluator is working with a group are incompatible and that the evaluation
of educators, some of whom are "progres- must be conducted within the framework
sive, open education" adherents and some of either one or the other.
of whom are "back-to-basics" fundamen- Perhaps an analogy will help here. A
talists. The open education group wants to sensitive, practical evaluator can work with
frame the evaluation of a particular pro- a group to design a meaningful evaluation
gram within a naturalistic framework. The that integrates concerns from both para-
basic skills people want a rigorous, scien- digms in the same way that a skillful teacher
tific approach. Must the evaluator make an can work with a group of Buddhists, Chris-
either/or choice to frame the evaluation tians, Jews, and Muslims on issues of com-
within either one or the other paradigm? mon empirical concern without resolving
Must an either/or choice be made about the which religion has the correct worldview.
kind of data to be collected? Are the views Another example: an agricultural proj-
of each group so incompatible that each ect in the Caribbean that included social
must have its own evaluation? scientists and government officials of vary-
I've been in precisely this situation a ing political persuasions. Despite their
number of times. I do not try to resolve the theoretical differences, the Marxist and
paradigms debate. Rather, I try to establish Keynesian economists and sociologists had
an environment of tolerance and respect little difficulty agreeing on what data were
for different, competing viewpoints, and needed to understand agricultural exten-
then focus the discussion on the actual sion needs in each country. Their interpre-
The Paradigms Debate • 297

tations of those data also differed less than evaluation. It also demonstrated the dif-
I expected. ficulty of moving beyond narrow disci-
Thus, the point I'm making about the plinary training to make decisions based on
paradigms debate extends beyond meth- utility. It is premature to characterize the
odological issues to embrace a host of po- practice of evaluation as completely flex-
tential theoretical, philosophical, religious, ible and focused on methodological appro-
and political perspectives that can separate priateness rather than disciplinary ortho-
the participants in an evaluation process. I doxy, but it is fair to say that the goals have
am arguing that, from a practical perspec- shifted dramatically in that direction. The
tive, the evaluator need not even attempt debate over which paradigm was the right
to resolve such differences. By focusing on path to truth has been replaced, at the level
and negotiating data collection alternatives of methods, by a paradigm of choices.
in an atmosphere of respect and tolerance,
the participants can come together around
a commitment to an empirical perspective, Utilization-Focused Synthesis:
that is, bringing data to bear on important A Paradigm of Choices
program issues. As long as the empirical
commitment is there, the other differences Exhibit 12.3 summarizes the contrasting
can be negotiated in most instances. themes of the paradigms debate and de-
Debating paradigms with one's clients, scribes the synthesis that is emerging with
and taking sides in that debate, is different the shift in emphasis from methodological
from debating one's colleagues about the orthodoxy to methodological appropri-
nature of reality. I doubt that evaluators ateness and utility. Utilization-focused
will ever reach consensus on the ultimate evaluation offers a paradigm of choices.
nature of reality. But the paradigms debate Today's evaluator must be sophisticated
can go on among evaluators without para- about matching research methods to the
lyzing the practice of practical evaluators nuances of particular evaluation questions
who are trying to work responsively with and the idiosyncrasies of specific decision-
primary stakeholders to get answers to rele- maker needs. The evaluator must have a
vant empirical questions. The belief that large repertoire of research methods and
evaluators must be true to only one para- techniques available to use on a variety of
digm in any given situation underestimates problems.
the human capacity for handling ambiguity The utilization-focused evaluator works
and duality, shifting flexibly between per- with intended users to include any and all
spectives. In short, I'm suggesting that data that will help shed light on evaluation
evaluators would do better to worry about questions, given constraints of resources
understanding and being sensitive to the and time. Such an evaluator is committed
worldviews and evaluation needs of their to research designs that are relevant, rigor-
clients than to maintain allegiance to or ous, understandable, and able to produce
work within only one perspective. useful results that are valid, reliable, and
believable. The paradigm of choices recog-
Beyond Paradigm Orthodoxies nizes that different methods are appropri-
ate for different situations and purposes.
The paradigms debate helped elucidate Sometimes the paradigm of choice is a
the complexity of choices available in simple set of questions. Once, early in my
298 • APPROPRIATE METHODS

The Great
PAIR 0'DIMES DEBATE

I'M TEH CENTS


but Jo they make cents

career, working on a school evaluation, I He laughed and said. "I was an English
was asked by the superintendent what major in college. Kipling will do just fine."
evaluation model I worked from and, with- I wish I could tell you how to add luck and
out waiting for a response, he listed several a little chutzpa to your evaluation design
possibilities to let me know he had taken an kit, because both can come in quite
evaluation course in graduate school. The handy—luck always, chutzpa sometimes.
first edition of Utilization-Focused Evalu- What I can offer is the framework for a
ation (Patton 1978) had only recently been Kipling-inspired, utilization-focused para-
published and was not among the frame- digm of choices that I prepared and gave
works he offered. Nor did I recognize all to the superintendent the next day. These
the models he listed or know which he questions guided the discussions of the
preferred. The evening before I had been evaluation task force he convened for me
helping my young son memorize a poem to work with.
for school, so I said, smiling: "Actually, Who.. . ? Who is the evaluation for?
I find Kipling's model most helpful." What. . . ? What do we need to find out?
"Kipling?" he asked, bemused. I quoted: Why. . . ? Why do we want to find that
out?
/ keep six honest serving men When. . . ? When will the findings be
They taught me all I knew: needed?
Where . . . ? Where should we gather
Their names are "What and Why and "When information?
And How and Where and Who." How . . . ? How will results be used?
EXHIBIT 12.3
[Full-page table contrasting the themes of the paradigms debate with the emerging utilization-focused synthesis; the table's contents are not recoverable from this scan.]

Deciphering Data and Reporting Results

Analysis, Interpretations, Judgments, and Recommendations

What is the sound of one hand clapping?
—Hakuin

This question was first posed by the Japanese Zen Master Hakuin (1686-1769) as a means of facilitating enlightenment. "The disciple, given a Koan [riddle] to see through, was encouraged to put his whole strength into the single-minded search for its solution, to be 'like a thirsty rat seeking for water . . . ,' to carry the problem with him everywhere, until suddenly, if he were successful, the solution came" (Hoffman 1975:22). Solving a Koan is a technique originated by the Zen masters to shake their students out of routine ways of thinking and acting, open up new possibilities, and help individual students realize their full potential. The evaluator is engaged in some of these same processes. Utilization-focused evaluation helps decision makers and intended users stand outside the program and look at what is happening; evaluations can help shake staff out of routine ways of doing things, open up new possibilities, and help programs realize their full potential.

This allusion to Zen and the Enlightenment of Evaluation is not frivolous. Religion and philosophy are ultimately personal, perceptual, and interpretive mechanisms for establishing the meaning of life; evaluation is ultimately a personal, perceptual, and interpretive approach to establishing the meaning—and meaningfulness—of programs. The Zen search through Koans consists of three basic parts: a question, an answer, and interpretation/assimilation of the answer in terms of the student's own life; evaluation involves a question, an empirical answer, and interpretation/utilization of the answer in the context of the program's own dynamics. A fundamental tenet of the Koanic method is that the question is as important as the answer; the same principle applies to utilization-focused evaluation. The Zen Master carefully matches the Koan to the student; the responsive evaluator focuses on questions that are relevant to specific intended users. In Zen, many pathways lead to enlightenment; in paradigm-flexible evaluation, multiple methods offer divergent paths in the search for utility. Finally, the Zen student must struggle to make sense out of the answer to the Koanic riddle; in evaluation, the meaning of empirical data emerges from interpretation, dialogue, and situational application. Consider the following Koanic exchange, entitled "A Flower in Bloom."

A monk asked Master Ummon, "What is the pure body of truth?"
Master Ummon said, "A flower in bloom."
Monk: "'A flower in bloom'—what's it mean?"
Master: "Maggot in the shit hole, pus of leprosy, scab over a boil."
—Hoffman 1975:119

"What's it mean?" may be a philosophi- zation of intended information users.


cal, religious, or epistemological question. That's also why data analysis and interpre-
It can also be the very concrete practical tation depend on the active participation of
question of program staff laboring over primary users, because, in the end, they are
pages of statistical tables and reams of com- the ones who must translate data into deci-
puter printout. For any given set of data, sions and action.
meaning depends on who is interpreting
the data.
The truism that where some people see
Setting the Stage for Use
flowers, others see maggots is regularly and
consistently ignored in the design and in-
terpretation of evaluation studies. Evalua- Mock Data Application Scenarios
tors and decision makers can deceive them-
selves into believing that once data have The stage can be set for analysis and use
been collected, it will be clear whether or before data are ever collected. Once instru-
not the program works. But data simply do ments have been designed—but before data
not exist outside the context of a specific collection—I like to conduct a mock or
group of people with a particular perspec- simulated use session. This involves fabri-
tive. That's why utilization-focused evalu- cating possible results and interpreting the
ation begins with identification and organi- action implications of the made-up data.
Deciphering Data and Reporting Results • 303

eating possible results and interpreting the how findings might be applied before data
action implications of the made-up data. collection gets under way. The relatively
T h e evaluator prepares some possible safe, even fun, exercise of analyzing mock
"positive" results and some negative on the data can help strengthen the resolve to use
most important issues. For example, sup- before being confronted with real findings
pose primary users have chosen the job and decisions.
placement rate as the priority outcome
variable for a vocational training program. Quantitative data are fairly easy to fab-
T h e evaluator might construct data show- ricate once instruments have been devel-
ing a placement rate of 4 0 % for black oped. W i t h qualitative data, it's necessary
participants and 7 5 % for white partici- to construct imaginary quotations and
pants. The evaluator facilitates analysis by case examples. This extra w o r k can pay
asking such questions as: "What do these large dividends as decision makers de-
results mean? W h a t actions would you take velop a utilization-focused mind-set based
based on these results? H o w would you use on an actual experience struggling with
these data?" data. Athletes, performing artists, astro-
Such a discussion accomplishes four nauts, and entertainers spend h u n d r e d s of
things: hours preparing for events that take only
a few hours. Is it t o o much to ask intended
1. The simulated analysis is a check on the users to spend a couple of hours practicing
design to make sure that all the relevant data use to get mentally and analytically ready
for interpretation and use are going to be for the climax of an evaluation?
collected. All too often at the analysis stage,
evaluators and stakeholders realize that they
forgot to ask an important question. Standards of Desirability
2. The mock use session trains stakeholders for
the real analysis later. They learn how to A simulated use session also offers a
interpret data and apply results. prime opportunity to think about and for-
3. Working through a use scenario prior to data malize criteria for making judgments be-
collection helps set realistic expectations fore data collection. With quantitative
about what the results will look like. data, this can be done quite precisely by
Strengths and limitations of the design establishing standards of desirability. I like
emerge. This helps prepare users for the to have users set at least three levels of
necessity of interpreting findings in relation attainment:
to possible actions and likely ambiguities.
4. Use scenarios help build the commitment to 1. Level at which the program is considered
use—or reveal the lack of such commitment. highly effective
When intended users are unable to deal with 2. Level at which the program is considered
how they would use findings prior to data adequate
collection, a warning flag goes up that they 3. Level at which the program is considered
may be unable, or unwilling, to use findings inadequate
after data collection. The commitment to
use can be cultivated by helping intended Such standards can be established for
users think realistically and concretely about implementation targets (e.g., p r o g r a m
304 • APPROPRIATE METHODS

EXHIBIT 13.1
Intensity of Teachers' Use of a Teacher Center

Category of Visits by a Teacher Number of Visits Percentage of Total Visitors

1 or 2 185 80.4
3 or more 45 19.6

NOTE: Data are for visits between January 10 and February 28.
SOURCE: Feiman 1977:19-21.

pants have changed). Suppose one is col- judgment criteria will be up to date, realis-
lecting satisfaction data on a workshop. At tic, and meaningful.
what level of satisfaction is the workshop During the early conceptual stage of an
a success? At what level is it merely ade- evaluation, questions of use are fairly gen-
quate? At what level of participant satis- eral and responses may be vague. The
faction is the workshop to be judged inef- evaluator asks, "What would you do if you
fective? It's better to establish these kinds had an answer to your evaluation question?
of standards of desirability in a calm and How would you use evaluation findings?"
deliberative manner before actual results These general questions help focus the
are presented. This exercise, done before evaluation, but once the context has been
data collection, may also reveal that satis- delineated, the priority questions focused,
faction data alone are an inadequate indi- and methods selected, the evaluator can
cator of effectiveness while there's still pose much more specific use questions based
time to measure additional outcomes. on what the results might actually look like.
The process of specifying objectives For example, if recidivism in a community
sometimes involves setting performance corrections program is 55%, is that high
targets, for example, 75% of workshop or low? Does it mean the program was
participants will be satisfied. However, this effective or ineffective? The program had
doesn't tell us what constitutes an out- some impact, but what level of impact is
standing accomplishment; it doesn't distin- desirable? What level spells trouble?
guish adequacy from excellence. Nor does Consider evaluation of a teacher center.
it make it clear whether 65% satisfaction is One of the implementation issues concerns
inadequate or merely "lower than we the extent to which teachers use the center
hoped for but acceptable." Moreover, ob- intensively (three or more times) versus
jectives are often set a long time before the superficially (once or twice). Actual data
program is under way or well before an from such a study are shown in Exhibit
actual evaluation has been designed. Re- 13.1. Now, suppose the staff assembles to
viewing objectives and establishing precise discuss the actual results without having set
standards of desirability just before data standards of desirability or performance
collection increases the likelihood that targets.
First staff speaker: That's about what I anticipated.
Second staff speaker: Plus, remember, the data don't include teachers in our workshops and special classes.
Third staff speaker: I think the observation time was really too short.
Fourth staff speaker: I agree. January and February are bad months, you know, everyone is depressed with winter, and . . .

Soon it becomes apparent that either the data don't tell staff much, at least not without other data, or that staff are not prepared to deal with what the data do show. Such resistance and defensiveness are not unusual as aspects of a postevaluation scenario.

Now, let's try a different scenario. At the outset of the evaluation, the program staff discuss their notions of what their task is and how teacher change occurs. They decide that the kind of impact they want to have cannot occur in one or two visits to the teacher center: "If teachers don't return after one or two visits, we must be doing something wrong." The period of time in question is a full 12-month period. Before the data are collected, the staff fill in the table shown in Exhibit 13.2.

EXHIBIT 13.2
Teacher Center Standards of Desirability

Judgment                                                              Percentage and Number of Teachers Who Have Contact With the Center Three or More Times
We're doing an outstanding job of engaging teachers at this level.    __________
We're doing an adequate job of engaging teachers at this level.       __________
We're doing a poor job of engaging teachers at this level.            __________

A record-keeping system must then be established, which staff agree to and believe in so that the data have credibility. The teacher center staff have committed themselves to actively engaging teachers on a multiple-contact basis. The data will provide clear feedback about the effectiveness of the program. The key point is that if staff are unwilling or unable to set expectancy levels before data collection, there is no reason to believe they can do so afterward. In addition, going through this process ahead of time alerts participants to additional data they need in order to make sense of and act on the results; clearly, measuring frequency of visits is only a starting place.

Many of the most serious conflicts in evaluation are rooted in the failure to clearly specify standards of desirability in advance of data collection. This can lead both to collection of the wrong data and to intense disagreement about criteria for judging effectiveness. Without explicit criteria, data can be interpreted to mean almost anything about a program—or to mean nothing at all.

Making Findings Interesting

Another way of setting the stage for analysis and use is having stakeholders speculate about results prior to seeing the real data. This can be done prior to data collection or after data collection but prior to actual presentation of findings. Stakeholders are given an analysis table with all the appropriate categories but no actual data (a dummy table). They then fill in the missing data with their guesses of what the results will be.

This kind of speculation prepares users for how the results will be formatted and increases interest by building a sense of anticipation. I've even had stakeholders establish a betting pool on the results. Each person puts in a dollar or more and the person closest to the actual results on the major outcome wins the pot. That creates interest! And the winner must be present at the unveiling of the findings to win. Strange how attendance at the presentation of findings is increased under these conditions.

A second function of having stakeholders write down their guesses is to provide a concrete basis for determining the extent to which actual results come close to expectations. Program staff, for example, sometimes argue that they don't need formal evaluations because they know their clients, students, or program participants so well that evaluation findings would just confirm what they already know. I've found that when staff commit their guesses to paper in advance of seeing actual results, the subsequent comparison often calls into question just how well some staff do know what is happening in the program. At least with written guesses on paper, program staff and other stakeholders can't just say, "That's just what I expected." A database (in the form of their guesses) exists to determine how much new has been learned. This can be useful in documenting the extent to which an evaluation has provided new insights and understandings.

You can combine establishing standards of desirability and speculating on results. Give stakeholders a page with two columns. The first column asks them to specify what outcome they consider desirable, and the second column asks them to guess what result they believe will be obtained. Having specified a standard of desirability and guessed at actual results, users have a greater stake in and a framework for looking at the actual findings. When real results are presented, the evaluator facilitates discussion on the implications of the data falling below, at, or above the desired response, and why the actual findings were different from or the same as what they guessed. In my experience, animated interactions among users follow as they fully engage and interpret the results.

The amount of data presented must be highly focused and limited to major issues. This is not a data-dredging exercise. Carefully constructed tables and spotlighted analysis can make such presentations lively and fruitful.

I find that, given the time and encouragement, stakeholders with virtually no methods or statistics training can readily identify the strengths, weaknesses, and implications of the findings. The trick is to move people from passive reception—from audience status—to active involvement and participation (Greene 1988a).

A Framework for Reviewing Data

Four distinct processes are involved in making sense out of evaluation findings.

1. Description and analysis: Describing and analyzing findings involves organizing raw data into a form that reveals basic patterns. The evaluator presents, in user-friendly fashion, the factual findings as revealed in actual data.

2. Interpretation: What do the results mean? What's the significance of the findings? Why did the findings turn out this way? What are possible explanations of the results? Interpretations go beyond the data to add context, determine meaning, and tease out substantive significance based on deduction or inference.

3. Judgment: Values are added to analysis and interpretations. Determining merit or worth means resolving to what extent and in what ways the results are positive or negative. What is good or bad, desirable or undesirable, in the outcomes? Have standards of desirability been met?

4. Recommendations: The final step (if agreed to be undertaken) adds action to analysis, interpretation, and judgment. What should be done? What are the action implications of the findings? Only recommendations that follow from and are grounded in the data ought to be formulated.

Primary intended users should be actively involved in all four of these processes so that they fully explore the findings and their implications. Facilitating these processes, especially helping stakeholders understand these four fundamental distinctions, requires skills that go well beyond what is taught in statistics courses. Working with stakeholders to analyze and interpret findings is quite different from doing it on one's own as a researcher (Greene 1988a). Let's consider each of these processes in greater depth.
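One way to picture the earlier suggestion of combining standards of desirability with stakeholder guesses is a small worksheet calculation. The following Python sketch is purely illustrative: the stakeholder labels, threshold percentages, guesses, and the observed result are hypothetical, not data from any program discussed in this chapter.

```python
# Hypothetical worksheet combining standards of desirability, stakeholder
# guesses, and an actual result for one outcome (illustrative values only).

# Standards of desirability set before data collection (percentage satisfied).
standards = {
    "highly_effective": 75,  # at or above this level: highly effective
    "adequate": 60,          # at or above this level (but below 75): adequate
}                            # below 60: inadequate

# Each primary intended user records a desired level and a guess in advance.
worksheet = [
    {"user": "program director", "desired": 80, "guess": 70},
    {"user": "funder",           "desired": 75, "guess": 55},
    {"user": "staff",            "desired": 65, "guess": 75},
]

def judge(value):
    """Translate an observed percentage into the pre-set judgment levels."""
    if value >= standards["highly_effective"]:
        return "highly effective"
    if value >= standards["adequate"]:
        return "adequate"
    return "inadequate"

actual = 68  # observed result, e.g., percentage of participants satisfied

print(f"Actual result: {actual}% -> judged {judge(actual)}")
for row in worksheet:
    print(f"{row['user']}: desired {row['desired']}%, guessed {row['guess']}%, "
          f"actual is {actual - row['desired']:+d} vs. desired "
          f"and {actual - row['guess']:+d} vs. guess")
```

Nothing in the sketch is analytically deep; its value, as argued above, is that the judgment levels and the guesses are committed to paper before anyone sees the actual number.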

Arranging Data for Ease of Interpretation: Focusing the Analysis

They say that figures rule the world. I do not know if this is true, but I do know that figures tell us if it is well or poorly ruled.
—Goethe, German philosopher and author (1749-1832), 1814

Providing descriptive statistics in a report means more than simply reproducing the results in relatively raw form. Data need to be arranged, ordered, and organized in some reasonable format that permits decision makers to detect patterns. Consider the three presentations of data shown in Exhibit 13.3. Each presents data from the

EXHIBIT 13.3
Three Presentations of the Same Data (in percentages)

Presentation 1: Raw results presented in the same order as items appeared in the survey

Expressed Needs of 478 Physically Disabled People    Great Need for This    Much Need    Some Need    Little Need
Transportation 35 36 13 16
Housing 33 38 19 10
Educational opportunities 42 28 9 21
Medical care 26 45 25 4
Employment opportunities 58 13 6 23
Public understanding 47 22 15 16
Architectural changes in buildings 33 38 10 19
Direct financial assistance 40 31 12 17
Changes in insurance regulations 29 39 16 16
Social opportunities 11 58 17 14

Presentation 2: Results combined into two categories; no priorities emerge

Great or Much Need Some or Little Need

Transportation 71 29
Housing 71 29
Educational opportunities 70 30
Medical care 71 29
Employment opportunities 71 29
Public understanding 69 31
Architectural changes in buildings 71 29
Direct financial assistance 71 29
Changes in insurance regulations 68 32
Social opportunities 69 31

Presentation 3: Utilization-focused results arranged in rank order by "great need" to highlight priorities

Rank Order    Great Need for This

Employment opportunities 58
Public understanding 47
Educational opportunities 42
Direct financial assistance 40
Transportation 35
Housing 33
Architectural changes in buildings 33
Changes in insurance regulations 29
Medical care 26
Social opportunities 11
same survey items, but the focus and degree of complexity are different in each case.

The first presentation reports items in the order in which they appeared on the survey, with percentages for every category of response. It is difficult to detect patterns with 40 numbers to examine. The second presentation simplifies the results by dividing the scale at the midpoint and reducing the four categories to two. Sometimes, such an analysis would be very revealing, but, in this case, no priorities emerge. Since determining priorities was the purpose of the survey, decision makers would conclude from the second presentation that the survey had not been useful.

The third presentation arranges the data so that decision makers can immediately see respondents' priorities. Support for employment programs now ranks first as a great need (58%) in contrast to social programs (11%), rated lowest in priority. Users can go down the list and decide where to draw the line on priorities, perhaps after direct financial assistance (40%). Failure to arrange the data as displayed in the third presentation places decision makers at an analytical disadvantage. Presentation 3 is utilization focused.
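For evaluators who assemble such tables from a spreadsheet or survey file, the rearrangement behind Presentation 3 is little more than a sort. The sketch below, in Python, reproduces the rank ordering from the "great need" percentages in Exhibit 13.3; the percentages come from the exhibit, but the code itself is only an illustration of the principle, not part of the original study.

```python
# Percentage reporting "great need" for each item, from Exhibit 13.3.
great_need = {
    "Transportation": 35,
    "Housing": 33,
    "Educational opportunities": 42,
    "Medical care": 26,
    "Employment opportunities": 58,
    "Public understanding": 47,
    "Architectural changes in buildings": 33,
    "Direct financial assistance": 40,
    "Changes in insurance regulations": 29,
    "Social opportunities": 11,
}

# Presentation 1 lists items in survey order; Presentation 3 simply re-sorts
# them by "great need" so that priorities are visible at a glance.
ranked = sorted(great_need.items(), key=lambda item: item[1], reverse=True)

for need, pct in ranked:
    print(f"{need:<40} {pct:>3}%")
```

The same ten percentages appear in Presentation 1 and Presentation 3; only the ordering changes, and that ordering is what lets intended users see priorities at a glance.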

Simplicity in Data Presentations

Unless one is a genius, it is best to aim at being intelligible.
—Anthony Hope, British novelist (1863-1933)

William of Occam with his razor would have made an excellent analyst of evaluation data. Look first for the simplest presentation that will handle the facts. Evaluators may need and use sophisticated and complex statistical techniques to enhance analytic power or uncover nuances in data, but simple and straightforward statistical presentations are needed to give decision makers access to evaluation findings.

Eminent economic historian Robert Heilbroner (1996) has lamented what he considers the decline of economics from an applied policy science to an abstract and arcane exercise in mathematical navel gazing. Current economics, he charged, displays "a degree of unreality that can be matched only by medieval scholasticism. . . . Economics is in retreat from political reality. It's embracing mathematics and elaborate models—an enormous loss of relevance" (pp. 65-66). His reflections reminded me that, when I first entered evaluation, distinguished social scientists were advocating more sophistication in evaluation designs and data analysis, for example, multiple regression, path analysis, and log-linear techniques. At the same time, most decision makers I encountered at the federal, state, and local levels were intimidated by simple percentages, unsure of correlation coefficients, and wary of what they considered to be statistical gobbledygook. Few decision makers understand sophisticated procedures or the assumptions on which they rest.
I am not implying that sophisticated techniques, where appropriate and helpful, should not be used. I am suggesting that it is the height of folly to center one's public presentations and decision-making discussions around complex statistical findings. I have been told by some of my colleagues that they make such presentations to educate public officials about statistics. From my observations, I would suggest that they are contributing to a sense that social science research is useless and convincing policymakers that researchers can't communicate.

Evaluation, if it is to be accessible to and understandable by key stakeholders, must depart from the trends of the various social science disciplines and return to simplicity as a virtue in data presentations. Certainly, an evaluator can use sophisticated techniques to confirm the strength and meaningfulness of discovered patterns, but the next step is to think creatively about how to translate those findings into simple, straightforward, and understandable presentations. This means, for example, that the results of a regression analysis might be reduced to nothing more complex than a chi-square table or a set of descriptive statistics (percentages and means). This need not distort the presentation. Quite the contrary, it will usually focus and highlight the most important findings while allowing the investigators to explain in a footnote and/or an appendix that more sophisticated techniques have been used to confirm the simple statistics here presented.

Simplicity as a virtue means that we seek clarity, not complexity. Our presentations must be like the skilled acrobat who makes the most dazzling moves look easy, the audience being unaware of the long hours of practice and the sophisticated calculations involved in what appear to be simple movements. Likewise, evaluators must find ways of so perfecting their public performances that those participating will understand the results, though unaware of the long hours of arduous work involved in sifting through the data, organizing it, arranging it, testing out relationships, taking the data apart, and creatively putting it back together to arrive at that moment of public unveiling.

Simplicity as a virtue means that we are rewarded not for how much we confuse or impress, but for how much we enlighten. It means that we make users feel they can master what is before them, rather than intimidating them with our own expertise, knowledge, and sophistication. It means distinguishing the complexity of analysis from the clarity of presentation and using the former to inform and guide the latter. Simplicity as a virtue is not simple. It often involves more work and creativity to simplify than to rest content with a presentation of complex statistics as they originally emerged from analysis. Simplicity as a virtue is not simple, but it can be effective.

Strive for Balance

The counterpoint to my sermonette on simplicity is that evaluation findings are seldom really simple. In striving for simplicity, one must be careful to avoid simplemindedness. It is simpleminded to present only one point of view. This happens most often in evaluation when results are boiled down, in the name of simplicity, to some single number—a single percentage, a single cost/benefit ratio, or a single proportion of the variance explained. Striving for simplicity means making the data understandable, but balance and fairness need not be sacrificed in the name of simplicity. Achieving balance may mean that multiple perspectives have to be represented
EXHIBIT 13.4
Illustrative Data (Constructed)

                      Beginning Level    Level Four Years Later    Absolute Amount of Change    Percentage Change
Median white income   $10,100            $10,706                   $606                         6%
Median black income   $5,500             $6,050                    $550                         10%

through several different numbers, all of them presented in an understandable fashion. Much advertising is based on the deception of picking the one number that puts a product in the best light, for example, gas mileage instead of price. Politicians often do likewise, picking the statistic that favors their predetermined analysis. An example may help clarify what I mean.

In his 1972 presidential campaign, Richard Nixon made the claim that under his administration, black incomes had risen faster than white incomes. In the same campaign the Democratic nominee, George McGovern, made the claim that after four years of Nixon, blacks were worse off than whites in terms of income. Both statements were true. Each statement represented only part of the picture. To understand what was happening in the relationship between black and white incomes, one needed to know, at a minimum, both absolute income levels and percentage changes. Consider the data in Exhibit 13.4 to illustrate this point. These data illustrate that black incomes rose faster than white incomes, but blacks were worse off than whites at the end of the four-year period under study. A balanced view requires both the absolute changes and the percentage changes. When a report gives only one figure or the other (i.e., only absolute changes or only percentage changes), the reader has cause to suspect that the full picture has not been presented.

Another example comes from a study of Internal Revenue Service audits conducted by the U.S. General Accounting Office (GAO 1979). The cover page of the report carried the sensational headline that IRS audits in five selected districts missed $1 million in errors in four months: "These districts assessed incorrect tax estimated to total $1.0 million over a 4-month period because of technical errors, computation errors, or failure to make automatic adjustments."

The IRS response to the GAO report pointed out that the same audit cases with $1 million in overlooked errors had revealed over $26 million in errors that led to adjustments in tax. Thus, the $1 million represented only about 4% of the total amount of money involved. Moreover, the IRS disputed the GAO's $1 million error figure because the GAO included all potential audit items whereas the IRS ignored differences of $100 or less. In the data presented by the GAO, it is impossible to tell what proportion of the $1 million involved errors of under $100, which are routinely ignored by the IRS as not worth the costs of pursuing. Finally, a detailed reading of the report shows that the $1 million error involves cases of two types: instances in which additional tax would be due to the IRS and instances in which a refund would be due the taxpayer from the IRS. In point of fact, the $1 million error would result in virtually no additional revenue to the government, had all the errors been detected and followed up, because the two kinds of errors would cancel each other out.

The gross simplification of the evaluation findings and the headlining of the $1 million error represent considerable distortion of the full picture. Simplicity at the expense of accuracy is no virtue; complexity in the service of accuracy is no vice. The point is to make complex matters understandable without distortion. The omitted information from the GAO report could not be justified on the basis of simplification. The omissions constituted distortions rather than simplification.

Striving for balance means thinking about how to present the full picture without getting bogged down in trivia or extraneous details. It can mean providing both absolute changes and percentage changes; reporting the mean, median, and mode in order to fully represent the distribution of data; providing multiple measures of an attitude or behavior; categorizing data more than one way to see what differences those categorical distributions make; providing information about mean, range, and standard deviations (represented as straightforward and understandable confidence limits); presenting both positive and negative quotes from interviewees; and finding ways to show the same thing in more than one way to increase understanding.

Be Clear About Definitions

Confusion or uncertainty about what was actually measured or studied can lead to misinterpretations. In workshops on data analysis, I give the participants statistics on farmers, on families, and on recidivism. In small groups, the participants interpret the data. Almost invariably, they jump right into analysis without asking how farmer was defined, how family was defined, or what recidivism actually meant in the data at hand. A simple term like farmer turns out to be enormously variant in its use and definition. When does the weekend gardener become a farmer, and when does the large commercial farmer become an agribusinessperson? A whole division of the Census Bureau wrestles with this problem.

Defining family is no less complex. There was a time, not so long ago, when Americans may have shared a common definition of family. Now, there is real question about who has to be together under what arrangement before we call them a family. Single-parent families, foster families, same-sex marriages, and extended families are just a few of the possible complications. Before interpreting any statistics on families it would be critical to know how family was defined.

Measuring recidivism is common in evaluation, but the term offers a variety of different definitions and measures. Recidivism may mean (1) a new arrest, (2) a new appearance in court, (3) a new conviction, (4) a new sentence, or (5) actually committing a new crime regardless of whether the offender is apprehended. The statistics will vary considerably, depending on which definition of recidivism is used.

A magazine cartoon showed a group of researchers watching a television cartoon and debating the question: "When the coyote bounces after falling off the cliff, does the second time he hits the ground count as a second incidence of violence?" Of such decisions are statistics made.
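A small numerical illustration may make the recidivism point concrete. In the Python sketch below, the counts are invented for illustration only; they simply show how the same set of follow-up records yields quite different "recidivism rates" depending on which of the definitions above is applied.

```python
# Invented follow-up counts for 200 participants in a hypothetical program,
# showing how the chosen definition of recidivism changes the reported rate.
n_participants = 200
counts = {
    "new arrest": 80,
    "new court appearance": 60,
    "new conviction": 45,
    "new sentence": 30,
}

for definition, count in counts.items():
    rate = 100 * count / n_participants
    print(f"Recidivism defined as a {definition}: {rate:.1f}%")
```

The fifth definition in the passage above, new crimes committed whether or not the offender is apprehended, cannot be computed from official records at all, which is one more reason the definition must be settled and reported before the statistics are interpreted.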

DEFINITIONAL DILEMMAS

A "study" was published by the National Federation of Decency concerning the


decadent content of the Phil Donahue television talk show. One of the categories of
analysis included Donahue programs that encouraged "abnormal sex." The author of the
report later acknowledged that it was probably a bit excessive of the federation to have
included breast feeding in this category (Boulder Daily Camera, September 30, 1981:2).
But, then, definitions of abnormal sex do seem to vary somewhat. Any reader of a research
report on the subject would be well advised to look with care at the definition used by
the researcher. Of course, any savvy evaluator involved in such a study would be careful
to make sure that his or her own sexual practices were categorized as normal.
In the 1972 presidential campaign, President Nixon gained considerable press atten-
tion for making a major budget shift from defense spending to more funds for social
services. One had to listen quite attentively to learn that Veterans Administration
expenditures had simply been moved from the defense side of the ledger to the social
services side of the ledger. The statistical changes in proportion of expenditures for
different purposes were entirely an artifact of a change in categorical definition.
Such examples are not meant to make people cynical about statistics. Many distortions
of this kind are inadvertent, due to sloppiness of thinking, unexamined assumptions, or
the rush to complete a final report. Sometimes, of course, they're the result of incompe-
tence, or the old adage that "figures lie, and liars figure." Widespread skepticism about
statistics is all the more reason for evaluators to exercise care in making sure that data are
useful, accurate, and understandable. Clear definitions provide the foundation for utility,
accuracy, and understandability. A Sufi story reinforces the importance of being clear
about definitions before drawing conclusions.

The wise fool Mulla Nasrudin and a friend went to the circus together. They were
dazzled by the tightrope walker. Afterward, Nasrudin's friend kept raving about the
performance of the tightrope walker. Nasrudin tired of the conversation, but his
companion resisted all attempts to change the subject. Finally, in frustration,
Nasrudin asserted, "It wasn't really such a great feat as all that. I myself can walk a
tightrope."
Angry at Nasrudin's boasting, the friend challenged him with a substantial wager.
They set a time for the attempt in the town center so that all the villagers could be
witness. At the appointed hour Mulla Nasrudin appeared with the rope, stretched it
out on the ground, walked along it, and demanded his money.
"But the tightrope must be in the air for you to win the wager!" exclaimed the
companion.
"I wagered that I could walk a tightrope," replied Nasrudin. "As everyone can see,
I have walked the tightrope."
The village judicial officer ruled in Nasrudin's favor. "Definitions," he explained
to the assembled villagers, "are what make laws."

They also make evaluations.



MENU 13.1

Menu of Program Comparisons

The outcomes of a program can be compared to


1. The outcomes of selected "similar" programs
2. The outcomes of the same program the previous year (or any other trend period,
e.g., quarterly reports)
3. The outcomes of a representative or random sample of programs in the field
4. The outcomes of special programs of interest, for example, those known to be
exemplary models or those having difficulty (purposeful sample comparison,
Patton 1990:169-86)
5. The stated goals of the program
6. Participants' goals for themselves
7. External standards of desirability as developed by the profession
8. Standards of minimum acceptability (e.g., basic licensing or accreditation
standards)
9. Ideals of program performance
10. Guesses by staff or other decision makers about what the outcomes would be
Combinations of any of these comparisons are also possible.

Make Comparisons Carefully and Appropriately

Virtually all analysis ends up being in some way comparative. Numbers in isolation, standing alone without a frame of reference or basis of comparison, seldom make much sense. A recidivism rate of 40% is a meaningless statistic. Is that high or low? Does that represent improvement or deterioration? An error of $1 million in IRS audits is a meaningless number. Some basis of comparison or standard of judgment is needed in order to interpret such statistics. The challenge lies in selecting the appropriate basis of comparison. In the example of the IRS audit, the GAO believed that the appropriate comparison was an error of zero dollars—absolute perfection in auditing. The IRS considered such a standard unrealistic and suggested, instead, comparing errors against the total amount of corrections made in all audits.

Skepticism can undermine evaluation when the basis for the comparison appears arbitrary or contrived. Working with users to select appropriate comparisons involves considering a number of options. Menu 13.1 presents 10 possibilities plus combinations. Evaluators should work with stakeholders to decide which comparisons are appropriate and relevant to give a full and balanced view of what is happening in the program.
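A small sketch (added here as an illustration; all the benchmark figures are hypothetical) makes the same point numerically: a 40% recidivism rate looks favorable or unfavorable depending on which basis of comparison from Menu 13.1 is applied.

    # Judging one hypothetical outcome (40% recidivism) against several comparison bases.
    program_rate = 0.40

    comparison_bases = {
        "the same program last year": 0.45,
        "the average of similar programs": 0.35,
        "the program's stated goal": 0.30,
        "staff guesses about the likely outcome": 0.50,
    }

    for basis, benchmark in comparison_bases.items():
        verdict = "better than" if program_rate < benchmark else "worse than"  # lower recidivism is better
        print(f"A 40% rate is {verdict} {basis} ({benchmark:.0%}).")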

Consider the new jogger or running enthusiast. At the beginning, runners are likely to use as a basis for comparison their previously sedentary lifestyle. By that standard, the initial half-mile run appears pretty good. Then the runner discovers that there are a lot of other people running, many of them covering 3 miles, 4 miles, 5 or 10 miles a week. Compared to seasoned joggers, the runner's half-mile doesn't look so good. On days when new runners want to feel particularly good, they may compare themselves to all the people who don't run at all. On days when they need some incentive to push harder, they may compare themselves to people who run twice as far as they do. Some adopt medical standards for basic conditioning, something on the order of 30 minutes of sustained and intense exercise at least three times a week. Some measure their progress in miles, others in minutes and hours. Some compare themselves to friends; others get involved in official competitions and races. All these comparisons are valid, but each yields a different conclusion because the basis of comparison is different in each case.

In politics, it is said that conservatives compare the present to the past and see all the things that have been lost, while liberals compare the present to what could be in the future and see all the things yet to be attained. Each basis of comparison provides a different perspective. Fascination with comparisons undergirds sports, politics, advertising, management, and, certainly, evaluation. America's first president, George Washington, captured this fascination when he observed in the year 1791:

Take two managers and give them the same number of laborers and let these laborers be equal in all respects. Let both managers rise equally early, go equally late to rest, be equally active, sober, and industrious, and yet, in the course of the year, one of them, without pushing the hands that are under him more than the other, shall have performed infinitely more work.

"Infinitely more" appears in this instance to be a rather crude estimate of difference, but by the time he posed this hypothetical experiment, Washington had given up surveying and become a politician. An evaluator would seek somewhat more precise measures for the comparison, then move on to interpretations (Why the differences?) and judgments (Are such differences good or bad?), as we shall now do.

Interpretations and Judgments

In resisting the temptation to bear alone the burden of analysis and interpretation, the utilization-focused evaluator views the collaborative process as a training opportunity through which users can become more sophisticated about data-based decision making. Science fiction author and futurist H. G. Wells (1866-1946) anticipated the importance of making statistical thinking accessible to nonstatisticians when he observed, "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."

For evaluation users, that day is now. Incorporating a training perspective into evaluation will mean being prepared to help users with statistical reasoning. The logic of qualitative analysis also needs to be made accessible to stakeholders.

Researchers have internalized the differences between analysis and interpretation, but that distinction will need reinforcement for nonresearchers. In working with stakeholders to understand interpretation, three themes deserve special attention.

1. Numbers and qualitative data must be interpreted to have meaning. Numbers are not bad or good, they're just numbers. Interpretation means thinking about what the data mean and how they ought to be applied. No magic formulas, not even those for statistical significance, can infuse meaning into data. Only thinking humans can do that. Interpretation is a human process, not a computer process. Statisticians have no corner on the ability to think and reason. The best guideline may be Nobel scientist Albert Einstein's dictum that "the important thing is to keep on questioning."

2. Data are imperfect indicators or representations of what the world is like. Just as a map is not the territory it describes, the statistical tables describing a program are not the program. That's why they have to be interpreted.

3. Statistics and qualitative data contain varying degrees of error. Research offers probabilities, not absolutes. The switch from absolute judgment (things are or are not) to probabilistic thinking (things are more or less likely) is fundamental to entry into empirical reasoning and careful interpretations.

Different stakeholders will bring varying perspectives to the evaluation. Those perspectives will affect their interpretations. The evaluator initially facilitates elaboration of possibilities, then begins the work of convergence—aiming to reach consensus, if possible, on the most reasonable and useful interpretations supported by the data. Where different perspectives prevail, those varying interpretations should be reported and their implications explored. Judgments follow analysis and interpretations.

Scriven (1994, 1991a, 1967) has advocated consistently and forcefully the evaluator's responsibility to draw conclusions and render independent judgment.

It is still common [for evaluators] to try to avoid adopting any actual evaluation stance although they still call what they do evaluation. This approach is referred to here as "pseudoevaluative investigation," and it results in a description masquerading as evaluation. It is sometimes rationalized by appeal to the following claim: . . . that the professional evaluator's duty is to give clients the facts and let them assemble (interpret) these according to their own values or to give them the subevaluations and let them put these together.
The first part of this fallacy creates the curious picture of the professional evaluator doing everything except what is normally called evaluating something. In reality, the situation is even worse. . . . Thus, balking at the last step—the overall evaluation—is like deciding you want to be a virgin after the orgy but before the Day of Judgment. [Such an evaluator] is nearly always guilty of inconsistency as well as misleading advertising. (Scriven 1991a:31)

In contrast to Scriven, others have argued that the evaluator's job can be limited to supplying the data and that stakeholders alone might make the final judgments (e.g., Stake 1996). Utilization-focused evaluation treats these opposing views as options to be negotiated with primary users. The evaluator's job can include offering interpretations, making judgments, and generating recommendations if, as is typical, that is what the evaluation users want. Even so, in order to facilitate direct engagement and increase users' ownership, prior to offering my interpretations, judgments, and recommendations, if they are requested, I first give decision makers and intended users an opportunity to arrive at their own conclusions unencumbered by my perspective but facilitated by me. In doing so, I find that I have to keep returning, sensitively and diplomatically, to the distinctions among analysis, interpretation, judgment, and recommendations.

While this kind of facilitation usually occurs with a small number of primary users, the process can be facilitated for very large groups. The following example involved over 200 people in a half-day process of analysis, interpretation, judgment, and generating recommendations—moving back and forth between small groups and full-session reporting and adopting conclusions.

An Example of Utilization-Focused Deliberations With Stakeholders

In an evaluation of foster group homes for juvenile offenders, we collected data from natural parents, foster parents, juveniles, and community corrections staff. The primary intended users, the Community Corrections Advisory Board, agreed to a findings review process that involved a large number of stakeholders from both the field and policy levels. We had worked closely with the board in problem identification, research design, and instrumentation. Once the data were collected, we employed a variety of statistical techniques, including alpha factor analysis and stepwise forward regression analysis. We then reduced these findings to a few pages in a simplified form and readable format for use at a half-day meeting with community corrections staff, welfare department staff, court services staff, and members of the county board. That meeting included some 40 of the most powerful elected and appointed officials in the county, as well as another 160 field professionals.

A major purpose of the evaluation was to describe and conceptualize effective foster group homes for juvenile delinquents so that future selection of homes and training of foster parents could be improved. The evaluation was also meant to provide guidance about how to achieve better matching between juvenile offenders and foster parents. We had data on how recidivism, runaway rates, and juvenile attitudes varied by different kinds of group home environments. We had measured variations in homes with a 56-item instrument. Factor analysis of the 56 items uncovered a single major factor that explained 54% of the variance in recidivism, with 19 items loading above .45 on that factor. The critical task in data interpretation was to label that factor in such a way that its relationship to dependent variables would represent something meaningful to identified information users. We focused the half-day work session on this issue.

The session began with a brief description of the methods and data, which were then distributed. In randomly assigned groups of four, these diverse stakeholders were asked to look at the items in Exhibit 13.5 and label the factor or theme represented by those items in their own words. After the groups reported their labels, discussion followed. Consensus emerged around the terms participation and support as representing one end of the continuum and authoritarian and nonsupportive for the other end. We also asked the groups to describe the salient elements in the factor. These descriptions were combined with the labels chosen by the group. The resulting conceptualization—as it appeared in the final evaluation report—is shown in Exhibit 13.6.

EXHIBIT 13.5
Composition of the Group Home Treatment Environment Scale

The items that follow are juvenile interview items that are highly interrelated statistically in such a way that
they can be assumed to measure the same environmental factor. The items are listed in rank order by factor
loading (from .76 to .56 for a six-factor alpha solution). This means that when the scales were combined to
create a single numerical scale, the items higher on the list received more weight in the scale (based on
factor score coefficients).

From your perspective, what underlying factor or theme is represented by the combination of these
questions? What do these different items have in common?

1. The [group home parent's names] went out of their way to help us.
almost always 30.9%
a lot of times 10.9%
just sometimes 34.5%
almost never 23.6% Factor loading = .76

2. At . . . 's house, personal problems were openly talked about.


almost always 20.0%
a lot of times 9.1%
just sometimes 32.7%
almost never 38.2% Factor loading = .76

3. Did you feel like the group home parents tried to help you understand yourself?
almost always 23.6%
a lot of times 29.1%
just sometimes 23.6%
almost never 23.6% Factor loading = .74

4. How often did . . . take time to encourage you in what you did?
almost always 27.3%
a lot of times 20.0%
just sometimes 30.9%
almost never 21.8% Factor loading = .73

5. At . . . 's house, how much were you each encouraged to make your own decisions about things?
Would you say that you were encouraged . . .
almost always 18.9%
a lot of times 30.2%
just sometimes 30.2%
almost never 20.8% Factor loading = .68

6. How often were you given responsibility for making your own decisions?
almost always 23.6%
a lot of times 20.0%
just sometimes 25.5%
almost never 30.9% Factor loading = .67

7. We really got along well with each other at . . . 's.


almost always 23.6%
a lot of times 29.1%
just sometimes 32.7%
almost never 14.5% Factor loading = .66

8. Would the group home parents tell you when you were doing well?
almost always 30.9%
a lot of times 10.9%
just sometimes 29.1%
almost never 9.1% Factor loading = .64

9. How often were you allowed to openly criticize the group home parents?
almost always 14.8%
a lot of times 7.4%
just sometimes 24.1%
almost never 53.7% Factor loading = .59

10. How much of the time would you say there was a feeling of "togetherness" at . . . 's?
almost always 27.3%
a lot of times 23.6%
just sometimes 32.7%
almost never 16.4% Factor loading = .59

11. How much did they help you make plans for leaving the group home and returning to your
real home?
almost always 9.1%
a lot of times 21.8%
just sometimes 21.8%
almost never 47.3% Factor loading = .58

12. How often would they talk with you about what you'd be doing after you left the group
home?
almost always 7.3%
a lot of times 18.2%
just sometimes 36.4%
almost never 38.2% Factor loading = .58

13. How much of the time did the kids have a say about what went on at . . . 's?
almost always 13.0%
a lot of times 29.6%
just sometimes 27.8%
almost never 29.6% Factor loading = .56

14. How much were decisions about what you all had to do at the group home made only by the
adults without involving the rest of you?
almost always 30.9%
a lot of times 18.2%
just sometimes 32.7%
almost never 18.2% Factor loading = .56

15. How much of the time were discussions at . . . 's aimed at helping you understand your
personal problems?
almost always 23.6%
a lot of times 23.6%
just sometimes 18.2%
almost never 34.5% Factor loading = .56
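For readers curious about the mechanics behind an exhibit like 13.5, the following sketch is an added illustration of extracting a single factor from item responses. It is not the original analysis: the study used alpha factor analysis on a 56-item instrument, whereas this sketch applies scikit-learn's maximum-likelihood FactorAnalysis to simulated data, so the loadings are only illustrative.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    # Simulated responses: 55 juveniles x 15 items scored 1-4
    # ("almost never" = 1 ... "almost always" = 4).
    rng = np.random.default_rng(0)
    latent_support = rng.normal(size=(55, 1))          # underlying "supportive-participatory" dimension
    weights = rng.uniform(0.5, 1.0, size=(1, 15))
    items = 2.5 + latent_support @ weights + rng.normal(0, 0.7, size=(55, 15))
    items = np.clip(np.round(items), 1, 4)

    fa = FactorAnalysis(n_components=1)
    fa.fit(items)

    # Items loading heavily on the single factor are the ones stakeholders
    # would be asked to label, as in the half-day work session described above.
    for i, loading in enumerate(fa.components_[0], start=1):
        print(f"Item {i:2d}: loading = {loading:.2f}")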

EXHIBIT 13.6
Group Home Treatment Environment Continuum:
Description of Group Home Ideal Types

Supportive-Participatory

In group homes nearer this end of the continuum, juveniles perceive group home parents as helpful,
caring, and interested in them. Juveniles are encouraged and receive positive reinforcement.
Juveniles are involved in decisions about what goes on in the home. Kids are encouraged to make
their own decisions about the things they do personally. There is a feeling of togetherness, of being
interested in each other, of caring about what happens now and in the future. Group home parents
discuss the future with the kids and help them plan. There is a feeling of mutual support, and kids feel
that they can openly express their feelings, thoughts, problems, and concerns.

Nonsupportive-Authoritarian

In group homes nearer this end of the continuum, juveniles report that group home parents are less
helpful, less open with them, and less interested in them personally. Juveniles are seldom encouraged
to make their own decisions, and the parents tend to make decisions without asking their opinions
about things. There isn't much planning things together or talking about the future. Kids are careful
about what they say, are guarded about expressing their thoughts and feelings. Kids get little positive
reinforcement. There is not much feeling of togetherness, support, and mutual caring; group home
parents kept things well under control.

NOTE: The descriptions presented here are based on stakeholders' interpretations of the factor analysis in Exhibit 13.5.

The groups then studied accompanying tables showing the relationships between this treatment environment factor and program outcome variables (see Exhibit 13.7). The relationships were statistically significant and quite transparent. Juveniles who reported experiencing more supportive-participatory corrections environments had lower recidivism rates, lower runaway rates, and more positive attitudes. Having established the direction of the data, we discussed the limitations of the findings, methodological weaknesses, and the impossibility of making firm causal inferences. Key decision makers were already well aware of these problems. Then, given those constraints, the group was asked for recommendations. The basic thrust of the discussion concerned ways to increase the supportive-participatory experiences of juvenile offenders. The people carrying on that discussion were the people who fund, set policy for, operate, and control juvenile offender programs. The final written evaluation report included the recommendations that emerged from that meeting as well as our own conclusions and recommendations as evaluators. But the final written report took another four weeks to prepare and print; the use process was already well under way as the meeting ended.

Four main points are illustrated here about a utilization-focused approach to findings. First, nonresearchers can

EXHIBIT 13.7
Relationship Between Different Home Environments and Recidivism

                                        No Recidivism     Recidivism      Total
Supportive-participatory homes          76% (N = 19)      24% (N = 6)     100% (N = 25)
Nonsupportive-authoritarian homes       44% (N = 11)      56% (N = 14)    100% (N = 25)

NOTE: Correlation r = .33; significant at .009 level.
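The strength of association reported in Exhibit 13.7 can be checked from the cell counts alone. The sketch below is an added illustration: it computes the phi coefficient for the 2 x 2 table, which comes out near the reported r of .33, along with a chi-square test (the exact significance level will differ somewhat depending on which test the original analysis used).

    import math
    from scipy.stats import chi2_contingency

    # Cell counts from Exhibit 13.7 (rows: home type; columns: no recidivism, recidivism).
    table = [[19, 6],    # supportive-participatory homes
             [11, 14]]   # nonsupportive-authoritarian homes

    (a, b), (c, d) = table
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    chi2, p, dof, expected = chi2_contingency(table, correction=False)

    print(f"phi coefficient = {phi:.2f}")       # about .33
    print(f"chi-square = {chi2:.2f}, p = {p:.3f}")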

understand and interpret data when presented with clear, readable, and simplified statistical tables. Second, as experienced data analysts know, the only way to really understand a data set is to spend some time getting inside it; busy decision makers are unwilling or unable to spend days at such a task, but a couple of hours of structured time spent in facilitated analysis and interpretation can pay off in greater understanding of and commitment to using results. Third, evaluators can learn a great deal from stakeholder interpretations of data, if they are open and listen to what people knowledgeable about the program have to say. Just as decision makers do not spend as much time in data analysis as do evaluators, so evaluators do not spend as much time in program analysis, operations, and planning as do decision makers. Each can learn from the other in the overall effort to make sense out of the data and provide future direction for the program. Fourth, the transition from analysis to action is facilitated by having key actors involved in analysis. Use does not then depend on or have to wait for a written report.

Making Claims

One way I've found of focusing the attention of primary stakeholders, especially program administrators and staff, involves making claims. I ask: "Having reviewed the data, what can you claim about the program?" I then ask them to list possible claims, for example, (1) participants like the program, (2) participants get jobs as a result of the program, (3) the dropout rate is low, (4) changes in participants last over the long term, (5) the program is cost-effective, (6) the program does not work well with people of color, and so on. Having generated a list of possible claims, I then have them sort the claims into the categories (or cells) shown in Exhibit 13.8. This matrix distinguishes claims by their importance and rigor. Important claims speak to major issues of societal concern. Participants getting and keeping jobs as a result of a training program is a more important claim than that they're satisfied. Rigor concerns the amount and quality of evidence to support claims. The program might have strong evidence of participant satisfaction, but weak follow-up data about job

EXHIBIT 13.8
Claims Matrix

                                Importance of Claims

                                Major            Minor

Rigor of Claims    Strong         *
                   Weak

*GOAL: Strong claims of major importance.

The most powerful, useful, and credible claims are those that are of major importance and have strong
empirical support.

Characteristics of Claims of MAJOR IMPORTANCE

• Involves making a difference, having an impact, or achieving desirable outcomes
• Deals with a problem of great societal concern
• Affects large numbers of people
• Provides a sustainable solution (claim deals with something that lasts over time)
• Saves money
• Saves time, that is, accomplishes something in less time than is usually the case
  (an efficiency claim)
• Enhances quality
• Claims to be "new" or innovative
• Shows that something can actually be done about a problem, that is, claims the
  problem is malleable
• Involves a model or approach that could be used by others (meaning the model or
  approach is clearly specified and adaptable to other situations)

Characteristics of STRONG CLAIMS

• Valid, believable evidence to support the claim


• Follow-up data over time (longer periods of follow-up provide stronger evidence than
shorter periods, and any follow-up is stronger than just end-of-program results)
• The claim is about a clear intervention (model or approach) with solid implementation
documentation
• The claim is about clearly specified outcomes and impacts:
— Behavioral outcomes are stronger than opinions, feelings, and knowledge

• The evidence for claims includes comparisons:


— To program goals
— Over time (pre-, post-, follow-up)
— With other groups
— With general trends or norms
• The evidence for claims includes replications:
— Done at more than one site
— More than one staff person attained outcomes
— Different cohort groups of participants attained comparable outcomes over time
— Different programs attained comparable results using comparable approaches
• Claims are based on more than one kind of evidence or data (i.e., triangulation of data):
— Quantitative and qualitative data
— Multiple sources (e.g., kids, parents, teachers, and staff corroborate results)
• There are clear logical and/or empirical linkages between the intervention and the claimed
outcomes
• The evaluators are independent of the staff (or where internal evaluation data are used, an
independent, credible person reviews the results and certifies the results)
• Claims are based on systematic data collection over time

CAVEAT: Importance and rigor are not absolute criteria. Different stakeholders, decision makers, and
claims makers will have different definitions of what is important and rigorous. What staff deem to be
of major importance may not be so to outside observers. What is deemed important and rigorous
changes over time and across contexts. Making public claims is a political action. Importance and
rigor are, to some extent, politically defined and dependent on the values of specific stakeholders.

Related Distinctions

1. Program premises are different from but related to and dependent on program claims.

Premises are the basic assumptions on which a program is based, for example, that effective,
attentive parenting is desirable and more likely to produce well-functioning children who become
well-functioning adults. This premise is based on research. The program cannot "prove" the premise
(though supporting research can and should be provided). The program's claims are about the
program's actual implementation and concrete outcomes, for example, that the program yielded
more effective parents who are more attentive to their children. The program does not have to follow
the children to adulthood before claims can be made.

2. Evidence is different from claims—but claims depend on evidence.

Claim: This program trains welfare recipients for jobs, places them in jobs, and, as a result,
they become self-sufficient and leave the welfare rolls.

Evidence: Numbers and types of job placements over time; pre-, post-, and follow-up data on
welfare status; participant interview data about program effects; employer interview
data about placements; and so on.

retention. The most powerful, useful, and credible claims are those of major importance that have strong empirical support.

This framework can also be useful in the design phase to help primary users and key stakeholders focus on gathering rigorous data about important issues so that, at the end, the evaluation will be able to report important and strong claims.

Useful Recommendations

Before looking specifically at the process of generating recommendations, it may be helpful to position recommendations within the overall evaluation process. Evaluations are useful in ways that go beyond a narrow focus on implementing recommendations or making concrete, specific decisions about immediate courses of action. Participation in an evaluation process affects ways of thinking about a program; it can clarify goals, increasing (or decreasing) particular commitments; and the process can stimulate insights, the consequences of which may not be evident until some time in the distant future. (Chapter 5 discusses this kind of process use in depth.) Recommendations, then, do not bear the full brunt of the hopes for evaluation use. Nevertheless, recommendations are often the most visible part of an evaluation report. Well-written, carefully derived recommendations and conclusions can be the magnet that pulls all the other elements of an evaluation together into a meaningful whole. Done poorly, recommendations can become the center of attack, discrediting what was otherwise a professional job because of hurried and sloppy work on a last-minute recommendations section. I suspect that one of the most common reasons evaluators get into trouble when writing recommendations is that they haven't allowed enough time to really think through the possibilities and discuss them with people who have a stake in the evaluation. I've known cases in which, after working months on a project, the evaluators generated recommendations just hours before a final reporting session, under enormous time pressure. In our follow-up study of federal health evaluations, we asked 20 decision makers about the usefulness of the recommendations they had received. The following reactions provide a flavor of typical responses to recommendations:

• I don't remember the specific recommendations.
• The recommendations weren't anything we could do much with.
• It was the overall process that was useful, not the recommendations.
• I remember reading them, that's about all.
• The recommendations looked like they'd been added as an afterthought. Not impressive.

Useful and Practical Recommendations: Ten Guidelines

Recommendations, when they are included in a report, draw readers' attention like bees to a flower's nectar. Many report readers will turn to recommendations before anything else. Some never read beyond the recommendations. Given their importance, then, let me offer 10 guidelines for evaluation recommendations.

1. After the evaluation purpose is clarified and before data are collected, the nature and content of the final report should be negotiated with stakeholders and evaluation funders. Not all evaluation reports include recommendations. The kinds

of recommendations, if any, to be included in a report are a matter of negotiation.

2. Recommendations should clearly follow from and be supported by the evaluation findings. The processes of analysis, interpretation, and judgment should lead logically to recommendations.

3. Distinguish different kinds of recommendations. Recommendations that deal directly with central questions or issues should be highlighted and separated from recommendations about secondary or minor issues. Distinctions should be made between summative and formative recommendations. It may be helpful and important to distinguish between recommendations that can be implemented immediately, recommendations that can be implemented in the short term (within six months to a year), and recommendations aimed at the long-term development of the program. In still other cases, it may be appropriate to orient recommendations toward certain groups of people: recommendations for funders, recommendations for program administrators, recommendations for program staff, and recommendations for clients or program participants.

Another way of differentiating types of recommendations is to clearly specify which recommendations are strongly supported by the data and have the solid support of the evaluator and/or the evaluation task force versus those recommendations that are less directly supported by the data or about which there is dissension among members of the task force. In similar fashion, it is important to distinguish between recommendations that involve a firm belief that some action should be taken and recommendations that are meant merely to stimulate discussion or suggestions that might become part of an agenda for future consideration and action.

The basic point here is that long, indiscriminate lists of recommendations at the end of an evaluation report diffuse the focus and diminish the power of central recommendations. By making explicit the different amounts of emphasis that the evaluator intends to place on different recommendations, and by organizing recommendations so as to differentiate among different kinds of recommendations, the evaluator increases the usefulness of the recommendations as well as the likelihood of the implementation of at least some of them.

4. Some decision makers prefer to receive multiple options rather than recommendations that advocate only one course of action. This approach may begin with a full slate of possible recommendations: terminate the program; reduce funding for the program; maintain program funding at its current level; increase program funding slightly; and increase program funding substantially. The evaluator then lists pros and cons for each of these recommendations, showing which findings, assumptions, interpretations, and judgments support each option.

5. Insofar as possible, when making recommendations, particularly major ones involving substantial changes in program operations or policies, evaluators should study, specify, and include in their reports some consideration of the benefits and costs of making the suggested changes, including the costs and risks of not making them.

6. Focus on actions within the control of intended users. A major source of frustration for many decision makers is that the recommendations in evaluation reports relate mainly to things over which they have no control. For example, a school desegregation study that focuses virtually all its

recommendations on needed changes in housing patterns is not very useful to school officials, even though they may agree that housing changes are needed. Is the implication of such a recommendation that the schools can do nothing? Is the implication that anything the school does will be limited in impact to the extent that housing patterns remain unchanged? Or, again, are there major changes a school could make to further the aims of desegregation, but the evaluator got sidetracked on the issue of housing patterns and never got back to concrete recommendations for the school? Of course, the best way to end up with recommendations that focus on manipulable variables is to make sure that, in conceptualizing the evaluation, the focus was on manipulable variables and that focus is maintained right on through to the writing of recommendations.

7. Exercise political sensitivity in writing recommendations. Ask yourself the questions, If I were in their place with their responsibilities, their political liabilities, their personal perspectives, how would I react to this recommendation stated in this way? What arguments would I raise to counter the recommendations? Work with stakeholders to analyze the political implications of recommendations. This doesn't mean recommendations should be weak but, rather, that evaluators should be astute. Controversy may or may not serve the cause of getting findings used. But, at the very least, controversies should be anticipated.

8. Be careful and deliberate in wording recommendations. Important recommendations can be lost in vague and obtuse language. Powerful recommendations can be diluted by an overly meek style, while particularly sensitive recommendations may be dismissed by an overly assertive style. Avoid words that confuse or distract from the central message.

9. Allow time to do a good job on recommendations, time to develop recommendations collaboratively with stakeholders, and time to pilot-test recommendations for clarity, understandability, practicality, utility, and meaningfulness.

10. Develop strategies for getting recommendations taken seriously. Simply listing recommendations at the end of a report may mean they get token attention. Think about how to facilitate serious consideration of recommendations. Help decision makers make decisions on recommendations, including facilitating a working session that includes clear assignment of responsibility for follow-up action and timelines for implementation.

Controversy About Recommendations

An evaluation without a recommendation is like a fish without a bicycle.

—Michael Scriven (1993:53)

While evaluators such as Mike Hendricks and Elizabeth Handley (1990) have argued that "evaluators should almost always offer recommendations" (p. 110), Michael Scriven has been persistent and vociferous in warning evaluators against the logical fallacy of thinking that judging the merit or worth of something leads directly to recommendations. He considers it one of the "hard-won lessons in program evaluation" that evaluators seldom have the expertise to make recommendations and that they are generally well advised to stop at what they are qualified to do: render judgment.

It is widely thought that program evaluations should always conclude with a recommendations section, but this view is based on a misunderstanding of the logic of evaluation, and the misunderstanding has seriously unfortunate effects. The conclusion of an evaluation is normally a statement or set of statements about the merit, worth, or value of something, probably with several qualifications (for example, These materials on planetary astronomy are probably the best available, for middle-school students with well-developed vocabularies). There is a considerable step from the conclusion to the recommendations (for example, You should buy these materials for this school), and it is a step that evaluators are often not well-qualified to make. For example, in teacher evaluation, an evaluator, or, for that matter, a student, may be able to identify a bad teacher conclusively. But it does not follow that the teacher should be fired or remediated or even told about the result of the evaluation (which may be informal). In making one of those recommendations, the evaluator must have highly specific local knowledge (for example, about the terms of the teacher's contract, the possibility of early retirement, and temporary traumas in the teacher's home life) and special expertise (for example, about the legal situation), both of which go a long way beyond the skills necessary for evaluation. If the evaluator is looking at recommendations aimed not at actions but at improvement (for example, suggested changes in the way in which the teacher organizes the lesson and changes in the frequency of question-asking), then he or she moves into an area requiring still further dimensions of expertise. (Scriven 1993:53)

Scriven (1991b) offers a number of analogies to make his point. A doctor may diagnose without being able to prescribe a cure. Just as "a roadtester is not a mechanical engineer, a program evaluator is not a management troubleshooter, though both often suffer from delusions of grandeur in this respect" (p. 304).

Yet, doctors routinely strive to prescribe remedies, and a savvy mechanical engineer would most certainly confer with a roadtester before making design changes. Scriven's vociferousness about eschewing recommendations follows from his assertion that the evaluator's primary obligation is to render judgment. While Scriven's counsel to avoid making recommendations if one lacks expertise in remediation or design is wise as far as it goes, he fails to take the added step of making it part of the evaluator's responsibility to seek such expertise and facilitate experts' engagement with the data. Utilization-focused evaluation does offer a way of taking that extra step by actively involving primary intended users in the process of generating recommendations based on their knowledge of the situation and their shared expertise. Utilization-focused recommendations are not the evaluator's alone; they result from a collaborative process that seeks and incorporates the very expertise Scriven says is necessary for informed action.

A Futures Perspective on Recommendations

Show the future implications of recommendations.

—Hendricks and Handley (1990:114)

Recommendations have long struck me as the weakest part of evaluation. We have made enormous progress in ways of studying programs, in methodological diversity, and in a variety of data-collection techniques and designs. The payoff from those advances often culminates in recommendations, but we have made comparatively less progress in how to construct useful recommendations. I have found that teaching students how to go from data to recommendations is often the most challenging part of teaching evaluation. It's not a simple, linear process. A common complaint of readers of evaluation reports is that they cannot tell how the evaluators arrived at their recommendations. Recommendations can become lengthy laundry lists of undifferentiated proposals. They're alternatively broad and vague or pointed and controversial. But what recommendations always include, usually implicitly, are assumptions about the future.

The field of futures studies includes a broad range of people who use a wide variety of techniques to make inquiries about the nature of the future. Futurists study the future in order to alter perceptions and actions in the present. Evaluators, on the other hand, study the past (what programs have already done) in order to alter perceptions and actions in the present. In this sense, then, both futurists and evaluators are interested in altering perceptions and actions in the present, the impact of which will be a changed future. Evaluators do so by looking at what has already occurred; futurists do so by forecasting what may occur.

In effect, at the point where evaluators make recommendations, we become futurists. Recommendations constitute a forecast of what will happen if certain actions are taken. Those forecasts are based on our analysis of what has occurred in the past. The accuracy of such forecasts, as with any predictions about the future, is subject to error due to changed conditions and the validity of assumptions that are necessarily made. Futurists have developed approaches for dealing with the uncertainties of their forecasts. Some of these approaches, I think, hold promise for evaluation. For example, futurists have developed techniques for constructing alternative scenarios that permit decision makers to consider the consequences of different assumptions and trends. These are variations on "if . . . then . . ." constructions. There are often three to four different scenarios constructed: a pessimistic scenario, an optimistic scenario, and one or two middle-of-the-road, or most likely-case, scenarios.

The very presentation of scenarios communicates that the future is uncertain and that the way one best prepares for the future is to prepare for a variety of possibilities. General Robert E. Lee is reputed to have said, "I am often surprised, but I am

never taken by surprise." That is the essence of a futures perspective—to be prepared for whatever occurs by having reflected on different possibilities, even those that are unlikely.

The advantage of scenarios in evaluation presentations is threefold. First, they permit us to communicate that recommendations are based on assumptions and thus, should those assumptions prove unwarranted, the recommendations may need to be altered accordingly. Second, the presentation of scenarios directs attention to those trends and factors that should be monitored so that as future conditions become known, program actions can be altered in accordance with the way the world actually unfolds (rather than simply on the basis of how we thought the world would unfold). Third, they remind us, inherently, of our limitations, for "results of a program evaluation are so dependent on the setting that replication is only a figure of speech; the evaluator is essentially an historian" (Cronbach et al. 1980:7).

Utilization-Focused Reporting

In utilization-focused evaluation, use does not center on the final report. Traditionally, evaluators and users have viewed the final written report as the climax—the end of the evaluation—and the key mechanism for use. From an academic perspective, use is achieved through dissemination of a published report. Moreover, use often doesn't emerge as an issue until there is something concrete (a report) to use. By contrast, utilization-focused evaluation is concerned with use from the beginning, and a final written report is only one of many mechanisms for facilitating use. The Minnesota Group Home Evaluation reviewed earlier illustrates this point. Major use was under way well before the report was written, as a result of the half-day work session devoted to analyzing the results with major stakeholders. The final report was an anticlimax, and appropriately so.

The data from our study of federal health evaluations revealed that much important reporting is interpersonal and informal. In hallway conversations, in rest rooms, over coffee, before and after meetings, over the telephone, and through informal networks, the word gets passed along when something useful and important has been found. Knowing this, evaluators can strategize about how to inject findings into important informal networks.

Formal oral briefings, presented with careful preparation and skill, can have an immediate and dramatic impact. Michael Hendricks (1994, 1984, 1982) has studied effective techniques for executive summaries and oral briefings: The key is good charts and graphics to capture attention and communicate quickly. A trend line, for example, can be portrayed more powerfully in graphic form than in a table, as Exhibit 13.9 shows. I saw Mike Hendricks at a national meeting as I was writing this chapter. He said emphatically, "Emphasize graphics. Evaluators have got to learn graphics. I'm amazed at how bad the charts and graphics are that I see in reports. You can't emphasize it too much. Reporting means GRAPHICS! GRAPHICS! GRAPHICS!"

Report Menu

As with other stages in utilization-focused evaluation, the reporting stage offers a smorgasbord of options. Menu 13.2 displays alternatives for reporting format and

EXHIBIT 13.9
The Power of Graphics

Data in a Table

1990 43 graduates
1991 49
1992 56
1993 46
1994 85
1995 98
1996 115
1997 138

The Same Data in Graphic Form

[Line graph of the same data: number of graduates per year, 1990-1997, rising from 43 to 138; vertical axis scaled from 20 to 140 graduates.]
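For evaluators who want to follow Hendricks's advice, the sketch below is an added illustration (not part of the original exhibit) of turning the table above into a trend line with matplotlib.

    import matplotlib.pyplot as plt

    # Graduates per year, from the table in Exhibit 13.9.
    years = [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997]
    graduates = [43, 49, 56, 46, 85, 98, 115, 138]

    plt.plot(years, graduates, marker="o")
    plt.title("Program Graduates by Year")
    plt.xlabel("Year")
    plt.ylabel("Number of graduates")
    plt.ylim(0, 150)
    plt.savefig("graduates_trend.png", dpi=150)   # or plt.show() for an on-screen version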

style, content, contributors, and perspectives. Selecting from the menu is affected by the purpose of the evaluation (see Chapter 4). A summative report will highlight an overall judgment of merit or worth with supporting data. A knowledge-generating report aimed at policy enlightenment may follow a traditional academic format. A formative report may take the form of an internal memorandum with circulation limited to staff. I am often asked by students to show them the standard or best format for an evaluation report. The point of Menu 13.2 is that there can be no standard report format; the best format is the one that fulfills the purposes of the evaluation and meets the needs of specific intended users. In many cases, multiple reporting strategies can be pursued to reach different intended users and dissemination audiences.

Utilization-Focused Reporting Principles

I've found the following principles helpful in thinking about how to make

reporting useful: (1) Be intentional about reporting, that is, know the purpose of a report; (2) be user-focused; (3) avoid surprising primary stakeholders; (4) think positive about negatives; and (5) distinguish dissemination from use. Let me elaborate each of these principles.

Be Intentional and Purposeful About Reporting

Being intentional means negotiating a shared understanding of what it's going to mean to close the evaluation, that is, to achieve use. Use of the evaluation findings and processes is the desired outcome, not producing a report. A report is a means to an end—use. You need to communicate at every step in the evaluation your commitment to utility. One way to emphasize this point during early negotiations is to ask if a final report is expected. This question commands attention. "Will you want a final report?" I ask.

They look at me and they say, "Come again?"

I repeat. "Will you want a final report?"

They respond, "Of course. That's why we're doing this, to get a report."

And I respond. "I see it a little differently. I think we've agreed that we're doing this evaluation to get useful information to improve your programming and decision making. A final written report is one way of communicating findings, but there's substantial evidence now that it's not always the most effective way. Full evaluation reports don't seem to get read much, and it's very costly to write final reports. A third or more of the budget of an evaluation can be consumed by report writing. Let's talk about how to get the evaluation used, then we can see if a full written report is the most cost-effective way to do that." Then I share Menu 13.2 and we start talking reporting options.

Often, I find that, with this kind of interaction, my primary intended users really start to understand what utilization-focused evaluation means. They start to comprehend that evaluation doesn't have to mean producing a thick report that they can file under "has been evaluated." They start to think about use. Caveat: Whatever is agreed on, especially if there's agreement not to produce a traditional academic monograph, get the agreement in writing and remind them of it often. A commitment to alternative reporting approaches may need reinforcement, especially among stakeholders used to traditional formats.

Focus Reports on Primary Intended Users

A theme running throughout this book is that use is integrally intertwined with users. That's the thrust of the personal factor (Chapter 3). The style, format, content, and process of reporting should all be geared toward intended use by intended users. For example, we've learned, in general, that busy, big-picture policymakers and funders are more likely to read concise executive summaries than full reports, but detail-oriented users want—what else?—details. Some users prefer recommendations right up front at the beginning of the report; others want them at the end; and I had one group of users who wanted the recommendations in a separate document so that readers of the report had to reach their own conclusions without interpreting everything in terms of recommendations. Methods sections may be put in the body of the report, in an appendix, or omitted and shared only with the methodologically interested. Sometimes users can't articulate

MENU 13.2
Evaluation Reporting Menu

Style and Format Options: Written Report


Traditional academic research monograph
Executive summary followed by a full report
Executive summary followed by a few key tables, graphs, and data summaries
Executive summary only (data available to those interested)
Different reports (or formats) for different targeted users
Newsletter article for dissemination
Press release
Brochure (well crafted, professionally done)
No written report; only oral presentations
Style and Format Options: Oral and Creative
Oral briefing with charts
Short summary followed by questions (e.g., at a board meeting or legislative
hearing)
Discussion groups based on prepared handouts that focus issues for
interpretation and judgment based on data
Half-day or full-day retreat-like work session with primary intended users
Videotape or audiotape presentation
Dramatic, creative presentation (e.g., role-playing perspectives)
Involvement of select primary users in reporting and facilitating any of the above
Advocacy-adversary debate or court for and against certain conclusions and
judgments
Written and oral combinations
Content Options
Major findings only; focus on data, patterns, themes, and results
Findings and interpretations with judgments of merit or worth
(no recommendations)
(a) Summative judgment about overall program
(b) Judgments about program components
what they want until they see a draft. Then they know what they don't want and the responsive evaluator will have to do some rewriting. Consider this story from an evaluator in our federal use study.

Let me tell you the essence of the thing. I had almost no direction from the government [about the final report] except that the project officer kept saying, "Point 8 is really important. You've got to do point 8 on the contract."

Recommendations backed up by judgments, findings, and interpretations


(a) Single, best-option recommendations
(b) Multiple options with analysis of strengths, weaknesses, costs, and benefits
of each
(c) Options based on future scenarios with monitoring and contingency
suggestions
(d) Different recommendations for different intended users

Authors of and Contributors to the Report


Evaluator's report; evaluator as sole and independent author
Collaborative report coauthored by evaluator with others involved in the process
Report from primary users, written on their behalf by the evaluator as facilitator
and adviser, but ownership of the report residing with others
Combinations:
(a) Evaluator generates findings; collaborators generate judgments and
recommendations
(b) Evaluator generates findings and makes judgments; primary users generate
recommendations
(c) Separate conclusions, judgments, and recommendations by the evaluator
and others in the same report

Perspectives Included
Evaluator's perspective as independent and neutral judge
Primary intended users only
Effort to represent all major stakeholder perspectives (may or may not be the
same as primary intended users)
Program staff or administrators respond formally to the evaluation findings,
(written independently by the evaluator); GAO approach
Review of the evaluation by an external panel; meta-evaluation
(The Joint Committee 1994 standards prescribe that "the evaluation itself
should be formatively and summatively evaluated against [the Evaluation
Standards], so that its conduct is appropriately guided and, on completion,
stakeholders can closely examine its strengths and weaknesses," p. A12).

So, when I turned in the draft of the report, I put points 1 through 9, without 8, in the first part of the report. Then I essentially wrote another report after that just on point 8 and made that the last half of the report. It was a detailed description of the activities of the program that came to very specific conclusions. It wasn't what had been asked for in the proposal I responded to, but it was what they needed to answer their ques-

tions. The project officer read it and the comment back was, "It's a good report except for all that crap in the front."

OK, so I turned it around in the final version, and moved all that "crap" in the front into an appendix. If you look at the report, it has several big appendices. All of that, if you compare it carefully to the contract, all that "crap" in the appendix is what I was asked to get in the original request and contract. All the stuff that constitutes the body of the report was above and beyond the call, but that's what he wanted and that's what got used. [EV367:12]

Avoid Surprising Stakeholders: Share Findings First in Draft Form

The story just told emphasizes the importance of sharing draft reports with primary users in time to let them shape the final report. This doesn't mean fudging the results to make evaluation clients happy. It means focusing so that priority information needs get priority. Collaborating with primary users means that evaluators cannot wait until they have a highly polished final report prepared to share major findings. Evaluators who prefer to work diligently in the solitude of their offices until they can spring a final report on a waiting world may find that the world has passed them by. Formative feedback, in particular, is most useful as part of a process of thinking about a program rather than as a one-shot information dump. Even in the more formal environment of a major summative evaluation, surprises born of the public release of a final report are not likely to be particularly well received by primary stakeholders caught unawares.

In our study of the use of federal health evaluations, we asked the following question:

Some suggest that the degree to which the findings of a study were expected can affect the study's impact. Arguments go both ways. Some say that surprise findings have the greatest impact because they bring to light new information and garner special attention. Others say that surprises will usually be rejected because they don't fit in with general expectations. What's your experience and opinion?

We found that minor surprises on peripheral questions created only minor problems, but major surprises on central questions were unwelcome. One decision maker we interviewed made the point that a "good" evaluation process should build in feedback mechanisms to primary users that guarantee the relative predictability of the content of the final report.

Evaluation isn't a birthday party, so people aren't looking for surprises. If you're coming up with data that are different than the conventional wisdom, a good evaluation effort, I would suggest, would get those ideas floated during the evaluation process so that when the final report comes out, they aren't a surprise.

Now, you could come up with findings contrary to the conventional wisdom, but you ought to be sharing those ideas with the people being evaluated during the evaluation process and working on acceptance. If you present a surprise, it will tend to get rejected. See, we don't want surprises. We don't like surprises around here. [DM346:30-31]

The evaluator for this project expressed the same opinion: "Good managers are rarely surprised by the findings. If there's a surprising finding it should be rare. I mean, everybody's missed this insight except this great evaluator? Nonsense!" [EV364:13]. Surprise attacks may make for good war strategy, but in evaluation, the surprise attack does little to add credence to a study.

Think Positive About Negatives

John Sununu (while Governor of New Hampshire in 1988, discussing the economy and
upcoming presidential election): "You're telling us that the reason
things are so bad is that they are so good, and they will get better
as soon as they get worse?"
James A. Baker (then President Reagan's Secretary of the Treasury): "You got it."

The program staff's fear of negative results can undermine an evaluation. On the other hand, the absence of negative findings can call into question the evaluator's independence, integrity, and credibility. Here, then, is where evaluation use can take a back seat to other agendas. Staff will resist being made to look bad and will often treat the mildest suggestions for improvements as deep criticisms. Evaluators, worried about accusations that they've lost their independence, emphasize negative findings. In the next chapter on politics and ethics, we'll revisit this confrontation of perspectives. In this section, I want to make two points: (1) one person's negative is another person's positive; and (2) evaluators can do much to increase staff receptivity by shifting the focus of reporting to learning and use rather than simply being judged as good or bad.
The context for these two points is a general belief that most evaluations have negative findings. Howard Freeman (1977), an evaluation pioneer, expressed the opinion that the preponderance of negative findings diminished use. He recommended, somewhat tongue-in-cheek, that "in view of the experience of the failure of most evaluations to come up with positive impact findings, evaluation researchers probably would do well to encourage the 'biasing' of evaluations in the direction of obtaining positive results" (p. 30). He went on to add that evaluators ought to play a more active role in helping design programs that have some hope of demonstrating positive impact, based on treatments that are highly specific and carefully targeted.
Freeman's colleague Peter Rossi, coauthor of one of the most widely used evaluation texts (Rossi and Freeman 1993), shared the view that most evaluations show zero impacts on targeted clients and problems. He asserted, also tongue-in-cheek, that "only those programs likely to fail are evaluated." This led him to formulate Rossi's Plutonium Law of Evaluation: "Program operators will explode when exposed to typical evaluation research findings" (quoted in Shadish et al. 1991:386-87).
On the other hand, Michael Scriven (1991b) has observed presumably the same scene and concluded that evaluations display

play a "General Positive Bias" such that one case did any of our respondents feel
there is a "strong tendency to turn in that the positive or negative nature of
more favorable results than are justified" findings explained much about use. Be-
(p. 175). cause we encountered few summative de-
The problem I have with either stereo- cisions, the overall positive or negative
type, that most evaluations are negative or nature of the evaluation was less impor-
most are positive, is that they impose a tant than how the findings could be used
dichotomous win/lose, pass/fail, success/ to improve programs. In addition, the
failure, and positive/negative construct on positive or negative findings of a particu-
results that display considerable complex- lar study constituted only one piece of
ity. This seems born of a tendency I find information that fed into a larger process
common among evaluators and decision of deliberation and was interpreted in the
makers: to think of evaluation findings in larger context of other available informa-
monolithic, absolute, and purely summa- tion. Absolute judgments of a positive or
tive terms. In my experience, evaluation negative nature were less useful than spe-
findings are seldom either completely posi- cific, detailed statements about levels of
tive or completely negative. Furthermore, impact, the nature of relationships, and
whether findings are interpreted as positive variations in implementation and effec-
or negative depends on who is using and tiveness. This shifts the focus from
interpreting the findings. As the old adage whether findings are negative or positive
observes: Whether the glass is half empty to whether the evaluation results contain
or half full depends on whether you're useful information that can provide direc-
drinking or pouring. tion for programmatic action.
Consider these data. In our 20 federal Evaluators can shape the environment
health evaluation case studies, respondents and context in which findings are reviewed
described findings as follows: so that the focus is on learning and im-
provement rather than absolute judgment.
Basically positive findings 5 Placing emphasis on organizational learn-
Basically negative findings 2 ing, action research, participatory evalu-
Mixed positive-negative findings 7 ation, collaborative approaches, develop-
Evaluator-decision maker disagree- mental evaluation, and empowerment
ment on nature of findings 6 evaluation—approaches discussed in
Total 20 Chapter 5—can defuse fear of and resis-
tance to negative judgment.
Our sample was not random, but it was Finally, it's worth remembering, philo-
as systematic and representative of federal sophically, that the positive or negative na-
evaluations as we could make it given the ture of evaluation findings can never be
difficulty of identifying a "universe" of established with any absolute certainty. As
evaluations. Only 2 of 20 were basically Sufi wise-fool Mulla Nasrudin once ad-
negative; the most common pattern was a vised, a heavy dose of humility should ac-
mix of positive and negative; and in 6 of company judgments about what is good or
20 cases, the evaluator and primary deci- bad. Nasrudin had the opportunity to ren-
sion maker disagreed about the nature of der this caution at a teahouse. A monk
the judgment rendered. Moreover, in only entered and said:

$j"My Master taught me to spread the word that mankind will never be fulfilled until
the man who has not been wronged is as indignant about a wrong as the man who
actually has been wronged."
/ he assembly was momentarily impressed. Then Nasrudin spoke: "My Muster uuSh:
me that nobody at all should become indignant about anything until he is'sure that
what he thinks is a wrong is in fact a wrong—and not a blessing in disguised
—Sis in I "'i,4: i JS-S')

Distinguish Dissemination From Use

Dissemination of findings to audiences beyond intended users deserves careful distinction from the kind of use that has been the focus of this book. Studies can have an impact on all kinds of audiences in all kinds of ways. As a social scientist, I value and want to encourage the full and free dissemination of evaluation findings. Each of us ought to be permitted to indulge in the fantasy that our evaluation reports will have impact across the land and through the years. But only a handful of studies will ever enjoy (or suffer) such widespread dissemination.
Dissemination efforts will vary greatly from study to study. The nature of dissemination, like everything else, is a matter for negotiation between evaluators and decision makers. In such negotiations, dissemination costs and benefits should be estimated. The questions addressed in an evaluation will have different meanings for people not directly involved in the painstaking process of focusing the evaluation. Different individuals and audiences will be interested in a given evaluation for reasons not always possible to anticipate. Effective dissemination involves skill in extrapolating the evaluation specifics of a particular study for use by readers in a different setting.
The problematic utility of trying to design an evaluation relevant to multiple audiences, each conceptualized in vague and general terms, was what led to the emphasis in utilization-focused evaluation on identification and organization of primary intended users. Dissemination can broaden and enlarge the impact of a study in important ways, but the nature of those long-term impacts is largely beyond the control of the evaluator. What the evaluator can control is the degree to which findings address the concerns of specific intended users. That is the use for which I take responsibility: intended use by intended users. Dissemination is not use, though it can be useful.

Final Reflections

Analyzing and interpreting results can be exciting processes. Many nights have turned into morning before evaluators have finished trying new computer runs to tease out the nuances in some data set. The work of months, sometimes years, finally comes to fruition as data are analyzed and interpreted, conclusions drawn, and alternative courses of action and recommendations considered.

This chapter has emphasized that the challenges and excitement of analysis, interpretation, and judgment ought not be the sole prerogative of evaluators. Stakeholders can become involved in struggling with data, too, increasing both their commitment to and understanding of the findings.
I remember fondly the final days of an evaluation when my co-evaluators and I were on the phone with program staff two or three times a day as we analyzed data on an educational project to inform a major decision about whether it met criteria as a valid model for federal dissemination funding. Program staff shared with us the process of watching the findings take final shape. Preliminary analyses appeared negative; as the sample became more complete, the findings looked more positive to staff; finally, a mixed picture of positive and negative conclusions emerged. Because the primary users had been intimately involved in designing the evaluation, we encountered no last-minute attacks on methods to explain away negative findings. The program staff understood the data, from whence it came, what it revealed, and how it could be used for program development. They didn't get the dissemination grant that year, but they got direction about how to implement the program more consistently and increase its impact. Two years later, with new findings, they did win recognition as a "best practices" exemplar, an award that came with a dissemination grant.
Figuring out what findings mean and how to apply them engages us in that most human of processes: making sense of the world. Utilization-focused evaluators invite users along on the whole journey, alternatively exciting and treacherous, from determining what's worth knowing to interpreting the results and following through with action. In that spirit, Marvin Alkin (1990:148) suggested a T-shirt that user-oriented evaluators could give to intended users:

COME ON INTO THE DATA POOL
Realities and Practicalities of
Utilization-Focused Evaluation

In Search of Universal Evaluation Questions

A long time ago, a young evaluator set out on a quest to discover the perfect evaluation
instrument, one that would be completely valid, always reliable, and universally applicable.
His search led to Halcolm, known far and wide for his wisdom.

Young Evaluator: Great Master Halcolm, forgive this intrusion, but I am on a quest
for the perfect evaluation instrument.
Halcolm: Tell me about this perfect instrument.
Young Evaluator: I seek an instrument that is valid and reliable in all evaluation
situations, that can be used to evaluate all projects, all programs, all
impacts, all benefits, all people. . . . I am seeking an evaluation tool
that anyone can use to evaluate anything.
Halcolm: What would be the value of such an instrument?
Young Evaluator: Free of any errors, it would rid evaluation of politics and make
evaluation truly scientific. It would save money, time, and frustra-
tion. We'd finally be able to get at the truth about programs.
Halcolm: Where would you use such an instrument?
Young Evaluator: Everywhere!
Halcolm: With whom would you use it?
Young Evaluator: Everyone!
Halcolm: What, then, would become of the process of designing situationally
specific evaluations?
Young Evaluator: Who needs it?
Halcolm: [Silence]
Young Evaluator: Just help me focus. Am I on the right path, asking the most important
questions?
Halcolm: [Silence]
Young Evaluator: What do I need to do to get an answer?
Halcolm: [Silence]
Young Evaluator: What's the use?
Halcolm: [Silence]
Young Evaluator: What methods can I use to find out what I want to know?
Halcolm: What is universal in evaluation is not a secret. Your last five
questions reveal that you already have what you seek in the very
asking of those questions.
Power, Politics, and Ethics

A theory of evaluation must be as much a theory of political interaction as it is a theory of how to determine facts.

—Lee J. Cronbach and Associates (1980:3)

Politics and Evaluation: A Case Example

During the mid-1970s, the Kalamazoo Education Association (KEA) in Michigan was locked in battle with the local school administration over the Kalamazoo Schools Accountability System. The accountability system consisted of 13 components, including teacher and principal performance, fall and spring standardized testing, teacher-constructed criterion-referenced tests in high school, teacher peer evaluations, and parent, student, and principal evaluations of teachers. The system had received considerable national attention, as when The American School Board Journal (in April 1974) editorialized that Kalamazoo Schools had designed "one of the most comprehensive computerized systems of personnel evaluation and accountability yet devised" (p. 40).
Yet, conflict enveloped the system as charges and countercharges were exchanged. The KEA, for example, charged that teachers were being demoralized; the superintendent responded that teachers didn't want to be accountable. The KEA claimed widespread teacher dissatisfaction; the superintendent countered that the hostility to the system came largely from a vocal minority of malcontent unionists. The newspapers hinted that the administration might be so alienating teachers that the system could not operate effectively. School board members, facing reelection, were nervous.
Ordinarily, a situation of this kind would continue to be one of charge and countercharge based entirely on selective perception, with no underlying data to clarify and


test the reality of the opposing positions. But the KEA sought outside assistance from Vito Perrone, Dean of the Center for Teaching and Learning, University of North Dakota, who had a reputation for fairness and integrity. The KEA proposed that Dean Perrone conduct public hearings at which interested parties could testify on and be questioned about the operations and consequences of the Kalamazoo Accountability System. Perrone suggested that such a public forum might become a political circus; moreover, he was concerned that a fair and representative picture of the system could not be developed in such an openly polemical and adversarial forum. He suggested instead that a survey of teachers be conducted to describe their experiences with the accountability system and to collect a representative overview of teacher opinions about their experiences.
Perrone attempted to negotiate the nature of the accountability review with the superintendent of schools, but the administration refused to cooperate, arguing that the survey should be postponed until after the school board election when everyone could reflect more calmly on the situation. Perrone decided to go forward, believing that the issues were already politicized and that data were needed to inform public debate. The evaluation was limited to providing a review of the accountability program from the perspective of teachers based on a mail survey conducted independently by the Minnesota Center for Social Research (which is how I became involved). The evaluation staff of the school system previewed the survey instrument and contributed wording changes.
The results revealed intense teacher hostility toward and fear of the accountability system. Of the respondents, 93% believed that "accountability as practiced in Kalamazoo creates an undesirable atmosphere of anxiety among teachers," 90% asserted that "the accountability system is mostly a public relations effort," and 83% rated the "overall accountability system in Kalamazoo" either "poor" or "totally inadequate." The full analysis of the data, including teachers' open-ended comments, suggested that the underlying problem was a hostile teacher-administration relationship created by the way in which the accountability system was developed (without teacher input) and implemented (forced on teachers from above). The data also documented serious misuse of standardized tests in Kalamazoo. The school board, initially skeptical of the survey, devoted a full meeting to discussion of the report.
The subsequent election eroded the school administration's support, and the superintendent resigned. The new superintendent and school board used the survey results as a basis for starting fresh with teachers. A year later, the KEA officials reported a new environment of teacher-administration cooperation in developing a mutually acceptable accountability system.
The evaluation report was only one of many factors that came into play in Kalamazoo at that time, but the results answered questions about the scope and nature of teachers' perspectives. Candidates for the position of superintendent called Dean Perrone to discuss the report. It became part of the political context within which administration-teacher relations developed throughout the following school year—information that had to be taken into account. The evaluation findings were used by teacher association officials to enhance their political position and increase their input into the accountability system.

The Political Nature of Evaluation

Scientists become uneasy when one group pushes a set of findings to further its own political purposes, as happened in Kalamazoo. They much prefer that the data serve all parties equally in a civilized search for the best answer. Research and experience suggest, however, that the Kalamazoo case, in which use was conditioned by political considerations, is quite common. In our study of how federal health evaluations were used, we found that use was affected by intra- and interagency rivalries, budgetary fights with the Office of Management and Budget and Congress, power struggles between Washington administrators and local program personnel, and internal debates about the purposes or accomplishments of pet programs. Budgetary battles seemed to be the most political, followed by turf battles over who had the power to act, but political considerations intruded in some way into every evaluation we examined.
The Program Evaluation Standards acknowledge the political nature of evaluation and offer guidance for making evaluations politically viable:

The evaluation should be planned and conducted with anticipation of the different positions of various interest groups, so that their cooperation may be obtained, and so that possible attempts by any of these groups to curtail evaluation operations or to bias or misapply the results can be averted or counteracted. (Joint Committee 1994:F2)

A political perspective also informs the Guiding Principles of the American Evaluation Association (AEA Task Force 1995) as they address the "Responsibilities for General and Public Welfare: Evaluators should articulate and take into account the diversity of interests and values that may be related to the general and public welfare" (p. 20). This principle mandates a political responsibility that goes well beyond just collecting and reporting data.
Meeting the standard on political viability and the principle of general responsibility will necessitate some way of astutely identifying various stakeholders and their interests. Stakeholder mapping (Bryson and Crosby 1992:377-79) can be helpful in this regard. Exhibit 14.1 offers one kind of matrix for use in mapping stakeholders according to their initial inclination toward the program being evaluated (support, opposition, or neutrality) and how much they have at stake in the evaluation's outcome (a high stake, a moderate stake, or little stake).

Evaluation's Coming of Age: Beyond Political Innocence and Naivete

The article most often credited with raising evaluators' consciousness about the politics of evaluation was Carol Weiss's 1973 analysis of "Where Politics and Evaluation Research Meet." Reprinted 20 years later in Evaluation Practice, in recognition of its status as a classic, the article identified three major ways in which politics intrude in evaluation: (1) Programs and policies are "creatures of political decisions" so evaluations implicitly judge those decisions; (2) evaluations feed political decision making and compete with other perspectives in the political process; and (3) evaluation is inherently political by its very nature because of the issues it addresses and the conclusions it reaches. Weiss ([1973] 1993) concluded:

EXHIBIT 14.1
Mapping Stakeholders' Stakes

                                      Estimate of Various Stakeholders' Initial Inclination Toward the Program
How high are the stakes for
various primary stakeholders?        Favorable          Neutral or Unknown          Antagonistic

High
Moderate
Low

Knowing that political constraints and resistances exist is not a reason for abandoning evaluation research; rather it is a precondition for useable evaluation research. Only when the evaluator has insight into the interests and motivations of other actors in the system, into the roles that he himself is consciously or inadvertently playing, the obstacles and opportunities that impinge upon the evaluative effort, and the limitations and possibilities for putting the results to work—only with sensitivity to the politics of evaluation research—can the evaluator be as creative and strategically useful as he should be. (p. 94)

Weiss showed that politics and use are joined at the hip. In this classic analysis, she made use directly contingent on the political sophistication of evaluators.
How, then, can utilization-focused evaluators become politically sophisticated? The first step comes with being able to recognize what is political.
Often, in our interviews with evaluators about how federal health evaluations had been used, we found them uneasy about discussing the tensions between their research and politics; they were hesitant to acknowledge the ways in which the evaluation was affected by political considerations. We found that many evaluators disassociated themselves from the political side of evaluation, despite evidence throughout their interviews that they were enmeshed in politics. One interviewee, a research scientist with 12 years experience doing federal evaluations, described pressure from Congress to accelerate the evaluation, then added, "We really had no knowledge or feeling about political relationships. We are quite innocent on such matters. We may not have recognized [political factors]. We're researchers" [EV5:7].

In another case, the decision maker stated the evaluation was never used because program funding had already been terminated before the evaluation was completed. When asked about this in a later interview the evaluator replied, "I wasn't aware that the program was under any serious threat. Political matters related to the evaluation did not come up with us. It was not discussed to my recollection before, during, or after the study" [EV97:12-13].
Part of evaluators' innocence or ignorance about political processes stemmed from a definition of politics that included only happenings of momentous consequences. Evaluators frequently answered our questions about political considerations only in terms of the overall climate of presidential or congressional politics and campaigns. They didn't define the day-to-day negotiations out of which programs and studies evolve as politics. One evaluator explained that no political considerations affected the study because "this was not a global kind of issue. There were vested interests all right, but it was not what would be considered a hot issue. Nobody was going to resign over whether there was this program or not" [EV145:12].
Failing to recognize that an issue involves power and politics reduces an evaluator's strategic options and increases the likelihood that the evaluator will be used unwittingly as some stakeholder's political puppet. It is instructive to look at cases in which the evaluators we interviewed described their work as nonpolitical. Consider, for example, the responses of an academic researcher who studied citizen boards of community mental health programs. At various points in the interview, he objected to the term evaluation and explained that he had conducted "basic research," not an evaluation, thus the nonpolitical nature of his work; this despite the fact that the study was classified by the funding agency as an evaluation and was used to make policy decisions about the processes studied. He was adamant throughout the interview that no political considerations or factors affected the study or its use in any way. He explained that he had demanded and received absolute autonomy so that no external political pressures could be brought to bear. In his mind, his work exemplified nonpolitical academic research. Consider, then, responses he gave to other questions.

Item: When asked how the study began, the evaluator admitted using personal connections to get funding for the project:

We got in touch with some people [at the agency] and they were rather intrigued by this. . . . It came at year's end and, as usual, they had some funds left over. . . . I'm pretty certain we were not competing with other groups; they felt a sole bid kind of thing wasn't going to get other people angry. [EV4:1,5-6]

Item: The purpose of the study?

We were wondering about conflict patterns in citizen boards. At that time, the funding agency was concerned because many of their centers were in high-density ghetto areas, not only cutting across the black population, but with Mexican Americans or Puerto Ricans thrown in. Up until the time of the study, many agencies' boards were pretty middle-class. Now, you put in "poor people" and minorities—how is that going to work? Is that going to disturb the system as far as the middle-class people were concerned? Of course, some of them were pretty conservative, and they were afraid that we were rocking the boat by looking at this. [EV4:4]

Item: The study presented recommendations about how citizen boards should be organized and better integrated into programs, matters of considerable controversy.

Item: The results were used to formulate agency policy and, eventually, Congressional legislation.

We kept people talking about citizen participation—What does it truly mean? You see, that generated a lot of thinking. [EV4:14]

Item: How did the results get disseminated?

We released the results in a report. Now, the fascinating thing, like throwing a pebble in a pond, [was that] Psychology Today picked up this report and wrote a glowing little review. . . ; then they made some nasty comments about the cost of government research. [EV4:10-11]

Item: The researcher recounted a lengthy story about how a member of nationally visible consumer advocate Ralph Nader's staff got hold of the study, figured out the identity of local centers in the study's sample, and wrote a separate report. The researcher and his colleagues engaged lawyers but were unable to stop Nader's staff from using and abusing their data and sources, some of whom were identified incorrectly.

We just didn't have the money to fight them, so we were furious. We thought that we would go to our lawyer friends and see if they couldn't do something, but they all came back with pretty much the same kind of negative response. What finally happened was that when [Nader's] big report came out, using our stuff, they gave it to the New York Times and various newspapers. [EV4:11-12]

Item: After the study, the researchers were involved in several regional and national meetings about their findings.

We go to an enormous number of meetings. And so we talked . . . and we've become known in a limited circle as "the experts in this sort of thing." [EV4:20]

At one such meeting, the researcher became involved in an argument with local medical staff.

The doctors and more middle-class people in mental health work said we were just making too much of a fuss, that things were really going along pretty well. And I remember distinctly in that room, which must have had 200 people that day, the blacks and some of the—you might call them militant liberals—were whispering to each other and I began to feel the tension and bickerings that were going on. [EV4:19]

Politics by Any Other Name

The researcher who conducted this study—a study of class and race conflict on mental health citizen boards that judged the effectiveness of such boards and included recommendations for improving their functioning—insisted that his work was nonpolitical academic research, not an evaluation. Yet, he revealed, by his own testimony, that personal influence was used to get funding. The research question was conceived in highly value-laden terms: middle-class boards versus poor people's boards. Concerns emerged about "rocking the boat." The study's controversial findings and recommendations were cited in national publications and used in policy formulation. The researchers became expert advocates for a certain view of citizen participation.

Personal contacts, value-laden definitions, rocking the boat, controversial recommendations, taking sides, defending positions—of such things are politics made.

Sources of Evaluation's Political Inherency

The political nature of evaluation stems from several factors:

1. The fact that people are involved in evaluation means that the values, perceptions, and politics of everyone involved (scientists, decision makers, funders, program staff) impinge on the process from start to finish.

2. The fact that evaluation requires classifying and categorizing makes it political. Categories inevitably filter the data collected. One of the more politically sophisticated evaluators we interviewed described the politics of categories:

Our decision to compare urban and rural reflected the politics of the time—concerns that city problems are different from rural problems. Since this was a national program, we couldn't concentrate solely on problems in the city and not pay any attention to rural areas. That wouldn't have been politically smart.
And then our decision to report the percent nonwhite with mental illness, that certainly reflects attention to the whole political and socioeconomic distribution of the population. In that we used factors important in the politics of the nation, to that extent we were very much influenced by political considerations. We tried to reflect the political, social, and economic problems we thought were important at the time. [EV12:7-8]

3. The fact that empirical data undergird evaluation makes it political because data always require interpretation. Interpretation is only partly logical and deductive; it's also value laden and perspective dependent.

4. The fact that actions and decisions follow from evaluation makes it political.

5. The fact that programs and organizations are involved makes evaluation political. Organizations allocate power, status, and resources. Evaluation affects those allocation processes. One of the weapons employed in organizational conflicts is evaluative information and judgments.

6. The fact that information is involved in evaluation makes it political. Information leads to knowledge; knowledge reduces uncertainty; reduction of uncertainty facilitates action; and action is necessary to the accumulation of power.

Decision making, of course, is a euphemism for the allocation of resources—money, position, authority, etc. Thus, to the extent that information is an instrument, basis, or excuse for changing power relationships within or among institutions, evaluation is a political activity. (Cohen 1970:214)

The "Is" and the "Ought" of Evaluation Politics

We have not been discussing if evaluation should be political. The evidence indicates that whether researchers like it or not, evaluation will be influenced by political factors. The degree of politicalization varies, but it is never entirely absent. One astute decision maker we interviewed had

made his peace with the inevitability of politics in evaluation as follows:

[Government decision making] is not rational in the sense that a good scientific study would allow you to sit down and plan everybody's life. And I'm glad it's not because I would get very tired, very early, of something that ran only by the numbers. Somebody'd forget part of the numbers. So, I'm not fighting the system. But you do have to be careful what you expect from a rational study when you insert it into the system. It can have tremendous impact, but it's a political, not a rational process. . . . Life is not a very simple thing. [DM328:18-19]

In our interviews, evaluators tended to portray their findings as rational and objective while other inputs into the decision-making process were subjective and political. One evaluator lamented that his study wasn't used because "politics outweighed our results" [EV131:8]. Such a dichotomy between evaluation and politics fails to recognize the political and power-laden nature of evaluative information.

The Power of Evaluation

In this section, I want to briefly review a theory of power that I have found instructive in helping me appreciate what evaluation offers stakeholders and intended users. Understanding this has helped me explain to intended users how and why their involvement in a utilization-focused evaluation is in their own best interest. It provides a basis for understanding how knowledge is power.
Use of evaluation will occur in direct proportion to its power-enhancing capability. Power-enhancing capability is determined as follows: The power of evaluation varies directly with the degree to which the findings reduce the uncertainty of action for specific stakeholders.
This view of the relationship between evaluation and power is derived from the classic organizational theories of Michael Crozier (1964) and James Thompson (1967). Crozier studied and compared a French clerical agency and tobacco factory. He found that power relationships developed around uncertainties. Every group tried to limit its dependence on others and, correspondingly, enlarge its own areas of discretion. They did this by making their own behavior unpredictable in relation to other groups. Interpreting what he found, Crozier drew on Robert Dahl's (1957) definition of power: "The power of a person A over a person B is the ability of A to obtain that B do something he would not otherwise have done." Systems attempt to limit conflicts over power through rationally designed and highly routinized structures, norms, and tasks. Crozier (1964) found, however, that even in a highly centralized, routinized, and bureaucratic organization, it was impossible to eliminate uncertainties.

In such a context, the power of A over B depends on A's ability to predict B's behavior and on the uncertainty of B about A's behavior. As long as the requirements of action create situations of uncertainty, the individuals who have to face them have power over those who are affected by the results of their choice. (p. 158)

Crozier (1964) found that supervisors in the clerical agency had no interest in passing information on to their superiors, the section chiefs. Section chiefs, in turn, competed with one another for attention from their superior—the division head. Section chiefs distorted the information

they passed up to the division head to enhance their own positions. Section chiefs could get away with distortions because the lower-level supervisors, who knew the truth, were interested in keeping what they knew to themselves. The division head, on the other hand, used the information he received to schedule production and assign work. Knowing that he was dependent on information from others, and not being able to fully trust that information, his decisions were carefully conservative in the sense that he aimed only at safe, minimal levels of achievement because he knew he lacked sufficient information to narrow risks.

The power of prediction stems to a major extent from the way information is distributed. The whole system of roles is so arranged that people are given information, the possibility of prediction and therefore control, precisely because of their position within the hierarchical pattern. (p. 158)

Whereas Crozier's analysis centered on power relationships and uncertainties between individuals and among groups within organizations, James Thompson (1967) found that a similar set of concepts could be applied to understand relationships between whole organizations. He argued that organizations are open systems that need resources and materials from outside, and that "with this conception the central problem for complex organizations is one of coping with uncertainty" (p. 13). He found that assessment and evaluation are used by organizations as mechanisms for reducing uncertainty and enhancing their control over the multitude of contingencies with which they are faced. They evaluate themselves to assess their fitness for the future, and they evaluate the effectiveness of other organizations to increase their control over the maintenance of crucial exchange relationships. Information for prediction is information for control: thus the power of evaluation.
The Kalamazoo Schools Accountability System case example with which this chapter opened offers a good illustration of evaluation's role in reducing uncertainty and, thereby, enhancing power. The accountability system was initiated, in part, to control teachers. Teachers' hostility to the system led to uncertainty concerning the superintendent's ability to manage. The superintendent tried to stop the study that would establish the degree to which teacher opposition was widespread and crystallized. The board members let the study proceed because, as politicians, they deplored uncertainty. Once the study confirmed widespread teacher opposition, union officials used the results to force the superintendent's resignation, mobilize public opinion, and gain influence in the new administration. In particular, teachers won the right to participate in developing the system that would be used to evaluate them. The Kalamazoo evaluation represents precisely the kind of political enterprise that Cohen (1970) has argued characterizes evaluation research: "To evaluate a social action program is to establish an information system in which the main questions involve the allocation of power, status, and other public goods" (p. 232).

Limits on Knowledge as Power

A perspective contrary to the notion that knowledge is power has been articulated by L. J. Sharpe (1977). In pondering how social scientists came to overestimate their potential influence on government decision making, he concluded that

One important cause of this overoptimism is the widespread assumption that governments are always in need of, or actively seek, information. But it seems doubtful whether this is the case. It is more likely that government has too much information, not too little—too much, that is, by its own estimation. (p. 44)

Having information, Sharpe argued, delays and complicates government decision making. He cited distinguished British economist John Maynard Keynes (1883-1946) in support of the proposition that information avoidance is a central feature of government: "There is nothing a government hates more than to be well-informed; for it makes the process of arriving at decisions more complicated and difficult" (quoted in Sharpe 1977:44).
The perspectives of Keynes and Sharpe demonstrate the necessity of limiting the generalization that knowledge is power. Four qualifiers on this maxim derive from the premises of utilization-focused evaluation.

Political Maxims for Utilization-Focused Evaluators

1. Not All Information Is Useful

To be power laden, information must be relevant and in a form that is understandable to users. Crozier (1964) recognized this qualifier in linking power to reduced uncertainty: "One should be precise and specify relevant uncertainty. . . . People and organizations will care only about what they can recognize as affecting them and, in turn, what is possibly within their control" (p. 158).
Government, in the abstract, may well have too much irrelevant, trivial, and useless information, but individual stakeholders will tell you that they are always open to timely, relevant, and accurate information that can reduce uncertainty and increase their control.

2. Not All People Are Information Users

Individuals vary in their aptitude for handling uncertainty and their ability to exercise discretion. Differential socialization, education, and experience magnify such differences. In the political practice of evaluation, this means that information is power only in the hands (minds) of people who know how to use it and are open to using it. The challenge of use remains one of matching: getting the right information to the right people.
One evaluator in our use study insisted on this point. Drawing on 35 years in government, 20 of those years directly involved in research and evaluation, and several years as a private evaluation contractor on some 80 projects, he opined that good managers are anxious to get useful information. In fact, they're hungry for it. The good manager "is interested in finding out what your views are, not defending his. . . . You know my sample is relatively small, but I'd say probably there are a quarter (25%) of what I'd call good managers" [EV346:15]. These, he believed, were the people who use evaluation.
What of people who are not inclined to use information—people who are intimidated by, indifferent to, or even hostile to evaluation? A utilization-focused evaluator looks for opportunities and strategies for creating and training information users. Thus, the challenge of increasing use consists of two parts: (1) finding and involving those who are, by inclination, information

users and (2) training those not so inclined. Just as in cards you play the hand you're dealt, in evaluation, you sometimes have to play the stakeholders you're dealt.
It's helpful in this regard to consider the 20-50-30 rule proffered by organizational change specialist Price Pritchett (1996). He estimates that 20% of people are change-friendly; another 50% are fence-sitters waiting to see which way the wind blows; and the remaining 30% are resisters. He counsels wooing the fence-sitters rather than the resisters while devoting generous attention to supporters of the process. "You must be willing to let squeaky wheels squeak. Save your grease for the quieter wheels that are carrying the load" (p. 4). Such political calculations undergird any change effort, and evaluation inherently holds the potential for change.

3. Information Targeted at Use Is More Likely to Hit the Target

It's difficult knowing in advance of a decision precisely what information will be most valuable. In the battle for control over uncertainty, one thing is certain—no one wants to be caught with less information than competitors for power. This fear leads to a lot of information being collected "just in case." One evaluator we interviewed explained the entire function of his office in these terms:

I wouldn't want to be quoted by name, but there was a real question whether we were asked for these reports because they wanted them for decision making. We felt that the five-foot shelf we were turning out may have had no particular relevance to the real world. . . . But, this operation made it impossible for some Congressmen, or someone, to say that the issue had never been studied. Therefore, it would be a fairly standard administration ploy to study the issue so that it was not possible for somebody to insist you never even looked at the issue. [EV152:18]

Such a "just in case" approach to gathering data wastes scarce evaluation resources and fills shelves with neglected studies. It's impossible to study every possible future contingency. Utilization-focused evaluation requires a focus on real issues with real time lines aimed at real decisions—the opposite of "just in case" evaluation. In that way, utilization-focused evaluation aims at closing the gap between potential and actual use, between knowledge and action. Targeting an evaluation at intended use by intended users increases the odds of hitting the target.

4. Only Credible Information Is Ultimately Powerful

Eleanor Chelimsky, one of the profession's most experienced and successful evaluators in dealing with Congress, has reiterated at every opportunity that the foundation of evaluation use is credibility—not just information, but credible information. "Whether the issue is fairness, balance, methodological quality, or accuracy, no effort to establish credibility is ever wasted. The memory of poor quality lingers long" (Chelimsky 1987a:14).
Independent audits of evaluation quality offer one strategy for dealing with what Thomas Schwandt (1989a) called "the politics of verifying trustworthiness." The Program Evaluation Standards, in calling for meta-evaluation (Joint Committee 1994:A12; Schwandt and Halpern 1988), articulate an obligation to provide stakeholders with an independent assessment of an evaluation's strengths and weaknesses to guide stakeholders in judging an evaluation.

EXHIBIT 14.2
When Is Evaluation Not Political?

In 1988, my duties as President of the American Evaluation Association included posing a "Presidential
Problem" to the membership, a tradition begun by Michael Scriven. The theme of the annual national
meeting that year was The Politics of Evaluation. The problem I posed was: What is and is not politics
in evaluation, and by what criteria does one judge the difference?
The winning entry from Robin Turpin (1989) asserted that "politics has a nasty habit of sneaking
into all aspects of evaluation" (p. 55). All the other entries took essentially the same position: politics
is omnipresent in evaluation.
This anonymous entry, my personal favorite, was unequivocal.

Evaluation is NOT political under the following conditions:


• No one cares about the program.
• No one knows about the program.
• No money is at stake.
• No power or authority is at stake.
• And, no one in the program, making decisions about the program, or otherwise involved in,
knowledgeable about, or attached to the program, is sexually active.

Evaluation audits and meta-evaluation ensure evaluation credibility to users in the same way that independent financial audits ensure the credibility of profit reports to business investors. From a practical perspective, however, not every evaluation effort merits the resources required for full meta-evaluation. I would propose the following practical guideline: The more politicized the context in which an evaluation is conducted and the more visible an evaluation will be in that politicized environment, the more important to credibility will be an independent assessment of evaluation quality. This amounts to a form of matching in which safeguards of evaluation credibility are designed to anticipate and counter specific political intrusions within particular political environments. Political sophistication requires situational responsiveness. For guidance on how to anticipate the possible intrusion of politics into evaluation, see Exhibit 14.2.

The Political Foundations of Organizing Stakeholders Into an Evaluation Task Force

Where possible and practical, an evaluation task force can be organized to make major decisions about the focus, methods, and purpose of the evaluation. The task force is a vehicle for actively involving key stakeholders in the evaluation. Moreover, the very processes involved in making decisions about an evaluation will typically

increase stakeholders' commitment to use sues get raised and findings get publicized
results while also increasing their knowl- that otherwise might never see the light of
edge about evaluation, their sophistication day.
in conducting evaluations, and their ability 6. The evaluator has an opportunity to ob-
to interpret findings. T h e task force allows serve firsthand the interactions among vari-
the evaluator to share responsibility for ous stakeholders and assess their interper-
decision making by providing a forum for sonal relationships. This can be very
the political and practical perspectives that helpful in developing use strategies.
best come from those stakeholders w h o 7. Momentum can be built through group
will ultimately be involved in using the processes that helps reduce delays or
evaluation. counter roadblocks resulting from the atti-
Several things can be accomplished with tudes or actions of one person.
a group or evaluation task force that are less 8. The evaluator(s) and stakeholders in a
likely to occur with individuals, assuming group process will often jell so that it's
that participants are willing and the group not the evaluator against the world. The
is well facilitated. other stakeholders share responsibility and
ownership.
1. An environment of openness can be estab- 9. The group may continue to function after
lished to reduce suspicions and fears about the evaluation is completed. Participants
the evaluation. The key stakeholders who can develop a shared commitment to fol-
participate in the process know how deci- low through on recommendations. After
sions are made and who was involved in all, in most cases the evaluator is present
making them. This can reduce political for only a limited period. Stakeholders stay
paranoia. with the program after the evaluation is
2. Participants in the process become sensi- over. A task force can become a reposi-
tized to the multiple perspectives that exist tory for evaluation knowledge and carry
around any program. Their views are forward an appreciation of evaluation
broadened as they are exposed to the vary- processes.
ing agendas of people with different con- 10. Groups, acting in concert, have more
cerns. This increases the possibility of con- power than individuals.
ducting an evaluation that is respectful of
and responsive to different interests and Of course, all of these positive out-
values. comes of group dynamics assume an effec-
3. New ideas often emerge out of the dynam- tive group process. Success depends o n :
ics of group interaction. (1) w h o participates in that process and
4. A sense of shared responsibility for the (2) the questions dealt with by the g r o u p ,
evaluation can be engendered that is often that is, the focus and quality of the p r o -
greater than the responsibility that would cess. Any group rapidly becomes greater
be felt by isolated individuals. Commit- than the sum of its parts. Bringing to-
ments made in groups, in front of others, gether a group of incompetents seems to
are typically more lasting and serious than increase geometrically the capacity for in-
promises made to an evaluator in private. competent and misguided action. O n the
5. An open forum composed of various stake- other hand, a group of competent, politi-
holders makes it difficult to suppress cally sensitive, and thoughtful people can
touchy questions or negative findings. Is- create something that is more useful than
354 • REALITIES A N D PRACTICALITIES

any of t h e m individually might have cre- force is having different people show up at
ated. Shared decision making may mean different meetings. With inconsistent at-
c o m p r o m i s e ; it can also mean powerful tendance, the process never really moves
chain reactions leading to increased en- forward.
ergy and c o m m i t m e n t , especially com-
m i t m e n t to use evaluation findings in T h e composition and size of a task
which g r o u p members have increased force is limited for practical reasons. N o t
their "stake" t h r o u g h involvement in the every stakeholder can or should partici-
evaluation decision-making process. pate, though an attempt should be m a d e
to represent all major stakeholder con-
stituencies and points of view. T h e evalu-
Political Considerations ator should be fair, but practical, in work-
in Forming an ing with program administrators, funders,
Evaluation Task Force clients, program staff, and public officials
to establish a task force (and imbue it with
Several criteria are important in forming the necessary authority to make deci-
an evaluation task force. N o t all of these sions). In this regard, I find the advice of
criteria can be met to the same degree in Guba and Lincoln (1981) to be impracti-
every case, but it is helpful to have in mind cal when they assert that "it is unethical
a basic framework for the composition of for the evaluator . . . to fail to interact
the group. with any k n o w n audience in the search for
concerns and issues." They direct the
1. The members of the task force should repre- evaluator to address "the broadest pos-
sent the various groups and constituencies sible array of persons interested in or
that have an interest and stake in the evalu- affected by the evaluand [thing being eval-
ation findings and their use, including the uated, e.g., the p r o g r a m ] , including audi-
interests of program participants. ences that are unaware of the stakes they
2. The task force members should either be hold" (p. 37). While evaluators need to
people who have authority and power to use take care that the interests of p r o g r a m
evaluation findings in decision making, or to clients and the powerless are represented,
influence others who do have such power there are practical limits to identification
and authority. Again, this includes repre- and organization of decision makers and
sentatives of the program's clients, who may information users. Fairness and a healthy
be powerless as anonymous individuals but regard for pluralism are guiding lights in
whose interests can be organized and taken this regard.
into consideration for evaluation purposes.
3. The task force members should believe that
the evaluation is worth doing. Chairing the Task Force
4. The task force members should care how the
results are used. I prefer to have one of the task force
5. The task force members should be willing to participants act as chair of the group. The
make a firm commitment of time, including chair's responsibility is to convene meet-
a commitment to attend all of the evaluation ings, see that agendas for meetings are fol-
task force meetings. One of the common lowed, and keep discussions on the topic at
problems in working with an evaluation task hand. Having a stakeholder chair the task
Power, Politics, and Ethics • 355

force helps symbolize the responsibility and signs that might be used. Time considera-
authority of the group. The evaluator is a tions and intended uses are clarified so that
consultant to the group and is paid to do methods can be selected that are manage-
the nitty-gritty staff work for the evalu- able, credible, and practical. Issues of valid-
ation, but the task force should assume ity, reliability, generalizability, and appro-
responsibility for the overall direction of priateness are also discussed in ways that
the process. As facilitator, trainer, and col- are understandable and meaningful.
laborator, the evaluator will command a
good deal of floor time in task force ses- 3. Design and instrument review. Be-
sions. However, an effective evaluator can tween the second and third meetings the
accomplish everything needed by working evaluator will design the instruments to be
with the chair, rather than being the chair. used and write a concrete methods pro-
posal specifying units of analysis, control or
comparison groups to be studied, sampling
Making the Process Work approaches and sample size, and the overall
data collection strategy. In reviewing the
Major stakeholders on an evaluation proposed design and instruments, the task
task force are likely to be busy people force members should understand what
whose time constraints must be respected. will be done and what will not be done,
The evaluator must be able to help focus what findings can be generated and what
the activities of the group so that time is findings cannot be generated, and what
used well, necessary decisions get made, questions will be asked and what will not
and participants do not become frustrated be asked. The third meeting will usually
with a meandering and ambiguous process. involve some changes in instrumentation
Minimize time spent on decisions about the —additions, deletions, revisions—and ad-
group process and maximize the time spent justments in the design. Basically, this meet-
on decisions about substantive evaluation ing is aimed at providing final input into
issues. What follows is a description of a the research methods before data collec-
bare-bones process. tion begins. The evaluator leaves the meet-
At a minimum, I expect to hold four ing with a clear mandate to begin data
two-hour meetings with the task force. collection.
The third meeting is also a good time to
1. Focus/conceptualization session. The do a mock use exercise in which task force
first meeting should establish the focus of members consider specifically how various
the evaluation. The group considers alter- kinds of findings might be used, given simu-
native questions, issues, problems, and lated results. If we get these answers to this
goals to decide the purpose and direction question, what would that mean? What
of evaluation. would we do with those results?

2. Methods and measurement options. 4. Data interpretation session. The


The second meeting considers different fourth and final meeting in this minimum
ways of conducting the evaluation, given scenario focuses on data analysis, interpre-
the focus determined in the first meeting. tation, judgment, and recommendations.
The evaluator presents varying kinds of The evaluator will have arranged the data
measurement approaches and different de-
356 • REALITIES AND PRACTICALITIES

so that the task force members can under- 1. In working with stakeholders, seek
stand and interpret the results. to negotiate win/win scenarios. For exam-
ple, in an environment of controversy, with
strong program advocates versus strong
Variations on program adversaries, the evaluation ques-
a Political Theme tion "Is the program effective?" frames a
win/lose scenario. That is, the very way the
The bare-bones scenario of four focused question is posed—dichotomously—frames
meetings with primary intended users illus- a win/lose scenario. In contrast, a strengths
trates the minimum commitment (eight and weaknesses framing of the question
hours) one needs from busy stakeholders. focuses on learning and improvement
Large-scale, complex evaluations with rather than a good versus bad judgment:
many stakeholders may involve more face- "For what kinds of participants under what
to-face sessions and different kinds of in- conditions is the program most effective?
teractions. For example, in Chapter 12, I And for whom is the program less effec-
described a data interpretation session that tive?" Identifying strengths and acknow-
involved some 200 criminal justice system ledging weaknesses is a win/win outcome.
stakeholders. Thus, the four-session out- Forcing a judgment of effective/ineffective
line above is not a recipe, but rather a is a win/lose outcome.
beginning framework for thinking politi-
cally and instrumentally about how to
2. Help primary stakeholders avoid
make the stakeholder involvement process
getting their egos attached to how the
meaningful and practical.
evaluation turns out. When someone's ego
or sense of esteem is at risk, the political
stakes skyrocket. Emphasize the value of
Political Rules in learning, regardless of what the results show,
Support of Use rather than being right or wrong. Help
users derive ego strength from being astute
The degree to which an evaluation be- information users rather than whether
comes caught up in destructive power poli- their a priori position is affirmed.
tics can be mitigated by savvy evaluators.
By recognizing the inherently political na- 3. Help users develop a long-term view
ture of evaluation, evaluators can enter the of learning, improvement, and knowledge
political fray as power players in a game use. Short-term "negative" results are less
where the rules are subject to manipula- threatening when placed in a longer term
tion. The evaluator then works to nego- context of ongoing development. If a short-
tiate rules in the power game that favor term result becomes associated in a user's
informed and intended use by intended mind with ultimate success or failure, the
users. Here are some rules of the power stakes skyrocket, and the power games be-
game that I find consistent with utilization- come potentially nastier.
focused evaluation. These rules have been
influenced by Power: The Infinite Game 4. Create an environment of inter-
(Broom and Klein 1995) and adapted to fit pretation that values diverse perspectives.
evaluation. Everyone doesn't have to reach the same
Power, Politics, and Ethics • 357

conclusion. Dialogue, discussion, and re- 8. Keep in mind that what happens in a
spect for differences enhance enlighten- particular process aimed at making deci-
ment. A focus on truth frames the results in sions for a specific evaluation has implica-
a way that someone is right and someone tions, not only for that evaluation, but for
else is wrong: again, win/lose instead of future evaluations. Each evaluation process
win/win. becomes part of one's evaluation legacy.
Think long-term. The worst political
5. Seek to affirm and reaffirm that ev- abuses often occur in the name of short-
eryone is interested in what works best for term gain. Stay on the high road.
intended beneficiaries. Head off tendencies
to lose sight of this high purpose when
stakeholders are tempted to focus on Fears of Political Co-optation
power games such as who gains and loses
resources or who gets credit or blame. I encounter a lot of concern that in
Those issues are real and will need to be facilitating utilization-focused evaluation,
understood and negotiated, but within the the evaluator may become co-opted by
context of a commitment to effectiveness. stakeholders. How can evaluators maintain
People do, in fact, respond to noble pur- their integrity if they become involved in
poses—or can be forced to take such pur- close, collaborative relationships with
poses into account even when pursuing stakeholders? How does the evaluator take
their own selfish interests. politics into account without becoming a
political tool of only one partisan interest?
6. Avoid getting entangled in group The nature of the relationship between
process rules, such as like Robert's Rules, evaluators and the people with whom they
or stifling voting procedures. Seek consen- work is a complex and controversial one.
sus and shared ownership. Voting can lead On the one hand, evaluators are urged to
to winners and losers. Consensus is inclu- maintain a respectful distance from the
sive of everyone. Of course, this isn't al- people they study to safeguard objectivity
ways possible. With large groups and can- and minimize personal and political bias.
tankerous stakeholders, formal process On the other hand, the human relations
rules and voting may become necessary, but perspective emphasizes that close, interper-
I think it's worth striving for the ideal of sonal contact is a necessary condition for
operating by consensus. building mutual understanding. Evaluators
thus find themselves on the proverbial
7. Diverge, then converge. Generate al- horns of a dilemma: Getting too close to
ternatives, then focus. Get diverse points of decision makers may jeopardize scientific
view, then prioritize. Keep before the group credibility; remaining distant may under-
that many possibilities exist. There's no mine use.
single best approach or design. Utility and A program auditor at a workshop put
practicality are the order of the day, not the issue less delicately when he asked,
rigid adherence to a preconceived notion "How can we get in bed with decision
of the best model. That's why this book has makers without losing our virginity?"
presented menus for the major decisions This is a fascinating and revealing meta-
intended users must make. phor, showing just how high the stakes can
358 • REALITIES AND PRACTICALITIES

seem. The evaluator is portrayed as the integrity of an evaluation group process


innocent, the policymaker as the co-opting depends on helping participants adopt an
tempter planning a seduction. I once re- empirical perspective. A commitment must
ported this metaphor to a group of policy- be engendered to find out what is really
makers who immediately reframed the happening, at least as nearly as one can,
question: How do we get in bed with given the limitations of research methods
evaluators without getting sadistically and scarce resources. Engendering such
abused?" Different stakes, different fears. commitment involves teaching and facili-
tating.
When stakeholders first begin discussing
Maintaining an the purpose of an evaluation, they will
Empirical Focus often do so in nonempirical terms. "We
want to prove the program's effectiveness."
One way to handle concerns about co- Proving effectiveness is a public relations
optation is to stay focused on evaluation's job, not an evaluation task. This statement
empirical foundation. In Chapter 2, I dis- tells the evaluator about that person's atti-
cussed the importance of and ways to en- tude toward the program, and it indicates
gender a commitment to reality testing a need for diplomatically, sensitively, but
among intended users. The empirical basis determinedly, reorienting that stakeholder
of evaluation involves making assumptions from a concern with public relations to a
and values explicit, testing the validity of concern with learning about and docu-
assumptions and carefully examining a menting actual program activities and ef-
program to find out what is actually occur- fects. The evaluator need not be frightened
ring. The integrity of an evaluation de- by such public relations statements. It's best
pends on its empirical orientation—that is, to get such inclinations out in the open.
its commitment to systematic and credible Then the work begins of moving toward an
data collection and reporting. Likewise, the empirical orientation.

Program Director: We want to prove the program's effectiveness.


Evaluator: What kind of information would do that?
Program Director: Information about how much people like the program.
Evaluator: Does everyone like the program?
Program Director: I think most everyone does.
Evaluator: Well, we could find out just how many do and how many don't. So
there's a reasonable evaluation question: "What are participants'
attitudes toward the program?" Later we'll need to get more specific
about how to measure their attitudes, but first let's consider some
other things we could find out. Assuming that some people don't
like the program, what could be learned from them?
Program Director: I suppose we could find out what they don't like and why.
Evaluator: Would that kind of information be helpful in looking at the pro-
gram, to find out about its strengths and weaknesses so that perhaps
you could improve it in some ways? [This is a deliberately leading
question, very hard to say "No" to.]
Program Director: Well, we know some of the reasons, but we can always learn more.
Evaluator: What other information would be helpful in studying the program
to find out about its strengths and weaknesses? [Here the evaluator
has carefully rephrased the original concern from "proving the
program's effectiveness" to "finding out about the program's
strengths and weaknesses."]

In this dialogue, the evaluator chips away at the program director's biased public relations perspective by carefully helping an empirical perspective emerge. At some point the evaluator may want to, or need to, address the public relations concern with a bit of a speech (or sermonette).

I know you're concerned about proving the program's effectiveness. This is a natural concern. A major and common purpose of evaluation is to gather information so that judgments can be made about the value of a program. To what extent is it effective? To what extent is it worthwhile?

If we only gathered and presented positive information, it would lack credibility. If you read a report that only says good things about a program, you figure something's been left out. In my experience, an evaluation has more credibility if it's balanced. No program is perfect. I've yet to see a program in which everyone was happy and all goals were achieved. You may find that it's more politically astute to study and report both strengths and weaknesses, and then show that you're serious about improving the program by presenting a strategy for dealing with areas of ineffectiveness. By so doing, you establish your credibility as serious program developers who can deal openly and effectively with inevitable difficulties.

Sometimes the opposite bias is the problem. Someone is determined to kill a program, to present only negative findings and to "prove" that the program is ineffective. In such cases, the evaluator can emphasize what can be learned by finding out about the program's strengths. Few programs are complete disasters. An empirical approach means gathering data on actual program activities and effects and then presenting those data in a fair and balanced way so that information users and decision makers can make their own judgments about goodness or badness, effectiveness or ineffectiveness. Such judgments, however, are separate from the data. In my experience, evaluation task force members will readily move into this kind of empirical orientation as they come to understand its utility and fairness. It's the evaluator's job to help them adopt that perspective.

I don't want to imply that shifting to an empirical orientation occurs easily or as the result of a single interaction. Quite the contrary, the empirical orientation of evaluation requires ongoing reinforcement. Some stakeholders never make the shift. Others do so enthusiastically. The savvy evaluator will monitor the empirical orientation of intended users and, in an active-reactive-adaptive mode of situational responsiveness, take appropriate steps to keep the evaluation on an empirical and useful path.

Evaluation Misuse

Evaluation processes and findings can be misused in the search for political advantage. Alkin and Coyle (1988) have made a critical distinction between misevaluation, in which an evaluator performs poorly or fails to adhere to standards and principles, and misuse, in which users manipulate the evaluation in ways that distort the findings or corrupt the inquiry.

The profession has become increasingly concerned about problems of misuse (Stevens and Dial 1994), whether the source be politics (Palumbo 1994), asking of the wrong questions (Posavac 1994), pressures on internal evaluators (Duffy 1994; Mowbray 1994), petty self-interest (Dial 1994), or ideology (Vroom, Columbo, and Nahan 1994). One emergent theme of these inquiries is that misuse, like use, is ultimately situational. Consider, for example, an illustrative case from Alkin and Coyle (1988).

An administrator blatantly squashes several negative evaluation reports to prevent the results from reaching the general public. On the surface, such an action appears to be a prime case of misutilization. Now, consider the same action (i.e., suppressing negative findings) in a situation where the reports were invalid due to poor data collection. . . . Thus, misutilization in one situation may be conceived of as appropriate non-use in another. (p. 3)

King (1982) has argued, I think reasonably, that intentional non-use of poorly conducted studies should be viewed as appropriate and responsible. What complicates this position is the different perspectives on what constitutes a quality evaluation, as, for example, in the debate about methods reviewed in Chapter 12.

I would share the following thoughts about misuse:

1. Misuse is not at the opposite end of a continuum from use. Two dimensions are needed to capture the complexities of real-world practice. One dimension is a continuum from non-use to use. A second is a continuum from non-use to misuse. (See Exhibit 14.3.) Studying or avoiding misuse is quite different from studying or facilitating use.

2. Having conceptualized two separate dimensions, it is possible to explore the relationship between them. Therefore, permit me the following proposition: As use increases, misuse will also increase. (See Exhibit 14.3.) It seems to me that when people ignore evaluations, they ignore their potential uses as well as abuses. As we successfully focus greater attention on evaluation data, and as we increase actual use, we can also expect there to be a corresponding increase in abuse, often within the same evaluation experience. Donald T. Campbell (1988) made a similar prediction in formulating "a discouraging law that seems to be emerging: the more any social indicator is used for social decision making, the greater the corruption pressures upon it" (p. 306; emphasis in original).

3. Misuse can be either intentional or unintentional. Unintentional misuse can be corrected through the processes aimed at increasing appropriate and proper use. Intentional misuse is an entirely different matter, which invites active intervention to correct whatever has been abused, either the evaluation process or findings. As with most problems, correcting misuse is more expensive than preventing it in the first place.

4. Working with multiple users who understand and value an evaluation is one of the best preventatives against misuse. Allies in use are allies against misuse.
EXHIBIT 14.3
Relation of Use to Misuse

[Diagram: two dimensions extending from a common origin labeled Nonuse, one continuum running toward Use, the other toward Misuse.]

Hypothesis: As use increases, misuse will also increase.

Indeed, I work to have intended users take so much ownership of the evaluation that they become the champions of appropriate use, the guardians against misuse, and the defenders of the evaluation's credibility when misuse occurs.

Policing misuse is sometimes beyond the evaluator's control, but what is always
squarely within an evaluator's domain of direct responsibility and accountability is
misevaluation: failures of conduct by the evaluator, which brings us to evaluation ethics.

Ethics of Being User-Focused

It is truly unethical to leave ethics out of program evaluation.

—Michael Scriven (1993:30)

Telling the truth to people who may not want to hear it is, after all, the chief purpose
of evaluation.
—Eleanor Chelimsky (1995b:54)
Concern that utilization-focused evaluators may be co-opted by stakeholders, or become pawns in service of their political agendas, raises questions beyond how to be politically astute, strategic, and savvy, or how to prevent misuse. Underneath, decisions about one's relationships with intended users involve ethics. Speaking truth to power is risky—risky business. Not only is power involved, but money is involved. The Golden Rule of consulting is, "Know who has the gold." Evaluators work for paying clients. The jobs of internal evaluators depend on the pleasure of superiors, and future work for independent evaluators depends on client satisfaction. Thus, there's always the fear that "they who pay the piper call the tune," meaning not just determining the focus of the evaluation, but also prescribing the results. Evaluators can find themselves in conflict between their professional commitment to honest reporting and their personal interest in monetary gain or having future work. This conflict is so pervasive that Scriven (1991b) believes evaluation suffers from "General Positive Bias—a tendency to turn in more favorable results than are justified" (p. 174).

The quote below, also included at the end of Chapter 5, was aimed at empowerment evaluation, but the same concern is often expressed about any process of evaluation that involves close, responsive relationships between evaluators and clients.

Anyone who has been in the evaluation business for very long knows that many potential clients are willing to pay much money for a "good, empowering evaluation," one that conveys the particular message, positive or negative, that the client/interest group hopes to present, irrespective of the data, or one that promotes constructive, ongoing, and nonthreatening group process. . . . Unfortunately, there are many persons who call themselves evaluators who would be glad to sell such service. (Stufflebeam 1994:325)

The Program Evaluation Standards provide general ethical guidance: Evaluation agreements should be in writing; rights of human subjects should be protected; evaluators should respect human dignity; assessments should be complete and fair; findings should be openly and fully disclosed; conflicts of interest should be dealt with openly and honestly; and sound fiscal procedures should be followed. The Propriety Standards "are intended to ensure that an evaluation will be conducted legally, ethically, and with due regard for the welfare of those involved in the evaluation, as well as those affected by the results" (Joint Committee 1994:P1-P8). Likewise, the "Guiding Principles" of the American Evaluation Association (1995) insist on integrity and honesty throughout the entire evaluation process, from initial negotiations with clients and stakeholders through reporting.

Newman and Brown (1996) have generated a framework for making ethical decisions in evaluation: (1) pay attention to one's intuition that something isn't quite right; (2) look for rules that provide guidance; (3) examine how the situation looks in terms of basic ethical principles: autonomy (rights involved), nonmaleficence (doing no harm), beneficence (doing good), justice (fairness), and fidelity (adhering to agreements); (4) examine your personal values—be in touch with your own beliefs and comfort levels; and (5) act, which can include consulting with colleagues, calculating trade-offs, and making and following a plan. They provide case problems, evaluation examples, and ethical challenges that illuminate how the framework can be applied in real situations.

Ethical Concerns Specific to Utilization-Focused Evaluation

The Program Evaluation Standards, the AEA Guiding Principles, the Newman/Brown Framework for Making Ethical Decisions, and a General Accounting Office (GAO 1996) report on the need for "continued vigilance" to protect human subjects provide general ethical guidance and make it clear that evaluators encounter all kinds of situations that require a strong grounding in ethics and may demand courage. Beyond general ethical sensitivity, however, the ethics of utilization-focused evaluators are most likely to be called into question around two essential aspects of utilization-focused evaluation: (1) limiting stakeholder involvement to primary intended users and (2) working closely with those users. The ethics of limiting and focusing stakeholder involvement concern who has access to the power of evaluation knowledge. The ethics of building close relationships concern the integrity, neutrality, and corruptibility of the evaluator. Both of these concerns center on the fundamental ethical question: Who does an evaluation—and an evaluator—serve?

Consider the following exchange I had with Carol Weiss, who was arguing that findings must stand on their own rather than depend on interpersonal relationships, and Ernest House, who believes that evaluators are ethically obligated to consider the interests of the poorly educated or less powerful in society who are not in a position to advocate on their own behalf.

Carol Weiss: I think we limit ourselves too much if we think of interpersonal interaction as the critical component in utilization.
Michael Patton: From my perspective, I feel a great responsibility to serve my clients.
Ernest House: How far would you pursue this orientation? Surely you can't
consider your only purpose to be meeting your client's interests?
Michael Patton: Tell me why I can't?
Ernest House: Why? It's an immoral position.
Michael Patton: I could argue. . . .
Ernest House: You can't. You can't. It would be a long argument which you'd lose.
Michael Patton: Let's go for it.
Ernest House: Who has the money to purchase evaluation? The most proper
people in society. You would serve only the most proper people in
society? You wouldn't condone that. . . . A doctor can't be con-
cerned only with a particular patient and not concerned with the
distribution of his or her services across society as a whole. . . .
Medicine only for the richest? Surely you can't condone that kind
of position. . . .
Michael Patton: What I am talking about is my responsibility to the specific set of
people I work with, who will be different from case to case. What
I take immediate responsibility for is what they do and the things
that I do with them. I recognize that there's a broader set of things
that are going to be of concern, but I can't take responsibility for
all of what happens with that broader set of concerns.
Ernest House: Well, you must.


Michael Patton: But I can't.
Ernest House: Then you are immoral. Right? You'll back off that position. You
cannot justify that position. You don't hold that position. I mean,
you say it, but you can't hold it. . . . You have to have a concern that
you're taking beyond the immediate welfare of the immediate
client. I believe that you do that.
Michael Patton: I think I build that larger concern you're talking about into my
interactions with that client.... There is a moral concern. There is
a moral and value context that I bring to bear in that interaction
with my clients.
Ernest House: You have to show concern for the rest of society. You can't just sell
your services to whoever can purchase them. That would be an
immoral position.... I say that you should be concerned about the
interests of the less advantaged people in society. (Excerpted and
edited from a transcription by Alkin 1990:101-105)

I've reproduced a portion of our discussion to offer a taste of its intensity. As the dialogue unfolded, three things were illuminated for me with regard to utilization-focused evaluation: (1) Evaluators need to be deliberative and intentional about their own moral groundings; (2) evaluators must exercise care, including ethical care, in selecting projects to work on and stakeholders to work with; and (3) evaluators must be clear about whose interests are more and less represented in an evaluation. Let me elaborate these points.

First, evaluators need to be deliberative and intentional about their own moral groundings. An evaluator, such as Ernest House, will and should bring moral concerns about social justice into negotiations over the design of an evaluation, including concerns about whose interests are represented in the questions asked and who will have access to the findings. The active part of being active-reactive-adaptive is bringing your own concerns, issues, and values to the table. The evaluator is also a stakeholder—not the primary stakeholder—but, in every evaluation, an evaluator's reputation, credibility, and beliefs are on the line. A utilization-focused evaluator is not passive in simply accepting and buying into whatever an intended user initially desires. The active-reactive-adaptive process connotes an obligation on the part of the evaluator to represent the standards and principles of the profession as well as his or her own sense of morality and integrity, while also attending to and respecting the beliefs and concerns of other primary users.

A second important point reinforced by the debate was the importance of project and stakeholder selection. At one point in the debate, Ross Connor, a former president of the American Evaluation Association, asked me, "You pick and choose clients, right?" I affirmed that I did. "My concern," he replied, "would be those who don't have the luxury of picking and choosing who they work with" (quoted in Alkin 1990:104). One way in which I take into account the importance of the personal factor is by careful attention to whom I work with. Whether one has much choice in that, or not, it will affect the way in which ethical issues are addressed, especially what kinds of ethical issues are likely to be of concern. In challenging what he has called "clientism"—"the claim that whatever the client wants . . . is ethically correct"—House (1995) asked: "What if the client is Adolph Eichmann, and he wants the evaluator to increase the efficiency of his concentration camps?" (p. 29).

A third issue concerns how the interests of various stakeholder groups are represented in a utilization-focused process. Despite House's admonitions, I'm reluctant, as a white, middle-class male, to pretend to represent the interests of people of color or society's disadvantaged. My preferred solution is to work to get participants in affected groups representing themselves as part of the evaluation negotiating process. As discussed in Chapter 3, user-focused evaluation involves real people, not just attention to vague, abstract audiences. Thus, where the interests of disadvantaged people are at stake, ways of hearing from or involving them directly should be explored, rather than having them represented in a potentially patronizing manner by the advantaged. Whether and how to do this may be part of what the evaluator attends to during active-reactive-adaptive interactions.

Guarding Against Corruption of an Evaluation

Ethics is not something for a special occasion; it is a part of daily practice.


—Newman and Brown (1996:187)

While House has raised concerns about how working with a selective group of intended users can serve the powerful and hurt the interests of the poor and less powerful, a different concern about utilization-focused evaluation is raised by Michael Scriven when he worries about undermining what he considers evaluation's central purpose—rendering independent judgments about merit or worth. If evaluators take on roles beyond judging merit or worth, such as creating learning organizations or empowering participants, or, alternatively, eschew rendering judgment in order to facilitate judgments by intended users, the opportunities for ethical slippage become so pervasive as to be overwhelming.

For Scriven, evaluators don't serve specific people. They serve truth. Truth may be a victim when evaluators form close working relationships with program staff. Scriven (1991b:182) admonishes evaluators to guard their independence scrupulously. Involving intended users would only risk weakening the hard-hitting judgments the evaluator must render. Evaluators, he has observed, must be able to deal with the loneliness that may accompany independence and guard against "going native," the tendency to be co-opted by and become an advocate for the program being evaluated. Going native leads to "incestuous relations" in which the "evaluator is 'in bed' with the program being evaluated" (p. 192). Scriven (1991a) has condemned any failure to render independent judgment as "the abrogation of the professional responsibility of the evaluator" (p. 32). He has derided what he mockingly called "a kinder, gentler approach" to evaluation (p. 39). His concerns stem from what he has experienced as the resistance of evaluation clients to negative findings and the difficulty evaluators have—psychologically—providing negative feedback. Thus, he has admonished evaluators to be uncompromising in reporting negative results: "The main reason that evaluators avoid negative conclusions is that they haven't the courage for it" (p. 42).

My experience has been different from Scriven's, so I reach different conclusions. Operating selectively, as I acknowledged earlier, I choose to work with clients who are hungry for quality information to improve programs. They are people of great competence and integrity who are able to use and balance both positive and negative information to make informed decisions. I take it as part of my responsibility to work with them in ways that they can hear the results, both positive and negative, and use them for intended purposes. I don't find them resistant. I find them quite eager to get quality information that they can use to develop the programs to which they have dedicated their energies. I try to render judgments, when we have negotiated my taking that role, in ways that can be heard, and I work with intended users to facilitate their arriving at their own conclusions. They are often harsher on themselves than I would be.

In my experience, it doesn't so much require courage to provide negative feedback as it requires skill. Nor do evaluation clients have to be unusually enlightened for negative feedback to be heard and used if, through skilled facilitation, the evaluator has built a foundation for such feedback so that it is welcomed for long-term effectiveness. Dedicated program staff don't want to waste their time doing things that don't work.

I have followed in the tracks of, and cleaned up the messes left by, evaluators who took pride in their courageous, hard-hitting, negative feedback. They patted themselves on the back for their virtues and went away complaining about program resistance and hostility. I watched them in action. They were arrogant, insensitive, and utterly unskilled in facilitating feedback as a learning experience. They congratulated themselves on their independence of judgment and commitment to "telling it like it is" and ignored their largely alienating and useless practices. They were closed to feedback about the ineffectiveness of their feedback.

It's from these kinds of experiences that I have developed a preference for constructive and utilization-focused feedback. In any form of feedback, it's hard to hear the substance when the tone is highly judgmental and demeaning. This applies to interactions between parents and children (in either direction), between lovers and spouses, among colleagues, and, most decidedly, between evaluators and intended users. Being kinder and gentler in an effort to be heard need not indicate cowardice or a loss of virtue. In this world of ever-greater diversity, sensitivity and respect are not only virtues, they're more effective and, in evaluation, more likely to lead to results being used.

Moral Discourse

Another attack on utilization-focused evaluation charges that it is void of and avoids moral discourse. Thomas Schwandt (1989b), in a provocative article, set out to "recapture the moral discourse in evaluation." He challenged the "bifurcation of value and fact"; he questioned my assertion that the integrity of an evaluation rests on its empirical orientation; and he expressed doubt about "the instrumental use of reason." He called for a rethinking of what it means to evaluate and invited us "to imagine what it would be like to practice evaluation without an instrumentalist conception of evaluation theory and practice" (p. 14).

In contrast to an instrumental and utilitarian approach, he advocated raising fundamental questions about the morality of programs, that is, not just asking if programs are doing things right, but are they doing right (moral) things? He argued for inquiry into the morals and values that undergird programs, not just whether programs work, but what their workings reveal about quality of life and the nature of society.

Such questions and such a focus for inquiry are compatible with a utilization focus so long as intended users choose such an inquiry. The option to do so ought to be part of the menu of choices offered to primary stakeholders—and has been in this book. The process uses of evaluation discussed in Chapter 5 include participatory, collaborative, empowering, and social justice approaches, which emphasize the learning and development that derive from evaluative inquiry as an end in itself quite apart from the use of findings. Examining fundamental values, instrumental assumptions, and societal context are typically part of such processes.

Where I part company from Schwandt is in the superior position he ascribes to the moral questions he raises. Issues of morality do not shine brighter in the philosophical heavens than more mundane issues of effectiveness. He attacks methodological orthodoxy and narrowness, which deserve attack, but makes himself vulnerable to the counterattack that he is simply substituting his superior moral inquiry for others' superior methodological inquiry. He derides the technical expertise evaluators strive for and the arrogance attached to such expertise, yet he can be accused of merely offering another form of expertise and arrogance, this time in the trappings of moral discourse and those who know what questions are really important.

The fundamental value-premise of utilization-focused evaluation is that intended users are in the best position to decide for themselves what questions and inquiries are most important. From this perspective, moral inquiries and social justice concerns ought to be on the menu, not as greater goods, but as viable choices. Of course, as noted earlier, what intended users choose to investigate will be determined by how they are chosen, who they are, what they represent, and how the evaluator chooses to work with them—all decisions that involve both politics and ethics.

Exhibit 14.4 reproduces a letter from evaluator Yoland Wadsworth, recipient of the Churchill Fellowship, the Australian Mental Health Services Award, and the Australasian Evaluation Society's Pioneering Evaluation Literature Award. She writes concerning her own struggle to be true to her personal values as she engages in evaluation involving people in great need, many of whom are politically oppressed. Hers is an action-based inquiry into morality from the evaluation trenches, the source, I believe, of her moral authority.

Standards and principles provide guidance for dealing with politics and ethics, but there are no absolute rules. These arenas of existential action are replete with dilemmas, conundrums, paradoxes, perplexities, quandaries, temptations, and competing goods. Utilization-focused evaluation may well exacerbate such challenges, so warnings about potential political corruption, ethical entrapments, and moral turpitude direct us to keep asking fundamental questions: What does it mean to be useful? Useful to whom? Who benefits? Who is hurt? Who decides? What values inform the selection of intended users and intended uses? Why? Or, why not?

EXHIBIT 14.4
Moral Discourse From Australia:
Politics and Ethics in the Trenches

Evaluator Yoland Wadsworth spent a sabbatical year reflecting on working collaboratively with
program staff and participants in evaluation and action research efforts aimed at mutual understanding
and change. Her team's evaluation work at the front line in the psychiatric services system was
recognized with a Gold Australian and New Zealand Mental Health Services Partnership Award in
1995, as well as the Australasian Evaluation Society's Pioneering Evaluation Literature award.
Wadsworth worked with a mental health program, one where staff would sometimes forcibly inject
difficult patients with drugs to render them unconscious or strip them and lock them in a bare cell. Her
concerns about how to engage in evaluation in such a setting led her to reflect on community-building
efforts in politically oppressive regimes where practitioners of action research "have been known to
die for their efforts" (Wildman 1995). In a letter to me, she pondered the politics and ethics of being
an evaluator in such circumstances, excerpts of which, with her permission, I've reproduced here.

In mutual inquiry efforts, under difficult political and interpersonal conditions, the most
fragile thing of all is connectedness (or what some of us might call "love" if there wasn't
such a powerful prohibition on mentioning such an emotive state!). What can thrive in its
absence is fear—on both sides.

How do we de-escalate—come together to speak our own truths within our own group,
and then speak our truths together to each other's groups, no matter how uncomfortable?
How do we learn to listen, communicate, heal things, collaborate, and then move on to
new, more reflective and productive action? I want to use "inquiry"—small-scale research
and evaluation methods—to facilitate these processes.

Yet, when it comes to the more powerful partner ceding some real sovereignty to the less
powerful "partner," suddenly fears may re-surface, defenses are re-built, "rational"
objections are voiced, funds dry up, hard-won structures are dismantled until the "time
is right," and old damaging-but-familiar practices resume....
How difficult not to demonize. How difficult to stay firm. How difficult not to feel fearful
and defeated, or alternatively "go over" to staff's way of thinking for a more peaceful life.
How difficult to hold a line and keep respecting and trusting when suddenly such respect
seems betrayed and trust seems naive. Yet, to draw attention to this attracts a howl of
protest from sensitized staff.

And for staff, how difficult to stay open, retain professional or peer standing, not
unwittingly exert power over consumers [mental health patients] yet again, and keep fears
at bay—or well-debriefed. How difficult for staff not to allow feelings of being
disempowered to blot out the knowledge of the profound power they hold over consumers.
How difficult to remain in solidarity with staff at the same time as remaining working with
and for and open to consumers. When the stakes are dangerously high—how difficult for
everyone to avoid withdrawing and, eventually, taking up arms.... It needs courage.

Here's an articulation that guides me. An inpatient, who worked as a consultant to our
project, wrote:

Poem for U&I


In the cold of the dark
I see you stand before me
In a vision of Rage,
that neither you nor I
can control.
Weep for you and me.
Strangers, never more.
Time and place for you and I.

a. U&I was the acronym of Understanding and Involvement—the popular title selected by consumers for our Consumer Evaluation of Acute Psychiatric Hospital Practice project, 1993-1996. The U is staff or consumer, the I consumer or staff—depending on the direction of the gaze of the viewer.
Utilization-Focused Evaluation:
Process and Premises

Asking questions about impact—that's evaluation.
Gathering data—that's evaluation.
Making judgments—that's evaluation.
Facilitating use—that's evaluation.
Putting all those pieces together in a meaningful whole that tells people something they
want to know and can use about a matter of importance. Now that's really evaluation!
—Halcolm

A User's Perspective

This final chapter will provide an overview of utilization-focused evaluation. I want to set the stage by presenting the perspective of a very thoughtful evaluation user, distinguished educator Dr. Wayne Jennings. At the time of the following interview, he was actively reflecting on the role of evaluation in schools as principal of the innovative Saint Paul Open School, which had just been externally evaluated. Concerns such as those he expresses helped inspire utilization-focused evaluation.

Patton: You recently participated in a federal evaluation of your school. Why was
it undertaken?
Jennings: It was mandated as part of federal funds we received. We hoped there would
be some benefit to the school, that school staff would learn things to
improve our program. We were interested in an evaluation that would
address basic issues of education. Our school operates on the assumption
that students learn from experience, and the more varied experiences they
have, the more they learn. So, we hoped to learn what parts of the program
made a contribution to and what parts maybe detracted from learning.

We hoped also to learn whether the program affected different groups of students differently, for instance, whether the program was more
effective for younger children or older children. We wanted to know how
well it worked for kids who have a lot of initiative and drive and are
self-motivated learners versus the experiences and learnings of those who
don't have that kind of attitude. And we were interested in determining
what we should concentrate on more—as well as continue or discontinue.
But we didn't get information to make those kind of basic decisions.
We asked the research firm for an evaluation that would help us with
those kinds of questions. They came up with a design that seemed far off
target—not at all what we had asked for. It takes a little imagination to do
an evaluation that fits an innovative program. We got a standard recipe
approach. I'm convinced that if we'd asked totally different questions, we'd
have gotten the same design. It's as though they had a cookie cutter, and
whether they were evaluating us or evaluating vocational education or
anything—a hospital or a prison—it would have been the same design.
Patton: So why did you accept the design? Why participate in an evaluation you
thought would be useless educationally?
Jennings: That's a good question. It happened like this. The first year, we worked
with an out-of-state firm that had won the federal contract. The president
came for the initial discussion in September, when school was getting
started. He said he'd like to look around. Ten minutes later, I found him
sitting in the front hall in a state of what appeared to be absolute shock.
He was not prepared for our program, not in the least. He came and found
kids running around. Students were noisy. They weren't seated in straight
rows. We didn't resemble his conception of a school in any way, apparently.
He just wasn't prepared for open education—for an alternative approach.
That was the last we saw of him. He sent out a new researcher who didn't
walk around the school. We simply met in the office and hashed out what
data were required by funders. These people were not prepared to analyze
a nonstandard school operation or adapt their design to our situation.
I think schools have operated pretty much in the same way for so long
that all of us have a mind-set of what school is. So, to see something
different can shock—culture shock. We've seen that in the eyes of several
evaluators we've had in here. Now, I don't know how to prepare a standard
educational evaluator to be open to an open school. Maybe we needed
more of a participant observer, an anthropological type, someone who
would come in and live here for a while and find out what the hell's go-
ing on.
Patton: You couldn't find that kind of evaluator?
Jennings: We did have, somewhere along the line, a proposal from a firm that had
experience with our kind of schools. They were going to give us the kind
of evaluation we were very much interested in, the kind we could learn
from. We wanted to pursue that design, but State Department of Education
officials said, "No, that won't provide us with concrete, comparable data
for accountability."
Patton: The design approved by the State Department of Education, did that
provide some "concrete, comparable data for accountability"? What were
the findings?
Jennings: We knew politically that we had to achieve a certain degree of respectability
with regard to standardized testing, so we looked at test results, reading
scores, and all that sort of thing, and they seemed satisfactory. We didn't
discover anything particularly startling to cause any serious problems with
our Board of Education or the State Department of Education.
Patton: In what form did you receive the findings?
Jennings: In a report. I think it would have been helpful for the evaluators to meet
with our staff and talk us through the report, elaborate a little on it, but
that didn't happen—partly because the report came either during the
summer or the following fall.
Patton: So, of what use was the evaluation?
Jennings: It served to legitimize us. Our local Board of Education, the district
administration, and the State Department of Education were all interested
in seeing an evaluation undertaken, but I don't think a single member of
the board or administration read it. They may have glanced at the summary,
and the summary said that the school was OK. That was it.
Patton: Any uses besides legitimization? Anything you learned?
Jennings: I suppose you could say we learned how difficult it is to get helpful
evaluations.
Patton: What was staff reaction to the report?
Jennings: I doubt many bothered to read it. They were looking for something
worthwhile and helpful, but it just wasn't there. Staff were interested in
thinking through a good evaluation, but that would have required the
evaluators to become thoroughly backgrounded in order to help us think
it through and provide a proper evaluation. As it turned out, I think the
staff have acquired a negative attitude about evaluation. I mean, they are
interested in evaluation, but then the reports seem to lack anything that
would help us with day-to-day decision making or give us a good mirror
image of what's going on in the school, from an outsider's point of view,
in terms of the growth and development of kids in all dimensions.
Patton: Let me broaden the question about use. Sometimes evaluations have an
impact on things that go beyond immediate program improvements—
things like general thinking on issues that arise from the study, or board
policies, or legislation. Did the evaluation have an impact in any broader
ways?
Jennings: I'm glad you're getting into that area, because that's of major interest to
me. I had hoped that as the school got under way, it would become fairly
clear, or would be evident to those interested in finding out, that this was
a highly effective educational program that was making a considerable
difference in the lives of most children. Well, the study didn't address that;
the study simply said, "The program's OK." That's all it said. Given the
limited resources and imagination that were put into the evaluation, I'm
not convinced they had enough knowledge to say even that—to say
anything about the effectiveness of the program!
Our major job is to educate 500 students, but we're engaged in a much
larger struggle, at least I am, and that is to show that a less formal approach
based on experiential education can be highly effective, including bringing
students to the same level of achievement in a shorter time. We're con-
cerned about achievement, but also about producing genuinely humane
people for a complex, changing world.
Now, to be fair, the evaluation did provide something useful. When
parents or other educators ask if the program has been evaluated and what
the findings were, we say, "Yes, the evaluation shows the program is
effective." On the other hand, anyone worth their salt, I suspect, if they
read the evaluation carefully, would decide that it doesn't show much of
anything, really, when you come right down to it. We're left where we
began, but we have the illusion of at least having been evaluated.
Patton: I pulled some of the recommendations out of the study, and I'd just like to
have you react to how those were received and used in any way by the
school. The first one that's listed in the report is that objectives as they are
related to the goals of the Open School should be written in performance-
specific language. What was the reaction to that recommendation?
Jennings: I know that's the current popular view in education today, but I'm not sure
it could be done. It would require an enormous amount of energy. Many
of our objectives are not very specific subject-matter kinds of objectives.
The general goals of the school are more philosophical in tone, and I guess
we're just not willing to invest the time and energy to reduce those to the
kinds of performance objectives they're speaking of, and I don't know if
the end results would be particularly helpful.
Patton: Did it seem to you that that recommendation followed from the findings
of the study?
Jennings: When I read that one, I thought—Where did that come from? You know,
how did they arrive at that? Is that just some conventional wisdom in
education today that can be plugged into any set of recommendations?
Patton: What, then, was your overall reaction to the recommendations?
Jennings: Each was simpleminded. They showed that the evaluators lacked depth of
understanding of what the program was trying to accomplish. Each recom-
mendation could have been a report in itself rather than some surface
scratching and coming up with some conclusions that just weren't helpful.
Patton: Bottom line—what'd the evaluation accomplish?
Jennings: Legitimation. Just by existing, it served that function. With an impressive
cover, even if it was filled with Chinese or some other language, as long as
it was thick and had a lot of figures in it and people thought it was an
evaluation and somewhere near the end it said that the school seemed to
be doing a reasonably good job, that would be satisfactory for most people.
Patton: What about the evaluation supported the legitimation function?
Jennings: Its thickness. The fact that it's got numbers and statistics in it. It's authored
by some Ph.D.s. It was done by an outside firm. Those things all lend
credibility.
Patton: What would it have taken to make it more useful?
Jennings: I would say that to do the job right, we'd have to have people on our own
staff who were free from most other responsibilities so that they could deal
with designing the evaluation and work with the evaluators. Then I think
that as the evaluation proceeded, there should probably have been regular
interactions to adjust the evaluation—keep it relevant.
Patton: There's a side effect of evaluation studies that affects the way people like
yourself, who are administrators and work in government and agencies and
schools, feel about evaluation. How would you describe your general
opinion of evaluation? Positive? Negative? Favorable? Unfavorable?
Jennings: We want careful evaluation. We want to know what we're doing well and
not so well. We want data that will help us improve the program. We want
that. We want the best that's available, and we want it to be accurate and
we want the conclusions to be justified, and so on. We just desperately want
and need that information, to know if we're on the right track. But we
haven't gotten that so, by and large, my opinion of evaluation is not very
good. Most reports look like they were written the last week before they're
published, with hastily drawn conclusions and sometimes data that's
manipulated for a preconceived conclusion fitting the evaluators' or fun-
ders' biases.
Patton: Have you given up on evaluation?
Jennings: I guess the reason hope springs eternal is that I have read carefully done
evaluations that have informed my thinking about education and brought
me to my present beliefs. I'd guess that 99% of evaluation is done on a
model of education that I consider obsolete, like a factory trying to perfect
its way of making wagon wheels. We need more relevant and useful
approaches, something beyond wagon wheels evaluations.

The evaluators of the Open School were also interviewed, but when they read Jennings's comments, they asked that their interview not be used and that they not be named because their business might be hurt. The thrust of their comments was that they had viewed the state and federal education agencies as their primary audiences. They did what they were contracted to do. Once they submitted their reports, they had no further contacts with the funders of the evaluation and did not know how or if the reports had been useful. Federal and state officials we contacted said that they received hundreds of such evaluations and could not comment on specific case examples among the many they monitor.

A Utilization-Focused Alternative

Shortly after my interview with Jennings, he formed an evaluation task force made up of teachers, parents, students, community people, and graduate students trained in utilization-focused evaluation. With very limited resources, they designed an intensive study of Saint Paul Open School processes and outcomes using a variety of methods, both quantitative and qualitative. That evaluation provided useful information for incremental program development. Exhibit 15.1 contrasts their internal Open School Task Force evaluation with the earlier, mandated external evaluation. These contrasts highlight the critical elements of utilization-focused evaluation.

The three years of external, federally mandated evaluation at the Saint Paul Open School cost about $40,000—not a great deal of money as research goes. Yet, in the aggregate, evaluations on hundreds of programs like this cost millions of dollars. They constitute the major proportion of all evaluations conducted in the country. The benefit of those dollars is problematic.

The internal, utilization-focused evaluation cost less than $1,000 in hard cash because the labor was all volunteer and release time. Due to the success of the internal task-force effort, the school continued the utilization-focused approach shown in Exhibit 15.1. Staff supported the evaluation with their own school resources because they found the process and results useful.

The Flow of a Utilization-Focused Evaluation Process

Exhibit 15.2 presents a flowchart of utilization-focused evaluation. First, intended users of the evaluation are identified (Chapter 3). These intended users are brought together or organized in some fashion, if possible (e.g., an evaluation task force of primary stakeholders), to work with the evaluator and share in making major decisions about the evaluation.

Second, the evaluator and intended users commit to the intended uses of the evaluation (Chapters 4 and 5) and determine the focus of the evaluation (Chapters 2 and 8). This can include considering the relative importance of focusing on attainment of goals (Chapter 7), program implementation (Chapter 9), and/or the program's theory of action (Chapter 10). The menu of evaluation possibilities is vast, so many different types of evaluations may need to be discussed. (See Menu 8.1 at the end of Chapter 8 for a suggestive list of different evaluation questions and types.) The evaluator works with intended users to determine priority uses with attention to political and ethical considerations (Chapter 14). In a style that is active-reactive-adaptive and situationally responsive, the evaluator helps intended users answer these questions: Given expected uses, is the evaluation worth doing? To what extent and in what ways are intended users committed to intended use?

EXHIBIT 15.1
Contrasting Evaluation Approaches

Open School Utilization-Focused Evaluation versus Original External Mandated Evaluation

1. Utilization-focused: A task force of primary intended users formed to focus evaluation questions.
   External: The evaluation was aimed vaguely at multiple audiences: federal funders, the school board, State Department of Education staff, the general public, and Open School staff.

2. Utilization-focused: This group worked together to determine what information would be useful for program improvement and public accountability. The first priority was formative evaluation.
   External: The evaluators unilaterally determined the evaluation focus based on what they thought their audiences would want. Evaluators had minimal interactions with these audiences.

3. Utilization-focused: The evaluation included both implementation (process) data and outcomes data (achievement data and follow-up of Open School graduates).
   External: The evaluation was a pure outcomes study. Evaluators collected data on presumed operational goals (i.e., scores on standardized achievement tests) based on a model that fit the evaluators' but not the program's assumptions.

4. Utilization-focused: The task force based their evaluation on an explicit statement of educational philosophy (a theory of action).
   External: Evaluators ignored the program's philosophy and conceptualized the evaluation in terms of their own implicit educational theory of action.

5. Utilization-focused: A variety of methods were used to investigate a variety of questions. Methods were selected jointly by evaluators and intended users using multiple criteria: methodological appropriateness, face validity of instrumentation, believability, credibility, and relevance of the design and measuring instruments to information users and decision makers; and available resources. The task force was involved on a continual basis in making methods and measurement decisions as circumstances changed.
   External: The major measurement technique was use of standardized tests that had low face validity, low credibility, and low relevance to program staff; other audiences, especially federal funders and state agency staff, appeared to want such instruments, but it was unclear who the evaluation was supposed to serve. Methods were determined largely by evaluators, based on available resources, with only initial review by program staff and federal and state officials.

6. Utilization-focused: Task force members worked together to analyze and interpret data as they were gathered. Data were discussed in rough form over a period of time before the evaluators wrote the final report. Findings and conclusions were known and being used before the final report was ready for dissemination.
   External: Evaluators analyzed and interpreted data by themselves. A final report was the only form in which findings were presented. No interpretation sessions with program staff or any audience were ever held.

7. Utilization-focused: When the report was made public, the school principal and evaluators made presentations to parents, staff, and school officials.
   External: The final report was mailed to funding agencies. No verbal presentations were made. No discussions of findings took place.

8. Utilization-focused: The evaluation was used by Open School staff for program development and shared with interested accountability audiences to show how the program was being improved.
   External: No specific use was made of the evaluation, though it may have helped legitimize the program by giving the "illusion" of outcomes evaluation.

EXHIBIT 15.2
Utilization-Focused Evaluation Flowchart

[Full-page flowchart. Its boxes trace the process: start with stakeholder analysis; identify interests and commitments of potential users; determine primary intended users and assess the consequences for use of not involving some stakeholders; negotiate a process to involve primary intended users in making evaluation decisions; determine the focus by prioritizing evaluation questions and issues and the primary purposes and intended uses of the evaluation (judgment? improvement? knowledge? process uses?); simulate use with fabricated potential findings and, as needed, identify more useful evaluation questions or issues; make design, methods, and measurement decisions, asking whether desired methods are appropriate to the questions being asked, whether results obtained from these methods will be believable and valid, whether proposed methods are practical, cost-effective, and ethical, and whether results obtained from these methods will be used; collect data; organize data to be understandable to users; actively involve users in interpreting findings; facilitate intended use by intended users; disseminate findings to potential users; end by evaluating the evaluation: does the evaluation meet standards and principles?]

The third part of the process as depicted in the flowchart involves methods, measurement, and design decisions (Chapters 11 and 12). A variety of options are considered: qualitative and quantitative data; naturalistic, experimental, and quasi-experimental designs; purposeful and probabilistic sampling approaches; greater and lesser emphasis on generalizations; and alternative ways of dealing with potential threats to validity, reliability, and utility. More specifically, the discussion at this stage will include attention to issues of methodological appropriateness, believability of the data, understandability, accuracy, balance, practicality, propriety, and cost. As always, the overriding concern will be utility. Will results obtained from these methods be useful—and actually used?

Once data have been collected and organized for analysis, the fourth stage of the utilization-focused process begins. Intended users are actively and directly involved in interpreting findings, making judgments based on the data, and generating recommendations (Chapter 13). Specific strategies for use can then be formalized in light of actual findings, and the evaluator can facilitate following through on actual use.

Finally, decisions about dissemination of the evaluation report can be made beyond whatever initial commitments were made earlier in planning for intended use. This reinforces the distinction between intended use by intended users (planned utilization) and more general dissemination for broad public accountability (where both hoped for and unintended uses may occur).

While the flowchart in Exhibit 15.2 depicts a seemingly straightforward, one-step-at-a-time logic to the unfolding of a utilization-focused evaluation, in reality the process is seldom simple or linear. The flowchart attempts to capture the sometimes circular and iterative nature of the process by depicting loops at the points where intended users are identified and again where evaluation questions are focused. For the sake of diagrammatic simplicity, however, many potential loops are missing. The active-reactive-adaptive evaluator who is situationally responsive and politically sensitive may find that new stakeholders become important or new questions emerge in the midst of methods decisions. Nor is there a clear and clean distinction between the processes of focusing evaluation questions and making methods decisions.

The real world of utilization-focused evaluation manifests considerably more complexity than a flowchart can possibly capture. The flowchart strives to outline the basic logic of the process, but applying that logic in any given situation requires flexibility and creativity.

The Achilles' Heel of Utilization-Focused Evaluation

Achilles' fame stemmed from his role as hero in Homer's classic, the Iliad. He was the Greeks' most illustrious warrior during the Trojan War, invulnerable because his mother had dipped him in the Styx, the river of the underworld across which Charon ferried the dead. His heel, where she held him in the river, was his sole point of vulnerability, and it was there that he was fatally wounded with an arrow shot by Paris.

The Achilles' heel of utilization-focused evaluation, its point of greatest vulnerability, is turnover of primary intended users. The process so depends on the active engagement of intended users that to lose users along the way to job transitions, reorganizations, reassignments, and elections can undermine eventual use. Replacement users who join the evaluation late in the process seldom come with the same agenda as those who were present at the beginning. The best antidote involves working with a task force of multiple intended users so that the departure of one or two is less critical. Still, when substantial turnover of primary intended users occurs, it may be necessary to reignite the process by renegotiating the design and use commitments with the new arrivals on the scene.

Previous chapters have discussed the challenges of selecting the right stakeholders, getting them to commit time and attention to the evaluation, dealing with political dynamics, building credibility, and conducting the evaluation in an ethical manner. All of these challenges revolve around the relationship between the evaluator and intended users. When new intended users replace those who depart, new relationships must be built. That may mean delays in original time lines, but such delays pay off in eventual use by attending to the foundation of understandings and relationships upon which utilization-focused evaluation is built.

Fundamental Premises of Utilization-Focused Evaluation

Articulating fundamental premises requires making assumptions and values explicit. What seems obvious to one person may not be at all obvious to another. Consider, for example, the Sufi story about Mulla Nasrudin standing in the center of the marketplace while a compatriot stopped passersby, whispering to them how they could entertain themselves by showing Nasrudin to be an idiot.

When offered a choice between two coins of different value, Nasrudin always chose the one worth less.
One day, a kind man tried to enlighten the foolish Nasrudin. "You should take the coin of greater value," he urged, "then you'd have more money and people would no longer be able to make a fool of you."
"But," replied Nasrudin, "if I take the more valuable coin, people will stop offering me money to prove that I'm more idiotic than they are. Then I would have no money at all."

—Adapted from Shah

The premises of utilization-focused evaluation will seem obvious to some, of dubious merit to others, and controversial to many more. The rationales for and evidence supporting these various premises have been articulated throughout this book. Here, however, for the first time, as a summary of what has gone before, I have pulled together 14 fundamental premises of utilization-focused evaluation.

1. Commitment to intended use by intended users should be the driving force in an evaluation. At every decision point—whether the decision concerns purpose, focus, design, methods, measurement, analysis, or reporting—the evaluator asks intended users: How would that affect your use of this evaluation?

2. Strategizing about use is ongoing and continuous from the very beginning of the evaluation. Use isn't something one becomes interested in at the end of an evaluation. By the end of the evaluation, the potential for use has been largely determined. From the moment stakeholders and evaluators begin interacting and conceptualizing the evaluation, decisions are being made that will affect use in major ways.

3. The personal factor contributes significantly to use. The personal factor refers to the research finding that the personal interests and commitments of those involved in an evaluation undergird use. Thus, evaluations should be specifically user oriented—aimed at the interests and information needs of specific, identifiable people, not vague, passive audiences.

4. Careful and thoughtful stakeholder analysis should inform identification of primary intended users, taking into account the varied and multiple interests that surround any program, and therefore, any evaluation. Staff, program participants, directors, public officials, funders, and community leaders all have an interest in evaluation, but the degree and nature of their interests will vary. Political sensitivity and ethical judgments are involved in identifying primary intended users and uses.

5. Evaluations must be focused in some way; focusing on intended use by intended users is the most useful way. Resource and time constraints will make it impossible for any single evaluation to answer everyone's questions or to give full attention to all possible issues. Because no evaluation can serve all potential stakeholders' interests equally well, stakeholders representing various constituencies should come together to negotiate what issues and questions deserve priority.

6. Focusing on intended use requires making deliberate and thoughtful choices. Menu 4.1 in Chapter 4 identified three primary uses of findings: judging merit or worth (e.g., summative evaluation); improving programs (instrumental use); and generating knowledge (conceptual and formative use). Menu 5.1 in Chapter 5 presented four kinds of process use: enhancing shared understandings, reinforcing interventions, supporting participant engagement, and developing programs and organizations. Uses can change and evolve over time as a program matures.

7. Useful evaluations must be designed and adapted situationally. Standardized recipe approaches won't work. The relative value of a particular utilization focus (Premises 5 and 6) can only be judged in the context of a specific program and the interests of intended users. Situational factors affect use. As Exhibit 6.1 in Chapter 6 showed, these factors include community variables, organizational characteristics, the nature of the evaluation, evaluator credibility, political considerations, and resource constraints. In conducting a utilization-focused evaluation, the active-reactive-adaptive evaluator works with intended users to assess how various factors and conditions may affect the potential for use.

8. Intended users' commitment to use can be nurtured and enhanced by actively involving them in making significant decisions about the evaluation. Involvement increases relevance, understanding, and ownership of the evaluation, all of which facilitate informed and appropriate use.

9. High-quality participation is the goal, not high-quantity participation. The quantity of group interaction time can be inversely related to the quality of the process. Evaluators conducting utilization-focused evaluations must be skilled group facilitators.

10. High-quality involvement of intended users will result in high-quality, useful evaluations. Many researchers worry that methodological rigor may be sacrificed if nonscientists collaborate in making methods decisions. But decision makers want data that are useful and accurate. Validity and utility are interdependent. Threats to utility are as important to counter as threats to validity. Skilled evaluation facilitators can help nonscientists understand methodological issues so that they can judge for themselves the trade-offs involved in choosing among the strengths and weaknesses of design options and methods alternatives.

11. Evaluators have a rightful stake in an evaluation in that their credibility and integrity are always at risk, thus the mandate for evaluators to be active-reactive-adaptive. Evaluators are active in presenting to intended users their own best judgments about appropriate evaluation focus and methods; they are reactive in listening attentively and respectfully to others' concerns; and they are adaptive in finding ways to design evaluations that incorporate diverse interests, including their own, while meeting high standards of professional practice. Evaluators' credibility and integrity are factors affecting use as well as the foundation of the profession. In this regard, evaluators should be guided by the profession's standards and principles (see Exhibit 1.3 in Chapter 1 and Exhibit 2.1 in Chapter 2).

12. Evaluators committed to enhancing use have a responsibility to train users in evaluation processes and the uses of information. Training stakeholders in evaluation methods and processes attends to both short-term and long-term evaluation uses. Making decision makers more sophisticated about evaluation can contribute to greater use of evaluation over time.

13. Use is different from reporting and dissemination. Reporting and dissemination may be means to facilitate use, but they should not be confused with such intended uses as making decisions, improving programs, changing thinking, empowering participants, and generating knowledge (see Premise 6).

14. Serious attention to use involves financial and time costs that are far from trivial. The benefits of these costs are manifested in greater use. These costs should be made explicit in evaluation proposals and budgets so that utilization follow-through is not neglected for lack of resources.

A Vision of an Experimenting Society and Experimenting Evaluators

An experimenting society would vigorously try out possible solutions to recurrent problems and would make hard-headed, multidimensional evaluations of outcomes, and when the evaluation of one reform showed it to have been ineffective or harmful, would move on to try other alternatives.
—Donald T. Campbell (1988:291)

To be truly scientific we must be able to experiment. We must be able to advocate without that excess of commitment that blinds us to reality testing.
—Donald T. Campbell (1969:410)

Donald T. Campbell, one of the fathers of scientific evaluation, died in 1996 after nearly four-score years, many of them working to realize his vision of an Experimenting Society. His vision lives on. Its realization will depend on a shared commitment to engage in active reality testing by all those involved in and touched by programs and policies, not just researchers and evaluators. But evaluators must point the way. Utilization-focused evaluation invites stakeholders to join with evaluators as informed citizens of an Experimenting Global Society.

Utilization-focused evaluation combines style and substance, activism and science, personal perspective and systematic information. I have tried to capture the complexity and potential of utilization-focused evaluation with scenarios, case examples, findings from our study of federal evaluation use, Sufi parables, and children's stories. In the end, this approach to evaluation must also be judged by its usefulness.

I have presented research and theories that support the premises of utilization-focused evaluation. Still, skeptics will be skeptical—some who don't want to take the time to work with stakeholders, others who don't want to give up control of the process, and still others who are convinced that it probably works for certain kinds of evaluators (the personable, the human-relations types, whatever . . . ), but that it won't work for them.

Certainly, I can offer no guarantees that a utilization-focused approach will always work. Just as decision makers live in a world of uncertainty, so too evaluators are faced with the ever-present possibility that, despite their best efforts, their work will be ignored or, worse, misused. Producing good evaluation studies that actually are used constitutes an enormous challenge. In many ways, the odds are all against use, and it's quite possible to become cynical about the futility of trying to have an impact in a world in which situation after situation seems impervious to change. Utilization-focused evaluators may be told, or may sometimes feel, that they are wasting their time. A final Sufi story provides, perhaps, something for skeptics to ponder.

Yogurt is made by adding a small quantity of old yogurt to a larger measure of milk.
The action of the bacillus bulgaricus in the seeding portion of yogurt will in time convert
the whole into a mass of new yogurt.
One day some friends saw Nasrudin down on his knees beside a warm forest spring
adding a mass of yogurt to the water. One of the passers-by asked, "What are you trying
to do, Nasrudin?"
"I'm trying to make yogurt."
"But you can't make yogurt like that," he scoffed.
"Perhaps not, but just supposing it works!"
The next day Nasrudin invited the entire village to taste his concoction. It wasn't
like the yogurt they were used to, but all agreed it was unique and delicious.
The following day Nasrudin returned to the warm spring to make another batch. The
result this time tasted acrid and made many who tried it sick.
For weeks thereafter, Nasrudin returned to the spring, each day trying to make again
his original tasty creation. But, having failed to carefully observe and write down what
he had done and what conditions prevailed at the time, he never succeeded in
reproducing the original. He did, however, produce other tasty delicacies, but the
villagers were reluctant to try them since they never knew for sure whether they would
be delighted or made sick. Eventually, Nasrudin gave up, since he could never predict
with certainty what would result in the changing conditions of the forest spring.

—Adapted from Shah

Getting intended users to taste evaluation may, indeed, be a long shot in many situations. Many have been made "sick" by past concoctions called evaluation. The results of any particular effort cannot be guaranteed. Each evaluation being a blend of unique ingredients, no standardized recipe can ensure the outcome. We have only principles, premises, and utilization-focused processes to guide us, and we have much yet to learn. But the potential benefits merit the efforts and risks involved. At stake is improving the effectiveness of programs that express and embody the highest ideals of humankind. At stake is the vision of an Experimenting Society. It may be a long shot, "but just supposing it works!" And works for you. The only way to find out is to try it—and evaluate the results. Build the study of use into your evaluations and thereby help make not only programs, but also evaluations, accountable. Experiment with ways of making evaluation useful, for the vision of an Experimenting Society ultimately depends on experimenting and innovating evaluators and evaluation users working together.
References

Abramson, M. A. 1978. The Funding of Social . 1985. A Guide for Evaluation Deci-
Knowledge Production and Application: A sion Makers. Beverly Hills, CA: Sage.
Survey of Federal Agencies. Washington, . 1975a. "Evaluation: Who Needs It?
DC: National Academy of Sciences. Who Cares? Studies in Educational Evalu-
Ackerman, Bruce A. 1977. "Illusions About ation 1(3):201-12.
New Cars, Clean Air." Minneapolis Trib- . 1975b. "Framing the Decision Con-
une, August 29, p. 4A. text." In AERA Cassette Series in Evaluation.
Washington, DC: American Educational
ACVAFS. 1983. Evaluation Sourcebook. New
Research Association.
York: American Council of Voluntary Agen-
cies for Foreign Service. . 1972. "Wider Context Goals and
Goal-Based Evaluators." Evaluation Com-
AEA Task Force on Guiding Principles for
ment: The Journal of Educational Evalu-
Evaluators. 1995. "Guiding Principles for
ation (Center for the Study of Evaluation,
Evaluators." New Directions for Program
UCLA) 3(4): 10-11.
Evaluation, Summer, pp. 19-34.
. 1970. "A Review of the Evaluation of
AES (Australasian Evaluation Society). 1995. the Follow Through Program." Working
"Evaluation! Are You Being Served? How Paper 10, Center for the Study of Evalu-
well are evaluation practices serving cus- ation, UCLA.
tomers, clients, and other stakeholders."
Alkin, Marvin and Karin Coyle. 1988.
Presented at the 1995 Annual Conference,
"Thoughts on Evaluation Misutilization."
Sydney, Australia.
Presented at the Annual Meeting of the
Alkin, Marvin. 1995. "Lessons Learned About American Educational Research Associa-
Evaluation Use." Panel presentation at the tion, April 5, New Orleans. See also, Studies
International Evaluation Conference, Amer- in Educational Evaluation 14:331-40.
ican Evaluation Association, November 2, Alkin, Marvin C , Richard Daillak, and Peter
Vancouver, British Columbia. White. 1979. Using Evaluations: Does
, ed. 1990. Debates on Evaluation. Evaluation Make a Difference? Beverly
Newbury Park, CA: Sage. Hills, CA: Sage.

Alkin, Marvin C , with P. Jacobson, J. Burry, . 1974. Theory in Practice: Increasing


P. White, and L. Kent. 1985. Organizing for Professional Effectiveness. San Francisco:
Evaluation Use. A Handbook for Adminis- Jossey-Bass.
trators. Los Angeles: Center for the Study o( Attkisson, C. Clifford, W. A. Hargreaves, M. J.
Evaluation, UCLA. Horowitz, and J. E. Sorenson, eds. 1978.
Alkin, Marvin and Alex Law. 1980. "A Conver- Evaluation of Human Service Programs.
sation on Evaluation Utilization." Educa- New York: Academic Press.
tional Evaluation and Policy Analysis Aubel, Judi. 1993. Participatory Program
2(3):73-79. Evaluation: A Manual for Involving Stake-
Allison, Graham T. 1971. Essence of Decision: holders in the Evaluation Process. Dakar,
Explaining the Cuban Missile Crisis. Boston: Senegal: Catholic Relief Services under a
Little, Brown. U.S. AID grant.
Altschuld, James W. and Molly Engle, eds. Aubrey, Robert and Paul Cohen. 1995. Work-
1994. The Preparation of Professional ing Wisdom: Learning Organizations. San
Evaluators: Issues, Perspectives, and Pro- Francisco: Jossey-Bass.
grams (New Directions for Program Evalu- Auditor General of Canada. 1993. Program
ation, No. 62, Summer). San Francisco: Evaluation, Report to the House of Com-
Jossey-Bass. mons on Program Evaluation. Ottawa: Of-
Anderson, Barry F. 1980. The Complete fice of the Auditor General of Canada.
Thinker. Englewood Cliffs, NJ: Prentice Australian Development Assistance Bureau.
Hall. 1982. Summaries and Review of Ongoing
Anderson, John R., Lynne M. Reder, and Her- Evaluation Studies, 1975-80. Canberra:
Australian Government Publishing Service.
bert Simon. 1996. "Situated Learning and
Education." Educational Researcher 25(4): Azumi, Koya and Jerald Hage, eds., 1972. Or-
5-21. ganizational Systems. Lexington, MA: D.
C. Heath.
Anderson, Richard B. 1977. "The Effectiveness
of Follow Through: What Have We Barkdoll, Gerald L. 1982. "Increasing the Im-
Learned?" Presented at the annual meeting pact of Program Evaluation by Altering
of the American Educational Research Asso- the Working Relationship Between the
ciation, New York. Program Manager and the Evaluator."
Ph.D. dissertation, University of Southern
Argyris, Chris. 1982. Reasoning, Learning, and
California.
Action. San Francisco: Jossey-Bass.
. 1980. "Type III Evaluations: Consul-
. 1976. Increasing Leadership Effective-
tation and Consensus." Public Administra-
ness. New York: John Wiley.
tion Review (March/April): 174-79.
. 1974. Theory in Practice: Increasing Barley, Zoe A. and Mark Jenness. 1993. "Clus-
Professional Effectiveness. San Francisco: ter Evaluation: A Method to Strengthen
Jossey-Bass. Evaluation in Smaller Programs with Sim-
Argyris, Chris, R. Putnam, and D. M. Smith. ilar Purposes." Evaluation Practice 14(2):
1985. Action Science. San Francisco: Jossey- 141-47.
Bass. Becker, Howard. 1970. "Whose Side Are We
Argyris, Chris and Donald Schon. 1978. Orga- On?" Pp. 15-26 in Qualitative Methodol-
nizational Learning. Reading, MA: Addison- ogy, edited by William J. Filstead. Chicago:
Wesley. Markham.
Bedell, J. R., J. C. Ward, Jr., R. P. Archer, and Evaluation." Evaluation Review 9(2): 189-
M. K. Stokes. 1985. "An Empirical Evalu- 208.
ation of a Model of Knowledge Utilization." Blalock, Hubert M., Jr. 1964. Causal Inferences
Evaluation Review 9(2); 109-26. in Nonexperimental Research. Chapel Hill:
Bednarz, D. 1985. "Quantity and Quality in University of North Carolina Press.
Evaluation Research: A Divergent View." Blanchard, Ken. 1986. Situational Leadership
Evaluation and Program Planning 8:289- (Two volume, 12-tape audiotape set).
386. Escondido, CA: Blanchard Training and De-
Behn, Robert D. 1995. "The Management of velopment, Inc.
Reinvented Federalism." Governing (Febru- Blumer, Herbert. 1969. Symbolic Interaction-
ary):54. ism. Englewood Cliffs, NJ: Prentice Hall.
Bellavita, C , J. S. Wholey, and M. A. Bonsignore, Michael. 1996. "How Total Qual-
Abramson. 1986. "Performance-Oriented ity Became Framework for Honeywell."
Evaluation: Prospects for the Future." Pp. 286- Minneapolis Star Tribune, April 15, p. D3.
92 in Performance and Credibility: Develop-
Boruch, R. F., A. J. McSweeny, and E. J. Soder-
ing Excellence in Public and Nonprofit Or-
strom. 1978. "Randomized Field Experi-
ganizations, edited by J. S. Wholey, M. A.
ments for Program Planning, Development,
Abramson, and C. Bellavita. Lexington,
and Evaluation: An Illustrative Bibliogra-
MA: Lexington.
phy." Evaluation Quarterly 2:655-95.
Bennett, Claude F. 1982. Reflective Appraisal
Boruch, Robert and David Rindskopf. 1984.
of Programs. Ithaca, NY: Cornell University
"Data Analysis." Pp. 121-58 in Evaluation
Media Services.
Research Methods, edited by Leonard Rut-
. 1979. Analyzing Impacts of Extension
man. Beverly Hills, CA: Sage.
Programs. Washington, DC: U.S. Depart-
Boruch, Robert F. and Werner Wothke, eds.
ment of Agriculture.
1985. Randomization and Field Experimen-
Berger, Peter L. and Thomas Luckman. 1967.
tation (New Directions for Program Evalu-
The Social Construction of Reality. Garden
ation, No. 28, December). San Francisco:
City, NY: Doubleday/Anchor.
Jossey-Bass.
Bernstein, Ilene and Howard E. Freeman.
1975. Academic and Entrepreneurial Re- Boyer, J. F. and L. I. Langbein. 1991. "Factors
search: Consequences of Diversity in Federal Influencing the Use of Health Evaluation
Evaluation Studies. New York: Russell Sage. Research in Congress." Evaluation Review
Beyer, Janice M. and Harrison M. Trice. 1982. 15:507-32.
"The Utilization Process: A Conceptual Brandl, John. 1994. "Must Deal With the
Framework and Synthesis of Empirical Bureaucracy—But Exactly How Is Harder
Findings." Administrative Science Quarterly to Say." Minneapolis Star Tribune, Septem-
27:591-622. ber 5, p. 13A.
Bickman, Leonard. 1994. "An Optimistic View Braskamp, L. A. and R. D. Brown, eds. 1980.
of Evaluation." Evaluation Practice 15(3): Utilization of Evaluative Information (New
255-59. Directions for Program Evaluation, vol. 5).
, ed. 1990. Advances in Program Theory San Francisco: Jossey-Bass.
(New Directions for Program Evaluation, Breul, Jonathan P. 1994. "How the Govern-
No. 47, Fall). San Francisco: Jossey-Bass. ment Performance and Results Act Borrows
. 1985. "Improving Established State- from the Experience of OECD Countries."
wide Programs: A Component Theory of Paper prepared for the Fulbright Sympo-
sium on Public Sector Reform, July 22-24, Public Problems in a Shared-Power World.
Brisbane, Australia. San Francisco: Jossey-Bass.
Brightman, Harvey and Carl Noble. 1979. "On Buck, Connie. 1995. "The World According to
the Ineffective Education of Decision Scien- Soros." The New Yorker, January 23,
tists." Decision Sciences 10:151-57. pp. 54-78.
Brizius, J. A. and M. D. Campbell. 1991. Get- Bunge, Mario. 1959. Causality. Cambridge,
ting Results: A Guide for Government Ac- MA: Harvard University Press.
countability. Washington, DC: Council of Burry, James. 1984. Synthesis of the Evaluation
Governor's Policy Advisors. Use Literature, NIE Grant Report. Los
Brock, James, Richard Schwaller, and R. L. Angeles: UCLA Center for the Study of
Smith. 1985. "The Social and Local Govern- Evaluation.
ment Impacts of the Abandonment of the Campbell, Donald T. [1971] 1991. "Methods
Milwaukee Railroad in Montana." Evalu- for the Experimenting Society." Evaluation
ation Review 9(2): 127-43. Practice 12(3):223-60. Reprint of 1971
Brookfield, Stephen D. 1994. "Tales From the presentation to the American Psychological
Dark Side: A Phenomenography of Adult Association.
Critical Reflection." International Journal . 1988. Methodology and Epistemology
of Lifelong Learning 13(3):203-16. for Social Science: Selected Papers, edited by
. 1990. Understanding and Facilitating E. S. Overman. Chicago: University of Chi-
Adult Learning. San Francisco: Jossey-Bass. cago Press.
Broom, Michael F. and Donald C. Klein. 1995. . 1983. "Threats to Validity Added
Power: The Infinite Game. Amherst, MA: When Applied Social Research Is Packaged
HRD Press. as 'Program Evaluation' in the Service of
Broskowski, A., J. Driscoll, and H. C. Schul- Administrative Decision Making." Pre-
berg. 1978. "A Management Information sented at the Conference on Family Support
and Planning System for Indirect Ser- Programs: The State of the Art, Sponsored
vices." Pp. 189-214 mEvaluation of Human by the Bush Center in Child Development
Service Programs, edited by C. Clifford and Social Policy, Yale University, New Ha-
Attkisson et al. New York: Academic Press. ven, CT.
Brown, Lawrence A. 1981. Innovation Diffu- . 1969. "Reforms as Experiments."
sion. London: Methuen. American Psychologist 24:409-29.
Bruyn, Severyn. 1966. The Human Perspective Campbell, Donald T. and Robert F. Boruch.
in Sociology: The Methodology of Partici- 1975. "Making the Case for Randomized
pant Observation. Englewood Cliffs, NJ: Assignment to Treatments by Considering
Prentice Hall. the Alternatives: Six Ways in Which Quasi-
Bryk, Anthony S., ed. 1983. Stakeholder-Based Experimental Evaluations in Compensatory
Evaluation (New Directions for Program Education Tend to Underestimate Effects."
Evaluation, vol. 17). San Francisco: Jossey- Pp. 195-296 mEvaluation and Experiment,
Bass. edited by Carol A. Bennett and Arthur A.
Bryson, John M. 1995. Strategic Planning for Lumsdaine. New York: Academic Press.
Public and Nonprofit Organizations. San Campbell, Donald T. and Julian C. Stanley.
Francisco: Jossey-Bass. 1963. Experimental and Quasi-Experimen-
Bryson, John M. and Barbara C. Crosby. 1992. tal Designs for Research. Chicago: Rand
Leadership for the Common Good: Tackling McNally.
Campbell, Jeanne L. 1994. "Issues of Cluster Shadish, D. L. Newman, M. A. Scheirer, and


Evaluation Use." Presented at the 1994 C. Wye. San Francisco: Jossey-Bass.
meeting of the American Evaluation Asso- . 1992. "Expanding Evaluation Capa-
ciation, Boston. bilities in the General Accounting Office."
. 1983. "Factors and Conditions Influ- Pp. 91-96 in Evaluation in the Federal Gov-
encing Usefulness of Planning, Evaluation, ernment: Changes, Trends, and Opportuni-
and Reporting in Schools." Ph.D. disserta- ties (New Directions for Program Evalua-
tion, University of Minnesota. tion, No. 55), edited by C. G. Wye and
Canadian Evaluation Society. 1982. The Bot- R. C. Sonnichsen. San Francisco: Jossey-
tom Line: Utilization of'What, by 'Whom? Bass.
Proceedings of the 3rd Annual Conference . 1987a. "The Politics of Program Evalu-
of the Canadian Evaluation Society. To- ation." Pp. 5-22 in Evaluation Practice in
ronto: University of Toronto. Review (New Directions for Program Evalu-
ation, No. 34), edited by D. S. Cordray,
Caracelli, Valerie and Hallie Preskill. 1996.
H. S. Bloom, and R. J. Light. San Francisco:
"Evaluation Use Survey." Evaluation Use
Jossey-Bass.
Topical Interest Group, American Evalu-
ation Association. . 1987b. "What We Have Learned
About the Politics of Program Evaluation."
Caro, Francis G., ed. 1971. Readings in Evalu-
Educational Evaluation and Policy Analysis
ation Research. New York: Russell Sage.
9:199-213.
Carver, John. 1990. Boards That Make a Dif-
. 1983. "Improving the Cost Effective-
ference. San Francisco: Jossey-Bass.
ness of Evaluation." Pp. 149-70 in The Costs
Caulley, Darrel. 1993. "Evaluation: Does It of Evaluation, edited by Marvin C. Alkin
Make a Difference?" Evaluation Journal of and Lewis C. Solmon. Beverly Hills, CA:
Australia 5(2):3-15. Sage.
CFC (Center for the Future of Children). 1995. Chen, Huey-Tsyh. 1990. Theory-Driven Evalu-
"Long-Term Outcomes of Early Childhood ations. Newbury Park, CA: Sage.
Programs." In The Future of Children 5(3). , ed. 1989. "Special Issue: The Theory-
Los Altos, CA: The David and Lucille Driven Perspective." Evaluation and Pro-
Packard Foundation. gram Planning 12(4).
Chelimsky, Eleanor. 1997. "The Coming Chen, Huey-Tsyh and Peter Rossi. 1989. "Is-
Transformations in Evaluation." In Evalu- sues in the Theory-Driven Perspective."
ation for the 21st Century, edited by Eleanor Evaluation and Program Planning 12(4):
Chelimsky and Will Shadish. Thousand 299-306.
Oaks, CA: Sage. . 1987. "The Theory-Driven Approach
. 1995a. "The Political Environment of to Validity." Evaluation and Program Plan-
Evaluation and What It Means for the De- ning 10(1):95-103.
velopment of the Field." Presented as the Cicarelli, Victor. 1971. "The Impact of Head
American Evaluation Association Presiden- Start: Executive Summary." Pp. 397-401 in
tial Address, November, Vancouver. Pub- Readings in Evaluation Research, edited by
lished in Evaluation Practice 16(3):215-25. Francis G. Caro. New York: Russell Sage.
. 1995b. "Comments on the AEA Guid- Cochran-Smith, Marilyn and Susan Lytle.
ing Principles." Pp. 53-54 in Guiding Prin- 1990. "Research on Teaching and Teacher
ciples for Evaluators (New Directions for Pro- Research: The Issues That Divide." Educa-
gram Evaluation, No. 66), edited by W. R. tional Researcher 19(2):2-11.
Coffey, Amanda and Paul Atkinson. 1996. Conrad, Kendon J., ed. 1994. Critically Evalu-
Making Sense of Qualitative Data. Thou- ating the Role of Experiments (New Direc-
sand Oaks, CA: Sage. tions for Program Evaluation, No. 63). San
Cohen, David K. 1970. "Politics and Research: Francisco: Jossey-Bass.
Evaluation of Social Action Programs in Conte, Christopher. 1996. "Workfare on
Education." In Educational Evaluation, Trial." Governing, April, pp. 19-23.
American Educational Research Associa- Cook, T. D. and Donald T. Campbell. 1979.
tion, Review of Educational Research Quasi-experimentation: Design and Analy-
(April):213-38. sis Issues for Field Settings. Chicago: Rand
Cohen, David K. and Michael S. Garet. 1975. McNally.
"Reforming Educational Policy With Ap- Cook, Thomas D. 1995. "Evaluation Lessons
plied Social Research." Harvard Educa- Learned." Plenary keynote address at the
tional Review 45 (February): 17-41. International Evaluation Conference, "Eval-
uation '95," November 4, Vancouver, B.C.
Cohen, David K. and Janet A. Weiss. 1977.
Cooley, William W. and William E. Bickel.
"Social Science and Social Policy: Schools
1985. Decision-Oriented Educational Re-
and Race." Pp. 67-84 in Using Social Re-
search. Boston: Kluwer-Nijhoff.
search in Public Policy Making, edited by
Cordray, David S. 1993. "Strengthening Causal
Carol H. Weiss. Lexington, MA: D. C.
Interpretations of Nonexperimental Data:
Heath.
The Role of Meta-analysis." Pp. 59-97 in
Cole, M. B. 1984. "User-Focused Evaluation of
Program Evaluation: A Pluralistic Enterprise
Training Programme Effectiveness in a
(New Directions for Program Evaluation,
South African Industrial Company." Pre-
No. 60), edited by Lee Sechrest. San Fran-
sented at the National Productivity Institute
cisco: Jossey-Bass.
Conference, University of Witwatersrand,
Corwin, Ronald G. 1973. Reform and Orga-
Johannesburg.
nizational Survival. New York: Wiley Inter-
Combs, Arthur. 1972. Educational Account- science.
ability: Beyond Behavioral Objectives.
Council on Foundations. 1993. Evaluation for
Washington, DC: Association for Supervi-
Foundations: Concepts, Cases, Guidelines,
sion and Curriculum Development.
and Resources. San Francisco: Jossey-Bass.
Comptroller General of Canada. 1989. "Work- Cousins, J. Bradley, John J. Donohue, and Gor-
ing Standards for the Evaluation of Pro- don Bloom. 1996. "Understanding Collabo-
grams in Federal Departments and Agen- rative Evaluation: Results From a Survey
cies." Ottawa, ON: Program Evaluation of North American Evaluators." Unpub-
Branch, Supply 8t Services Canada. lished paper submitted for publication, Uni-
Connell, J. P., A. C. Kubisch, L. B. Schorr, and versity of Ottawa. Inquiries: bcousins®
C. H. Weiss (Eds.). 1995. New Approaches educl .edu.uottawa.ca.
to Evaluating Community Initiatives: Con- . 1995. "Collaborative Evaluation: Sur-
cepts, Methods, and Contexts. Washington, vey of Practice in North America." Un-
DC: The Aspen Institute. published paper presented at the Inter-
Connor, Ross F. 1988. "Structuring Knowl- national Evaluation Conference, Vancou-
edge Production Activities to Facilitate ver. Monograph inquiries: bcousins®
Knowledge Utilization: Thoughts on Im- educl.edu.uottawa.ca.
portant Utilization Issues." Studies in Edu- Cousins, J. Bradley and Lorna M. Earl, eds.
cational Evaluation 14:273-83. 1995. Participatory Evaluation in Educa-
tiort: Studies in Evaluation Use and Orga- Debate: New Perspectives (New Directions
nizational Learning. London: Falmer. for Program Evaluation, No. 61), edited by
Cousins, J. Bradley and Lorna M. Earl. 1992. C. S. Reichardt and S. F. Rallis. San Fran-
"The Case for Participatory Evaluation." cisco: Jossey-Bass.
Educational Evaluation and Policy Analysis Davidson, Fred. 1996. Principles of Statistical
14:397-418. Data Handling. Thousand Oaks, CA: Sage.
Cousins, J. Bradley and K. A. Leithwood. 1986. Davis, Howard R. and Susan E. Salasin. 1975.
"Current Empirical Research on Evaluation "The Utilization of Evaluation." Pp. 621-66
Utilization." Review of Educational Re- in Handbook of Evaluation Research, vol. 1,
search 56(3):331-64. edited by Elmer L. Struening and Marcia
Cranford, John. 1995. "A Guide to Award- Guttentag. Beverly Hills, CA: Sage.
Winning Technology." Governing, January, Dawson, Gary. 1995. "Agency Evaluation Re-
pp. 61-70. ports Disregarded by Legislators Who Had
Cronbach, Lee J. 1982. Designing Evaluations Requested Them." Saint Paul Pioneer Press,
of Educational and Social Programs. San August 7, p. 4B.
Francisco: Jossey-Bass. Dawson, Judith A. and J. J. D'Amico. 1985.
. 1975. "Beyond the Two Disciplines of "Involving Program Staff in Evaluation
Scientific Psychology." American Psycholo- Studies: A Strategy for Increasing Use and
gist 30:116-17. Enriching the Data Base." Evaluation Re-
Cronbach, Lee J. and Associates. 1980. Toward view 9 (2) :173 -88.
Reform of Program Evaluation. San Fran- Deitchman, Seymour. 1976. The Best-Laid
cisco: Jossey-Bass. Schemes: A Tale of Social Research and Bu-
Cronbach, Lee J. and P. Suppes, eds. 1969. reaucracy. Cambridge: MIT Press.
Research for Tomorrow's Schools: Disci- Denzin, Norman K. and Yvonna S. Lincoln.
plined Inquiry of Education. New York: 1994. Handbook of Qualitative Research.
Macmillan. Thousand Oaks, CA: Sage.
Crozier, Michel. 1964. The Bureaucratic Phe- Dery, D. 1981. Computers in Welfare: The
nomenon. Chicago: University of Chicago MIS-Match. Beverly Hills, CA: Sage.
Press. Deutscher, Irwin. 1970. "Words and Deeds:
Cyert, Richard and James G. March. 1963. A Social Science and Social Policy." Pp. 27-51
Behavioral Theory of the Firm. Englewood in Qualitative Methodology, edited by
Cliffs, NJ: Prentice Hall. William J. Filstead. Chicago: Markham.
Dahl, Robert. 1957. "The Concept of Power." Dewey, John. 1956a. The Child and the Cur-
Behavioral Science 2(July):201-15. riculum. Chicago: University of Chicago
Dalkey, N. C. 1969. The Delphi Method: An Press.
Experimental Study of Group Opinion. . 1956b. The School and Society. Chi-
Santa Monica, CA: Rand. cago: University of Chicago Press.
Daniels, Stacey. 1996. "Process or Outcomes? de Wilde, John C. 1967. Experiences With Ag-
Different Approaches for Different Stages." ricultural Development in Tropical Africa.
Foundation, March/April, pp. 46-48. Baltimore, MD: Johns Hopkins University
D'Aprix, Roger D. 1996. Communicating for Press.
Change. San Francisco: Jossey-Bass. Dial, Micah. 1994. "The Misuse of Evaluation
Datta, Lois-ellin. 1994. "Paradigm Wars: A in Educational Programs." Pp. 61-68 in Pre-
Basis for Peaceful Coexistence and Beyond." venting the Misuse of Evaluation (New Di-
Pp. 53-70 in The Qualitative-Quantitative rections for Program Evaluation, No. 64),
edited by C. J. Stevens and Micah Dial. San Arthur Lumsdaine. New York: Academic
Francisco: Jossey-Bass. Press.
Dickey, Barbara. 1981. "Utilization of Evalu- Edwards, Ward, Marcia Guttentag, and Kurt
ation of Small-Scale Educational Projects." Snapper. 1975. "A Decision-Theoretic Ap-
Educational Evaluation and Policy Analysis proach to Evaluation Research." Pp. 139-82
2(6):65-77. in Handbook of Evaluation Research, vol. 1,
Dickey, Barbara and Eber Hampton. 1981. edited by Elmer L. Struening and Marcia
"Effective Problem-Solving for Evaluation Guttentag. Beverly Hills, CA: Sage.
Utilization." Knowledge: Creation, Diffu- Eisner, Elliot. 1991. The Enlightened Eye:
sion, Utilization 2(3):361-74. Qualitative Inquiry and the Enhancement of
Donmoyer, Robert. 1996. "Educational Re- Educational Practice. New York: Macmil-
search in an Era of Paradigm Proliferation: lan.
What's a Journal Editor to Do?" Educa- Elpers, J. R. and R. L. Chapman. 1978. "Basis
tional Researcher 25(2): 19-25. of the Information System Design and Im-
Drucker, Peter F. 1996. The Leader of the plementation Process." Pp. 173-88 in
Future. San Francisco: Jossey-Bass. Evaluation of Human Service Programs,
Duffy, Barbara Poitras. 1994. "Use and Abuse edited by C. Clifford Attkisson, W. A.
of Internal Evaluation." Pp. 25-32 in Pre- Hargreaves, M. J. Horowitz, and J. E.
venting the Misuse of Evaluation (New Di- Sorenson. New York: Academic Press.
rections for Program Evaluation, No. 64), Emery, F. W. and E. L. Trist. 1965. "The
edited by C. J. Stevens and Micah Dial. San Causal Texture of Organizational Environ-
Francisco: Jossey-Bass. ment." Human Relations 18(February):
Dugan, Margret. 1996. "Participatory and Em- 21-31.
powerment Evaluation: Lessons Learned in Etheredge, Lloyd S. 1980. "Government
Training and Technical Assistance." Pp. 277- Learning: An Overview." In Handbook of
303 in Empowerment Evaluation: Knowl- Political Behavior, edited by S. Long. New
edge and Tools for Self-Assessment and Ac- York: Plenum.
countability, edited by D. M. Fetterman, Etzioni, Amitai. 1968. The Active Society: A
A. J. Kaftarian, and A. Wandersman. New- Theory of Societal and Political Processes.
bury Park, CA: Sage. New York: Free Press.
Dunagin, Ralph. 1977. Dunagin's People. Sen- Evaluation Research Society. 1980. Standards
tinel Star, Field Newspaper Syndicate (Au- for Evaluation. Washington, DC: Evalu-
gust 30). ation Research Society.
Dyer, Henry S. 1973. "Recycling the Problems Evans, Gerry and Roger Blunden. 1984. "A
in Testing." Assessment in a Pluralistic Soci- Collaborative Approach to Evaluation."
ety: Proceedings of the 1972 Invitational journal of Practical Approaches to Develop-
Conference on Testing Problems, Educa- mental Handicaps 8(1): 14-18.
tional Testing Service, Princeton, NJ. Evans, John W. 1971. "Head Start: Comments
Edison, Thomas. 1983. The Diary and Obser- on Criticisms." Pp. 401-407 in Readings in
vations. New York: Philosophical Library. Evaluation Research, edited by Francis G.
Edwards, Ward and Marcia Guttentag. 1975. Caro. New York: Russell Sage.
"Experiments and Evaluation: A Reexami- Feiman, Sharon. 1977. "Evaluation Teacher
nation." Pp. 409-63 in Evaluation and Ex- Centers." Social Review 8(May):395-411.
periment: Some Critical Issues in Assessing Fetterman, D. M., A. J. Kaftarian, and A. Wan-
Social Programs, edited by Carl Bennet and dersman, eds. 1996. Empowerment Evalu-
References • 395

ation: Knowledge and Tools for Self-Assess- Fournier, Deborah M., ed. 1995. Reasoning in
ment and Accountability. Newbury Park, Evaluation: Inferential Links and Leaps
CA: Sage. (New Directions for Program Evaluation,
Fetterman, David M. 1995. "In Response to vol. 68). San Francisco: Jossey-Bass.
Dr. Dan Stufflebeam." Evaluation Practice Freeman, Howard E. 1977. "The Present
16(2):179-99. Status of Evaluation Research." Pp. 17-51
. 1994a. "Empowerment Evaluation," in Evaluation Studies Review Annual, vol. 2,
American Evaluation Association Presiden- edited by Marcia Guttentag. Beverly Hills,
tial Address. Evaluation Practice, 15(1): CA: Sage.
1-15. Funnell, Sue. 1993. "Reporting the Perform-
. 1994b. "Steps of Empowerment Evalu- ance of the Public Sector." Evaluation Jour-
ation: From California to Cape Town." nal of Australia 5(2): 16-37.
Evaluation and Program Planning 17(3): Gardiner, Peter C. and Ward Edwards. 1975.
305-13. Measurement for Social Decision Making."
. 1993. "Empowerment Evaluation: Pp. 1-38 in Human Judgment and Decision
Theme for the 1993 Evaluation Meeting." Processes, edited by Martin F. Kaplan and
Evaluation Practice 14(1): 115-17. Steven Schwartz. New York: Academic
. 1984. "Ethnography in Educational Press.
Research: The Dynamics of Diffusion." Pp. General Accounting Office (GAO). 1996. Sci-
21-35 in Ethnography in Educational Evalu- entific Research: Continued Vigilance
ation, edited by D. M. Fetterman. Beverly Needed to Protect Human Subjects, GAO/
Hills, CA: Sage. HEHS-96-72. Washington, DC: GAO.
. 1980. "Ethnographic Approaches in . 1995. Program Evaluation: Improving
Educational Evaluation: An Illustration." the Flow of Information to the Congress,
Journal of Thought 15 (3): 31 -4 8. GAO/PEMD-95-1. Washington, DC: GAO.
Fink, Arlene, ed. 1995. The Survey Kit. Thou- . 1992a. Program Evaluation Issues,
sand Oaks, CA: Sage. GAO/OCG-93-6TR. Washington, DC:
Firestone, W. A. and R. E. Herriott. 1984. GAO.
"Multisite Qualitative Policy Research: . 1992b. Adolescent Drug Use Preven-
Some Design and Implementation Issues." tion: Common Features of Promising Com-
Pp. 63-88 in Ethnography in Educational munity Programs, GAO/PEMD-92-2.
Evaluation, edited by D. M. Fetterman. Washington, DC: GAO.
Beverly Hills, CA: Sage. . 1992c. The Evaluation Synthesis,
Fishman, Daniel B. 1992. "Postmodernism GAO/PEMD-10.1.2. Washington, DC:
Comes to Program Evaluation." Evaluation GAO.
and Program Planning 15(2):263-70. . 1992d. Quantitative Data Analysis,
Fletcher, Joseph. 1966. Situation Ethics: The GAO/PEMD-10.1.11. Washington, DC:
New Morality. London: Westminister John GAO.
Knox. . 1991. Designing Evaluations, GAO/
Folz, David H. 1996. Survey Research for Pub- PEMD-10.1.4. Washington, DC: GAO.
lic Administration. Thousand Oaks, CA: . 1990a. Case Study Evaluation, Trans-
Sage. fer Paper 10.1.9. Washington, DC: GAO.
Fossum, L. B. 1989. Understanding Organiza- . 1990b. Prospective Evaluation Meth-
tional Change. Los Altos, CA: Crisp. ods: The Prospective Evaluation Synthesis,
396 • UTILIZATION-FOCUSED EVALUATION

Transfer Paper 10.1.10. Washington, DC: Guba, Egon G. 1981. "Investigative Report-
GAO. ing." Pp. 67-86 in Metaphors for Evalu-
. 1988. Program Evaluation Issues, ation, edited by Nick L. Smith. Beverly
GAO/OCG-89-8TR. Washington, DC: Hills, CA: Sage.
GAO. . 1978. "Toward a Methodology of
. 1987. Federal Evaluation: Fewer Naturalistic Inquiry in Educational Evalu-
Units, Reduced Resources, GAO/PEMD-87- ation." Monograph Series 8, UCLA Center
9. Washington, DC: GAO. for the Study of Evaluation.
. 1981. Federal Evaluations. Washing- . 1977. "Overcoming Resistance to
ton, DC: Government Printing Office. Evaluation." Presented at the Second An-
Gephart, William J. 1981. "Watercolor Paint- nual Conference on Evaluation, University
ing." Pp. 247-72 in Metaphors for Evalu- of North Dakota.
ation, edited by Nick L. Smith. Beverly Guba, Egon and Yvonna Lincoln. 1994. "Com-
Hills, CA: Sage. peting Paradigms in Qualitative Research."
Glaser, Edward M., Harold H. Abelson, and Pp. 105-17 in Handbook of Qualitative Re-
KathaleeN. Garrison. 1983. Putting Knowl- search, edited by N. K. Denzin and Y. S.
edge to Use. San Francisco: Jossey-Bass. Lincoln. Thousand Oaks, CA: Sage.
Goodman, Ellen. 1995. "Patients, Doctors, . 1989. Fourth Generation Evaluation.
Hospitals Must End Silence on Journey to Newbury Park, CA: Sage.
Death." Syndicated column distributed by . 1981. Effective Evaluation: Improving
Washington Post Writers Group, appearing the Usefulness of Evaluation Results
in the Saint Paul Pioneer Press, December 3, Through Responsive and Naturalistic Ap-
p.l7A. proaches. San Francisco: Jossey-Bass.
Gordimer, Nadine. 1994. None to Accompany Guttentag, Marcia and Elmer L. Struening.
Me. New York: Penguin. 1975a. Handbook of Evaluation Research,
Governor's Commission on Crime Prevention Vols. 1 and 2. Beverly Hills, CA: Sage.
and Control (GCCPC). 1976. Residential . 1975b. "The Handbook: Its Purpose
Community Corrections Programs in Min- and Organization." Pp. 3-10 in Handbook
nesota: An Evaluation Report. Saint Paul: of Evaluation Research, vol. 2, edited by
State of Minnesota. Marcia Guttentag and Elmer L. Struening.
Grant, Donald L., ed. 1978. Monitoring Ongo- Beverly Hills, CA: Sage.
ing Programs (New Directions for Program Guttmann, David and Marvin B. Sussman, eds.,
Evaluation, vol. 3). San Francisco: Jdssey- 1995. "Exemplary Social Intervention Pro-
Bass. grams for Members and Their Families."
Greene, Jennifer C. 1990. "Technical Quality Special issue of Marriage and Family Review
Versus User Responsiveness in Evaluation 21(1, 2). New York: Haworth Press.
Practice." Evaluation and Program Planning Hage, Jerald and Michael Aiken. 1970. Social
13(3):267-74. Change in Complex Organizations. New
. 1988a. "Communication of Results York: Random House.
and Utilization in Participatory Program Hall, Holly. 1992. "Assessing the Work of a
Evaluation." Evaluation and Program Plan- Whole Foundation." The Chronicle of Phi-
ning 11:341-51. lanthropy, January 14, 9-12.
. 1988b. "Stakeholder Participation and Hampden-Turner, C. 1990. Creating Corpo-
Utilization in Program Evaluation." Evalu- rate Culture. Reading, MA: Addison-
ation Review 12:91-116. Wesley.
References • 397

Handy, C. B. 1993. Understanding Organiza- Hendricks, Michael, and Elisabeth A. Handley.


tions. New York: Oxford University Press. 1990. "Improving the Recommendations
Harper's Statistical Index. 1985. Harper's From Evaluation Studies." Evaluation and
Magazine, April, p. 11. Source: Govern- Program Planning 13:109-17.
ment Accounting Office/General Services Hersey, Paul. 1985. Situational Leader. Char-
Administration. lotte, North Carolina: Center for Leadership.
Havelock, Ronald G. 1980. "Forward." Pp. Hevey, Denise. 1984. "An Exercise in Utiliza-
11-14 in Using Research in Organizations, tion-Focused Evaluation: The Under-Fives
edited by Jack Rothman. Beverly Hills, CA: Coordinators." Preschool Evaluation Proj-
Sage. ect, Bristol University. Unpublished manu-
. 1973. The Change Agent's Guide to script.
Innovation in Education. Englewood Cliffs, HFRP (Harvard Family Research Project).
NJ: Prentice Hall. 1996a. Noteworthy Results-Based Account-
Hedrick, Terry E. 1994. "The Quantitative- ability Publications: An Annotated Bibliog-
Qualitative Debate: Possibilities for Integra- raphy. Cambridge, MA: Harvard Family
tion." Pp. 45-52 in The Qualitative-Quanti- Research Project Publications.
tative Debate: New Perspectives (New . 1996b. State Results-Based Account-
Directions for Program Evaluation, No. 61), ability Efforts. Cambridge, MA: Harvard
edited by C. S. Reichardt and S. F. Rallis. Family Research Project Publications.
San Francisco: Jossey-Bass. Hinton, Barb. 1988. "Audit Tales: Kansas In-
Heilbroner, Robert. 1996. "Dismal Days for trigue." Legislative Program Evaluation So-
the Dismal Science." Forbes, April 22, ciety (LPES) Newsletter, Spring, p. 3.
pp. 65-66. Hoffman, Yoel. 1975. The Sound of One Hand.
Heilman, John G. 1980. "Paradigmatic Choices New York: Basic Books.
in Evaluation Methodology." Evaluation Holzner, Burkart and John H. Marx. 1979.
Review 4(5):693-712. Knowledge Application: The Knowledge Sys-
Helmer, Olaf. 1966. Social Technology. New tem in Society. Boston: Allyn & Bacon.
York: Basic Books. Hoogerwerf, Andries. 1985. "The Anatomy of
Hendricks, M., M. F. Mangano, and W. C. Collective Failure in the Netherlands."
Moran, eds. 1990. Inspectors General: A Pp. 47-60 in Culture and Evaluation, edited
New Force in Evaluation (New Directions by M. Q. Patton. San Francisco: Jossey-Bass.
for Program Evaluation, No. 48). San Fran- Horsch, Karen. 1996. "Results-Based Account-
cisco: Jossey-Bass. ability Systems: Opportunities and Chal-
Hendricks, Michael. 1994. "Making a Splash: lenges." The Evaluation Exchange 2(l):2-3.
Reporting Evaluation Results Effectively." House, Ernest R. 1995. "Principled Evalu-
Pp. 549-75 in Handbook of Practical Pro- ation: A Critique of the AEA Guiding Prin-
gram Evaluation, edited by J. S. Wholey, H. ciples." Pp. 27-34 in Guiding Principles for
P. Hatry, and K. E. Newcomer. San Fran- Evaluators (New Directions for Program
cisco: Jossey-Bass. Evaluation, No. 66), edited by W. R.
. 1984. "Preparing and Using Briefing Shadish, D. L. Newman, M. A. Scheirer, and
Charts." Evaluation News 5(3): 19-20. C. Wye. San Francisco: Jossey-Bass.
. 1982. "Oral Policy Briefings." Pp. 249- . 1994. "Integrating the Qualitative and
58 in Communication Strategies in Evalu- Quantitative." Pp. 13-22 in The Qualita-
ation, edited by Nick L. Smith. Beverly tive-Quantitative Debate: New Perspectives
Hills, CA: Sage. (New Directions for Program Evaluation,
398 • UTILIZATION-FOCUSED EVALUATION

No. 61), edited by C. S. Reichardt and S. F. tional Evaluation and Public Policy, A Con-
Rallis. San Francisco: Jossey-Bass. ference. San Francisco: Far West Regional
. 1993. Professional Evaluation: Social Laboratory for Educational Research and
Impact and Political Consequences. New- Development.
bury Park, CA: Sage. ICMA. 1995. Applying Performance Measure-
. 1991. "Realism in Research." Educa- ment: A Multimedia Training Program, CD-
tional Researcher 20(6):2-9. ROM. Junction, MD: International City/
. 1990a. "Trends in Evaluation." Educa- County Management Association (in
tional Researcher, 19(3):24-28. conjunction with the Urban Institute, Public
. 1990b. "Methodology and Justice." Technology, Inc., and American Society for
Pp. 23-36 in Evaluation and Social Justice: Public Administration).
Issues in Public Education (New Directions Independent Sector. 1993. A Vision of Evalu-
for Program Evaluation, No. 45), edited by ation, edited by Sandra Trice Gray. Wash-
K. A. Sirotnik. San Francisco: Jossey-Bass. ington, DC: Independent Sector.
. 1986. "In-House Reflection: Internal IQREC. 1997. "Democratizing Inquiry Through
Evaluation." Evaluation Practice 7(1):63- Qualitative Research." Presented at the In-
64. ternational Qualitative Research in Educa-
. 1980. Evaluating With Validity. Bev- tion Conference, University of Georgia,
erly Hills, CA: Sage. Athens.
. 1977. "The Logic of Evaluative Argu- Jacobs, Francine H. 1988. "The Five-Tiered
ment." In CSE Monograph Lines in Evalu- Approach to Evaluation." Pp. 37-68 in
ation, vol. 7. Los Angeles: UCLA Center for Evaluating Family Programs, edited by H. B.
the Study of Education. Weiss and F. Jacobs. Hawthorne, NY:
. 1972. "The Conscience of Educational Aldine.
Evaluation." Teachers College Record 73(3):
Janowitz, Morris. 1979. "Where Is the Cutting
405-14.
Edge of Sociology?" Midwest Sociological
Howe, K. 1988. "Against the Quantitative-
Quarterly 20:591-93.
Qualitative Incompatibility Thesis." Educa-
Johnson, R. B. 1995. "Estimating an Evalua-
tional Researcher 17(8): 10-16.
tion Utilization Model Using Conjoint Mea-
Huberman, Michael. 1995. "Research Utiliza-
surement and Analysis." Evaluation Review
tion: The State of the Art." Knowledge and
19(3):313-38.
Policy 7(4):13-33.
Joint Committee on Standards for Educational
Huberty, Carl J. 1988. "Another Perspective
Evaluation. 1994. The Program Evaluation
on the Role of an Internal Evaluator." Eval-
Standards. Thousand Oaks, CA: Sage.
uation Practice 9(4):25-32.
. 1981. Standards for Evaluations of
Hudson, Joe. 1977. "Problems of Measure-
Educational Programs, Projects, and Materi-
ment in Criminal Justice." Pp. 73-100
als. New York: McGraw-Hill.
Evaluation Research Methods, edited by
Leonard Rutman. Beverly Hills, CA: Sage. Kanter, Rosabeth Moss. 1983. The Change
Hudson, Joe, John Mayne, and R. Thomlison, Masters. New York: Simon & Schuster.
eds. 1992. Action-Oriented Evaluation in Kanter, Rosabeth Moss, B. A. Stein, and J. D.
Organizations: Canadian Practices. To- Jick. 1992. The Challenge of Organizational
ronto: Wall and Emerson. Change. New York: Free Press.
Hurty, Kathleen. 1976. "Report by the Kearns, Kevin P. 1996. Managing for Account-
Women's Caucus." Proceedings: Educa- ability. San Francisco: Jossey-Bass.
References • 399

Kellogg Foundation, n.d. (circa 1995). W. K. . 1994b. "The Future of Collaborative


Kellogg Foundation Cluster Evaluation Action Research: Promises, Problems, and
Model of Evolving Practices. Battle Creek, Prospects." Unpublished paper, College of
MI: Kellogg Foundation. Education, University of Minnesota, Min-
Kennedy, M. M. 1983. "The Role of the In- neapolis, based on a presentation at the
House Evaluator." Evaluation Review Annual Meeting of the American Educa-
7(4):519-41. tional Research Association, Atlanta, 1993.
Kennedy School of Government. 1995. "Inno- King, Jean A., Lynn Lyons Morris, and Carol
vations in America Government Awards T. Fitz-Gibbon. 1987. How to Assess Pro-
Winners." Governing, November, pp. 27- gram Implementation. Newbury Park, CA:
42. Sage.
Kidder, Louise H. and Michelle Fine. 1987. King, Jean A. and Ellen Pechman. 1984. "Pin-
"Qualitative and Quantitative Methods: ning a Wave to Shore: Conceptualizing
When Stories Converge." Pp. 57-76 in Mul- School Evaluation Use. Educational Evalu-
tiple Methods in Program Evaluation (New ation and Policy Analysis 6 (3): 241-51.
Directions for Program Evaluation, No. 35),
. 1982. Improving Evaluation Use in
edited by M. M. Mark and L. Shotland. San
Local Schools. Washington, DC: National
Francisco: Jossey-Bass.
Institute of Education.
King, Jean A. 1995. "Involving Practitioners in
Knapp, Kay. 1995. "Institutionalizing Per-
Evaluation Studies: How Viable Is Collabo-
formance Measurement and Evaluation in
rative Evaluation in Schools." Pp. 86-102 in
Government: Lessons Learned." Presented
Participatory Evaluation in Education:
at the International Evaluation Conference,
Studies in Evaluation Use and Organiza-
November 3, Vancouver. Internal publica-
tional Learning, edited by J. Bradley Cou-
tion of Performance Measurement and
sins and Lorna Earl. London: Falmer.
Evaluation, A-2303 Hennepin County Gov-
. 1988. "Research on Evaluation Use
ernment Center, Minneapolis, Minnesota,
and Its Implications for the Improvement of
55487-0233.
Evaluation Research and Practice." Studies
in Educational Evaluation 14:285-99. Knapp, Michael S. 1996. "Methodological Is-
sues in Evaluating Integrated Human Ser-
. 1985. "Existing Research on Evalu-
vices Initiatives." Pp. 21-34 in Evaluating
ation Use and Its Implications for the Im-
Initiatives to Integrate Human Services
provement of Evaluation Research and
(New Directions for Evaluation, No. 69),
Practice." Presented at invited conference
edited by J. M. Marquart and E. L. Konrad.
on evaluation use, UCLA Center for the
Study of Evaluation. Kneller, George F. 1972. "Goal-Free Evalu-
. 1982. "Studying the Local Use of ation." Evaluation Comment: The Journal
Evaluation: A Discussion of Theoretical Is- of Educational Evaluation (Center for the
sues and an Empirical Study." Studies in Study of Evaluation, UCLA) 3(4): 13-15.
Educational Evaluation 8:175-83. Knowles, Malcolm S. 1989. The Making of an
King, Jean A. and M. Peg Lonnquist. 1994a. "A Adult Educator: An Autobiographical Jour-
Review of Writing on Action Research: ney. San Francisco: Jossey-Bass.
1944-Present." Unpublished paper, Center Knowles, Malcolm S. and Associates. 1985. An-
for Applied Research and Educational Im- dragogy in Action: Applying Modern Princi-
provement, University of Minnesota, Min- ples of Adult Learning. San Francisco:
neapolis. Jossey-Bass.
400 • UTILIZATION-FOCUSED EVALUATION

Knox, Alan B. 1987. Helping Adults Learn. San Le Guin, Ursula K. 1969. The Left Hand of
Francisco: Jossey-Bass. Darkness. New York: Ace Books.
Kochen, Manfred. 1975. "Applications of Leonard, Jennifer. 1996. "Process or Out-
Fuzzy Sets in Psychology." Pp. 395-407 in comes? Turn Outcome 'Sticks' Into Car-
Fuzzy Sets and Their Applications to Cogni- rots." Foundation, March/April, pp. 46-48.
tive and Decision Processes, edited by Lofti Lester, James P. and Leah J. Wilds. 1990. "The
A. Zadeh, King-Sun Fu, Kokichi Tanaka, Utilization of Public Policy Analysis: A Con-
and Masamichi Shimura. New York: Aca- ceptual Framework." Evaluation and Pro-
demic Press. gram Planning 13(3):313-19.
Kottler, Jeffrey A. 1996. Beyond Blame: Re- Levin, B. 1993. "Collaborative Research in and
solving Conflicts. San Francisco: Jossey- With Organizations." Qualitative Studies in
Bass. Education 6(4):331-40.
Kourilsky, Marilyn. 1974. "An Adversary Levine, Harold G., R. Gallimore, T. S. Weisner,
Model for Educational Evaluation." Evalu- and J. L. Turner. 1980. "Teaching Partici-
ation Comment 4:2. pant-Observation Research Methods: A
Kouzes, James M. and Barry Z. Posner. 1995. Skills-Building Approach." Anthropology
The Leadership Challenge. San Francisco: and Education Quarterly 9(l):38-54.
Jossey-Bass. Leviton, L. A. and E. F. X. Hughes. 1981.
Kuhn, Thomas. 1970. The Structure of Scien- "Research on Utilization of Evaluations: A
tific Revolutions. Chicago: University of Review and Synthesis." Evaluation Review
Chicago Press. 5(4):525-48.
Kushner, Tony. 1994. Angels in America. Part Lewy, Arieh and Marvin Alkin. 1983. The Im-
Two: Perestroika. New York: Theatre Com- pact of a Major National Evaluation Study:
munications Group. Israel's Van Leer Report. Los Angeles: UCLA
Laundergan, J. Clark. 1983. Easy Does It. Cen- Center for the Study of Evaluation.
ter City, MN: Hazelden Foundation. Lincoln, Yvonna S. 1991. "The Arts and Sci-
Law, Nancy. 1996. "VP News." Reality-Test, ences of Program Evaluation." Evaluation
The Division H Newsletter of the American Practice 12(1): 1-7.
Educational Research Association, January, Lincoln, Yvonna S. and Egon G. Guba. 1985.
p. 1. Naturalistic Inquiry. Beverly Hills, CA:
Lawler, E. E., Ill, A. M. Mohrman, Jr., S. A. Sage.
Mohrman, G. E. Ledford, Jr., T. G. Cum- Lindblom, Charles E. 1965. The Intelligence of
mings, and associates. 1985. Doing Research Democracy. New York: Free Press.
That Is Useful for Theory and Practice. San . 1959. "The Science of Muddling
Francisco: Jossey-Bass. Through Public Administration." Public Ad-
Layzer, Jean I. 1996. "Building Theories of ministration Review 19:79-99.
Change in Family Support Programs." The Lipsey, Mark W. 1990. Design Sensitivity: Sta-
Evaluation Exchange 2(1): 10-11. tistical Power for Experimental Research.
Lazarsfeld, Paul F. and Jeffrey G. Reitz. 1975. Newbury Park, CA: Sage.
An Introduction to Applied Sociology. New . 1988. "Practice and Malpractice in
York: Elsevier. Evaluation Research." Evaluation Practice
Leeuw F., R. Rist, and R. Sonnichsen, eds. 9(4):5-24.
(1993). Comparative Perspectives on Evalu- Lipsey, Mark W. and John A. Pollard. 1989.
ation and Organizational Learning. New "Driving Toward Theory in Program Evalu-
Brunswick, NJ: Transaction. ation: More Models to Choose From." In
References • 401

"Special Issue: The Theory-Driven Perspec- Marquart, Jules M. and Ellen L. Konrad, eds.
tive," edited by Huey-Tsyh Chen, Evalu- 1996. Evaluating Initiatives to Integrate
ation and Program Planning 12(4):317-28. Human Services (New Directions for Pro-
Lofland, John. 1971. Analyzing Social Settings. gram Evaluation No. 69).
Belmont, CA: Wadsworth. Massarik, F., ed. 1990. Advances in Organiza-
Love, Arnold J. 1991. Internal Evaluation: tion Development. Norwood, NJ: Ablex.
Building Organizations From Within. New- Maxwell, Joseph A. 1996. Qualitative Research
bury Park, CA: Sage. Design. Thousand Oaks, CA: Sage.
, ed. 1983. Developing Effective Internal Maxwell, J. A., P. G. Bashook, and L. J.
Evaluation (New Directions for Program Sandlow. 1985. "Combining Ethnographic
Evaluation, No. 20). San Francisco: Jossey- and Experimental Methods in Educational
Bass. Research: A Case Study." In Beyond the
Lucas, H. C. 1975. Why Information Systems Status Quo: Theory, Politics, and Practice in
Fail. New York: Columbia University Press. Ethnographic Evaluation, edited by D. M.
Lynn, Lawrence E., Jr. 1980a. "Crafting Policy Fetterman and M. A. Pitman. Washington,
Analysis for Decision Makers." Interview DC: Cato Institute.
conducted by Michael Kirst in Educational Mayer, Steven E. 1996. "Building Community
Evaluation and Policy Analysis 2:85-90. Capacity With Evaluation Activities That
. 1980b. Designing Public Policies: A Empower." Pp. 332-75 in Empowerment
Casework on the Role of Policy Analysis. Evaluation: Knowledge and Tools for Self-
Santa Monica, CA: Goodyear. Assessment and Accountability, edited by
Lynn, Lawrence E., Jr. and Susan Salasin. 1974. D. M. Fetterman, A. J. Kaftarian, and
"Human Services: Should We, Can We A. Wandersman. Newbury Park, CA: Sage.
Make Them Available to Everyone?" Evalu- . 1994. Building Community Capacity:
ation (Spring Special Issue):4-5. The Potential of Community Foundations.
Lyon, Eleanor. 1989. "In-House Research: A Minneapolis, MN: Rainbow Research, Inc.
Consideration of Roles and Advantages." . 1993. "Common Barriers to Effective-
Evaluation and Program Planning 12(3): ness in the Independent Sector." Pp. 7-11 in
241-48. A Vision of Evaluation. Washington, DC:
MacKenzie, R. A. 1972. The Time Trap. New Independent Sector.
York: AMACOM. . 1976. Organizational Readiness to Ac-
Mann, Floyd C. and F. W. Neff. 1961. Manag- cept Program Evaluation Questionnaire.
ing Major Change in Organizations. Ann Minneapolis, MN: Program Evaluation Re-
Arbor, MI: Foundation for Research on Hu- source Center.
man Behavior. . 1975. "Are You Ready to Accept Pro-
Mark, Melvin M. and Thomas D. Cook. 1984. gram Evaluation" and "Assess Your Pro-
"Design of Randomized Experiments and gram Readiness for Program Evaluation."
Quasi-Experiments." Pp. 65-120 in Evalu- Program Evaluation Resource Center News-
ation Research Methods, edited by Leonard letter 6(1): 1-5 and 6(3):4-5. Published by
Rutman. Beverly Hills, CA: Sage. Program Evaluation Resource Center, Min-
Mark, Melvin M. and Lance Shotland, eds. neapolis, MN.
1987. Multiple Methods in Program Evalu- . n.d. The Assets Model of Community
ation (New Directions for Program Evalu- Development. Minneapolis, MN: Rainbow
ation, No. 35). San Francisco: Jossey-Bass. Research.
402 • UTILIZATION-FOCUSED EVALUATION

Mcintosh, Winsome. 1996. "Process or Out- Miles, Matthew B. and A. Michael Huberman.
comes? Keep the Context Long-Term." 1994. Qualitative Data Analysis: An Ex-
Foundation, March/April, pp. 46-48. panded Sourcebook, 2nd ed. Thousand
Mclntyre, Ken. 1976. "Evaluating Educational Oaks, CA: Sage.
Programs." Review (University Council for Miller, D. E. 1981. The Book of jargon. New
Educational Administration) 18(1): 39. York: Macmillan.
McLaughlin, John A., Larry J. Weber, Robert Millett, Ricardo A. 1996. "Empowerment
W. Covert, and Robert B. Ingle, eds. 1988. Evaluation and the W. K. Kellogg Founda-
Evaluation Utilization (New Directions for tion." Pp. 65-76 in Empowerment Evalu-
Program Evaluation, No. 39). San Fran- ation: Knowledge and Tools for Self-Assess-
cisco: Jossey-Bass. ment and Accountability, edited by D. M.
McLaughlin, Milbrey. 1976. "Implementation Fetterman, A. J. Kaftarian, and A. Wanders-
as Mutual Adaptation." Pp. 167-80 in Social man. Newbury Park, CA: Sage.
Program Implementation, edited by Walter Mills, C. Wright. 1961. The Sociological Imagi-
Williams and Richard F. Elmore. New York: nation. New York: Grove.
Academic Press. . 1959. The Sociological Imagination.
McLean, A. J. 1982. Organizational Develop- New York: Oxford University Press.
ment in Transition: An Evolving Profession. Minnich, Elizabeth K. 1990. Transforming
New York: John Wiley. Knowledge. Philadelphia: Temple Univer-
McTavish, Donald, E. Brent, J. Cleary, and sity Press.
K. R. Knudsen. 1975. The Systematic Assess- Moe, Barbara L. 1993. "The Human Side of
ment and Prediction of Research Methodol- Evaluation: Using the Results." Pp. 19-31 in
ogy, Vol. 1, Advisory Report. Final Report A Vision of Evaluation. Washington, DC:
on Grant OEO 005-P-20-2-74, Minnesota Independent Sector.
Continuing Program for the Assessment and Morgan, Gareth. 1989. Creative Organiza-
Improvement of Research. Minneapolis: tional Theory. Newbury Park, CA: Sage.
University of Minnesota. . 1986. Images of Organization. New-
MDHS. 1996. Focus on Client Outcomes: A bury Park, CA: Sage.
Guidebook for Results-Oriented Human Morris, Lynn Lyons and Carol Taylor Fitz-Gib-
Services. St. Paul, MN: Community Services bon. 1978. How to Deal With Goals and
Division, Minnesota Department of Human Objectives. Beverly Hills, CA: Sage.
Services. Morrison, Ann M. 1995. The New Leaders:
MECFE. 1992. Changing Times, Changing Leadership Diversity in America. San Fran-
Families. Parent outcome evaluation of the cisco: Jossey-Bass.
Minnesota Early Childhood Parent Educa- Moss, Pamela. 1996. "Enlarging the Dialogue
tion Program. St. Paul, MN: Minnesota De- in Educational Measurement: Voices From
partment of Education. the Interpretive Research Traditions." Edu-
Mendelow, Aubrey L. 1987. "Stakeholder cational Researcher 25(l):20-28.
Analysis for Strategic Planning and Imple- Mowbray, Carol T. 1994. "The Gradual Ex-
mentation." Pp. 176-91 in Strategic Plan- tinction of Evaluation Within a Govern-
ning and Management Handbook, edited by ment Agency." Pp. 33-48 in Preventing the
W. R. King and D. I. Cleland. New York: Misuse of Evaluation (New Directions for
Van Nostrand Reinhold. Program Evaluation, No. 64), edited by
Meyers, William R. 1981. The Evaluation En- C. J. Stevens and Micah Dial. San Francisco:
terprise. San Francisco: Jossey-Bass. Jossey-Bass.
References • 403

Mueller, Marsha. 1996. Immediate Outcomes Program Evaluation, vol. 36). San Fran-
of Lower-Income Participants in Minne- cisco: Jossey-Bass.
sota's Universal Access Early Childhood Nunnally, Jim C , Jr. 1970. Introduction to
Family Education. St. Paul, MN: Minnesota Psychological Measurement. New York:
Department of Children, Families & Learn- McGraw-Hill.
ing. Odiorne, George S. 1984. Strategic Manage-
Murphy, Jerome T. 1976. "Title V of ESEA: ment of Human Resources. San Francisco:
The Impact of Discretionary Funds on State Jossey-Bass.
Education Bureaucracies." Pp. 77-100 in Office of Program Analysis, General Account-
Social Program Implementation, edited by ing Office. 1976. Federal Program Evalu-
Walter Williams and Richard Elmore. New ations: A Directory for the Congress. Wash-
York: Academic Press. ington, DC: Government Printing Office.
Nagao, Masafumi. 1995. "Evaluating Global Osborne, David and Ted Gaebler. 1992. Rein-
Issues in a Community Setting." Keynote venting Government: How the Entre-
address, Evaluation '95, International Eval- preneurial Spirit Is Transforming the Public
uation Conference, November 3, Vancouver. Sector From Schoolhouse to Statehouse, City
Hall to the Pentagon. Reading, MA: Addison-
Nagel, Ernest. 1961. The Structure of Science.
Wesley.
New York: Harcourt Brace Jovanovich.
O'Toole, James O. 1995. Leading Change. San
National Academy of Sciences. 1968. The Be-
Francisco: Jossey-Bass.
havioral Sciences and the Federal Govern-
Owen, John M. 1993. Program Evaluation:
ment. Washington, DC: Government Print-
Forms and Approaches. New South Wales,
ing Office.
Australia: Allen &C Unwin.
Newcomer, Kathryn E. and Joseph S. Wholey.
Owens, Thomas. 1973. "Education Evaluation
1989. "Conclusion: Evaluation Strategies
by Adversary Proceeding." In School Evalu-
for Building High-Performance Programs."
ation: The Politics and Process, edited by
Pp. 195-208 in Improving Government Per-
Ernest R. House. Berkeley, CA:
formance: Evaluation Strategies for Strength-
McCutchan.
ening Public Agencies and Programs, edited
PACT. 1986. Participatory Evaluation: A
by J. S. Wholey and K. E. Newcomer. San
User's Guide. New York: Private Agencies
Francisco: Jossey-Bass.
Collaborating Together.
Newman, Dianna and Robert Brown. 1996. Palumbo, Dennis J., ed. 1994. The Politics of
Applied Ethics for Program Evaluation. Program Evaluation. Newbury Park, CA:
Newbury Park, CA: Sage. Sage.
New York Times. 1996. "Educators Show How Palumbo, D. J., S. Maynard-Moody, and P.
Not to Write English." Editorial distributed Wright. 1984. "Measuring Degrees of Suc-
by New York Times News Service and pub- cessful Implementation." Evaluation Re-
lished in the Minneapolis Star Tribune, view 8(l):45-74.
March 24, p. A24. Palumbo, D. J., M. Musheno, and S. Maynard-
Northwest Regional Educational Laboratory Moody. 1985. An Evaluation of the Imple-
(NWREL). 1977. 3-on-2 Evaluation Report, mentation of Community Corrections in
1976-1977, vols. 1-3. Portland, OR: Oregon, Colorado and Connecticut, Final
NWREL. Report prepared for Grant 82-15-CUK015.
Nowakowski, Jeri, ed. 1987. The Client Per- Washington, DC: National Institute of
spective on Evaluation (New Directions for Justice.
404 • U T I L I Z A T I O N - F O C U S E D EVALUATION

Parker, Glenn M. 1996. Team Players and . 1983. "Similarities of Extension and
Teamwork. San Francisco: Jossey-Bass. Evaluation." Journal of Extension 21(Sep-
Parlett, Malcolm and David Hamilton. 1976. tember-October): 14-21.
"Evaluation as Illumination: A New Ap- . 1982a. Practical Evaluation. Beverly
proach to the Study of Innovatory Pro- Hills, CA: Sage.
grams." Pp. 140-57 in Evaluation Studies . 1982b. "Managing Management In-
Review Annual, vol. 1, edited by Gene V. formation Systems." Pp. 227-39 in Practical
Glass. Beverly Hills, CA: Sage. Evaluation, edited by M. Q. Patton. Beverly
. "Evaluation as Illumination: A New Hills, CA: Sage.
Approach to the Study of Innovative Pro-
. 1981. Creative Evaluation. Beverly
grams." Occasional Paper 9, University of
Hills, CA: Sage.
Edinburgh Center for Research in the Edu-
. 1980a. Qualitative Evaluation Meth-
cational Sciences.
ods. Beverly Hills, CA: Sage.
Parsons, Talcott. 1960. Structure and Process
. 1980b. The Processes and Outcomes of
in Modern Society. New York: Free Press.
Chemical Dependency. Center City, MN:
Patton, Michael Quinn. 1996. A World Larger
Hazelden Foundation.
Than Formative and Summative (New Di-
. 1978. Utilization-Focused Evaluation.
rections in Program Evaluation). San Fran-
Beverly Hills, CA: Sage.
cisco: Jossey-Bass.
. 1994a. "Developmental Evaluation." . 1975a. Alternative Evaluation Re-
Evaluation Practice 15(3):311-20. search Paradigm. Grand Forks: University of
. 1994b. "The Program Evaluation North Dakota.
Standards Reviewed." Evaluation Practice . 1975b. "Understanding the Gobble-
15(2):193-99. dy Gook: A People's Guide to Standardized
. 1990. Qualitative Evaluation and Re- Test Results and Statistics." In Testing and
search Methods. Newbury Park, CA: Sage. Evaluation: New Views. Washington, DC-
. 1989. "A Context and Boundaries for Association for Childhood Education Inter-
a Theory-Driven Approach to Validity." national.
Evaluation and Program Planning 12(4): . 1973. Structure and Diffusion of Open
375-78. Education, Report on the Trainers of
. 1988. "Integrating Evaluation Into a Teacher Trainer Program, New School of
Program for Increased Utility and Cost- Behavioral Studies in Education. Grand
Effectiveness." Pp. 85-95 in Evaluation Forks: University of North Dakota.
Utilization (New Directions in Program Patton, Michael Q. with M. Bringewatt,
Evaluation, No. 39), edited by Robert J. Campbell, T. Dewar, and M. Mueller.
Covert et al. San Francisco: Jossey-Bass. 1993. The McKnight Foundation Aid to
. 1986. Utilization-Focused Evaluation Families in Poverty Initiative: A Synthesis of
2nd ed. Beverly Hills, CA: Sage. Themes, Patterns, and Lessons Learned. Min-
, ed. 1985. Culture and Evaluation. San neapolis, MN: The McKnight Foundation.
Francisco: Jossey-Bass. Patton, Michael Q., Patricia S. Grimes, Kathryn
. 1984. "An Alternative Evaluation Ap- M. Guthrie, Nancy J. Brennan, Barbara D.
proach for the Problem-Solving Training French, and Dale A. Blyth. 1977. "In Search
Program: A Utilization-Focused Evaluation of Impact: An Analysis of the Utilization
Process." Evaluation and Program Planning of Federal Health Evaluation Research."
7:189-92. Pp. 141-64 in Using Social Research in Pub-
References • 405

lie Policy Making, edited by Carol H. Weiss. the Western World, vol. 14. Chicago: Ency-
Lexington, MA: D. C. Heath. clopedia Britannica.
Patton, Michael Q., Kathy Guthrie, Steven Policy Analysis Source Book for Social Pro-
Gray, Carl Hearle, Rich Wiseman, and grams. 1976. Washington, DC: Govern-
Neala Yount. 1977. Environments That ment Printing Office.
Make a Difference: An Evaluation of Ramsey Popham, James W. 1995. "An Extinction-
County Corrections Foster Group Homes. Retardation Strategy for Educational Eval-
Minneapolis: Minnesota Center for Social uators." Evaluation Practice 16(3):267-74.
Research, University of Minnesota. . 1972. "Results Rather Than Rhetoric."
Pederson, Clara A., ed. 1977. Informal Educa- Evaluation Comment: The journal of Edu-
tion: Evaluation and Record Keeping. Grand cational Evaluation (Center for the Study of
Forks: University of North Dakota. Evaluation, UCLA) 3(4): 12-13.
Perlman, Ellen. 1996. "Sirens That Repel." Popham, James W. and Dale Carlson. 1977.
Governing, April, pp. 37-42. "Deep Dark Deficits of the Adversary Evalu-
Perrone, Vito. 1977. The Abuses of Stan- ation Model." Educational Researcher,
dardized Testing. Bloomington, IN: Phi June, pp. 3-6.
Delta Kappa Educational Foundation. Posavac, Emil J. 1995. "Statistical Process Con-
Perrone, Vito, Michael Q. Patton, and Barbara trol in the Practice of Program Evaluation."
French. 1976. Does Accountability Count Evaluation Practice 16(3): 121-39.
Without Teacher Support? Minneapolis: . 1994. "Misusing Program Evaluation
Minnesota Center for Social Research, Uni- by Asking the Wrong Questions." Pp. 69-78
versity of Minnesota. in Preventing the Misuse of Evaluation (New
Perrow, Charles. 1970. Organizational Analy- Directions for Program Evaluation, No. 64),
sis: A Sociological View. Belmont, CA: edited by C. J. Stevens and Micah Dial. San
Wadsworth. Francisco: Jossey-Bass.
. 1968. "Organizational Goals." Pp. 305- Powell, Arthur B., Dawud A. Jeffries, and
11 in International Encyclopedia of Social Aleshia E. Selby. 1989. "Participatory Re-
Sciences. New York: Macmillan. search: Empowering Students and Teachers
Peters, Thomas and Robert Waterman. 1982. and Humanizing Mathematics." Humanis-
In Search of Excellence. New York: Harper tic Mathematics Network Newsletter 4:29-
&Row. 38.
Petrie, Hugh G. 1972. "Theories Are Pressman, Jeffrey L. and Aaron Wildavsky.
Tested by Observing the Facts: Or Are 1984. Implementation. Berkeley: University
They?" Pp. 47-73 in Philosophical Redirec- of California Press.
tion of Educational Research: The Seventy- Prideaux, David. 1995. "Beyond Facilitation:
First Yearbook of the National Society for the Action Research as Self-Research and Self-
Study of Education, edited by Lawrence G. Evaluation." Evaluation journal of Australia
Thomas. Chicago: University of Chicago 7(1):3-13.
Press. Pritchett, Price. 1996. Resistance: Moving Be-
Phillips, D. C. 1995. "The Good, the Bad, and yond the Barriers to Change. Dallas, TX:
the Ugly: The Many Faces of Constructiv- Pritchett and Associates.
ism." Educational Researcher 24(7):5-12. Provus, Malcolm. 1971. Discrepancy Evalu-
Plutarch. 1952. "Alexander." The Lives of the ation for Educational Program Improvement
Noble Grecians and Romans, Great Books of and Assessment. Berkeley, CA: McCutchan.
406 • UTILIZATION-FOCUSED EVALUATION

Rafter, David O. 1984. "Three Approaches to Rivlin, Alice M. 1971. Systematic Thinking for
Evaluation Research." Knowledge: Creation, Social Action. Washington, DC: Brookings
Diffusion, Utilization 6(2): 165-85. Institution.
Reichardt, Charles S. and Thomas D. Cook. Rizo, Felipe M. 1991. "The Controversy About
1979. "Beyond Qualitative Versus Quanti- Quantification in Social Research." Educa-
tative Methods." In Qualitative and Quan- tional Researcher 20(9):9-12.
titative Methods, edited by T. Cook and Rog, DebraJ. 1985. "A Methodological Assess-
C. S. Reichardt. Beverly Hills, CA: Sage. ment of Evaluability Assessment." Ph.D. dis-
Reichardt, Charles S. and Sharon F. Rallis, eds. sertation, Vanderbilt University.
1994a. The Qualitative-Quantitative De- Rogers, Everett. 1962. Diffusion of Innovation.
bate: New Perspectives (New Directions for New York: Free Press.
Program Evaluation, No. 61). San Fran-
Rogers, Everett M. and Floyd F. Shoemaker.
cisco: Jossey-Bass.
1971. Communication of Innovation. New
. 1994b. "The Relationship Between the
York: Free Press.
Qualitative and Quantitative Research Tra-
Rogers, Everett M. and Lynne Svenning. 1969.
ditions." Pp. 5-12 in The Qualitative-
Managing Change. San Mateo, CA: Opera-
Quantitative Debate: New Perspectives
tion PEP.
(New Directions for Program Evaluation,
Rosenthal, Elsa J. 1976. "Delphi Technique."
No. 61), edited by C. S. Reichardt and S. F.
Pp. 121-22 in Encyclopedia of Educational
Rallis. San Francisco: Jossey-Bass.
Evaluation, edited by S. Anderson et al. San
. 1994c. "Qualitative and Quantitative
Francisco: Jossey-Bass.
Inquiries Are Not Incompatible: A Call for
Rossi, Peter H. 1994. "The War Between the
a New Partnership." Pp. 85-91 in The
Quals and the Quants: Is a Lasting Peace
Qualitative-Quantitative Debate: New Per-
Possible?" Pp. 23-36 in The Qualitative-
spectives (New Directions for Program
Quantitative Debate: New Perspectives
Evaluation, No. 61), edited by C. S.
(New Directions for Program Evaluation,
Reichardt and S. F. Rallis. San Francisco:
No. 61), edited by C. S. Reichardt and S. F.
Jossey-Bass.
Rallis. San Francisco: Jossey-Bass.
Reicken, Henry W. and Robert F. Boruch.
1974. Social Experimentation: AMethod for . 1972. "Testing for Success and Failure
Planning and Evaluating Social Interven- in Social Action." Pp. 11-65 in Evaluating
tion. New York: Academic Press. Social Programs, edited by Peter H. Rossi
and Walter Williams. New York: Seminar
Resnick, Michael. 1984. "Teen Sex: How Girls
Press.
Decide." Update-Research Briefs (University
of Minnesota) 11 (5): 15. Rossi, Peter H. and H. E. Freeman. 1993.
Richter, M. J. 1995. "Managing Government's (1985. 1982.) Evaluation: A Systematic Ap-
Documents." Governing, April, pp. 59-66. proach. Beverly Hills, CA: Sage.
Rippey, R. M. 1973. "The Nature of Transac- Rossi, Peter H., Howard E. Freeman, and Sonia
tional Evaluation." Pp. 1-16 in Studies in R. Wright. 1979. Evaluation: A Systematic
Transactional Evaluation, edited by R. M. Approach. Beverly Hills, CA: Sage.
Rippey. Berkeley, CA: McCutchan. Rossi, Peter H. and Walter Williams, eds. 1972.
Rist, Raymond. 1977. "On the Relations Evaluating Social Programs: Theory, Prac-
Among Educational Research Paradigms: tice, and Politics. New York: Seminar Press.
From Disdain to Detente." Anthropology Rutman, Leonard. 1977. "Barriers to the Utili-
and Education 8:42-49. zation of Evaluation Research." Presented
References • 407

at the 27th Annual Meeting of the Society proach for Immediate Use." Working paper
for the Study of Social Problems, Chicago. of the Improved Outcomes for Children
Rutman, Leonard and John Mayne. 1985. "In- Project, Washington, DC.
stitutionalization of Program Evaluation in . 1988. Within Our Reach: Breaking the
Canada: The Federal Level." Pp. 61-68 in Cycle of Disadvantage. New York: Dou-
Culture and Evaluation, edited by M. Q. bleday.
Patton. San Francisco: Jossey-Bass. Schutz, Alfred. 1967. The Phenomenology of
Sanders, James. 1994. "Methodological Issues the Social World. Evanston, IL: Washington
in Cluster Evaluation." Presented at the University Press.
1994 meeting of the American Evaluation Schwandt, Thomas A. 1989a. "The Politics of
Association, Boston. Verifying Trustworthiness in Evaluation
Sartorius, Rolf H. 1996a. "Third Generation Auditing." Evaluation Practice 10(4):33-40.
Logical Framework." European journal of
. 1989b. "Recapturing Moral Discourse
Agricultural Education and Extension
in Evaluation." Educational Researcher
March. Unpublished manuscript.
19(8):11-16.
. 1996b. "The Third Generation Logical
Schwandt, T. A. and E. S. Halpern. 1988. Link-
Framework: More Effective Project and
ing Auditing and Metaevaluation. Newbury
Program Management." Working paper,
Park, CA: Sage.
Social IMPACT, Reston, VA. (e-mail: so-
Scriven, Michael. 1996. "Formative and Sum-
cimpct@erols.com)
mative." (draft title) Evaluation Practice.
. 1991. "The Logical Framework Ap-
. 1995. "The Logic of Evaluation and
proach to Project Design and Manage-
Evaluation Practice." Pp. 49-70 in Reason-
ment." Evaluation Practice 12(2): 139-47.
ing in Evaluation: Inferential Links and
Saxe, Leonard and Daniel Koretz, eds. 1982.
Leaps (New Directions for Program Evalu-
Making Evaluation Research Useful to Con-
ation, No. 68), edited by D. M. Fournier.
gress. San Francisco: Jossey-Bass.
San Francisco: Jossey-Bass.
Schalock, Robert L. 1995. Outcome-Based
Evaluation. New York: Plenum. . 1994. "The Final Synthesis." Evalu-
Schein, Edgar H. 1985. Organizational Culture ation Practice 15(3):367-82.
and Leadership. San Francisco: Jossey-Bass. . 1993. Hard-Won Lessons in Program
Schein, L. 1989. A Manager's Guide to Corpo- Evaluation (New Directions for Program
rate Culture. New York: Conference Board. Evaluation, No. 58). San Francisco: Jossey-
Scheirer, Mary Ann. 1987. "Program Theory Bass.
and Implementation Theory: Implications . 1991a. "Beyond Formative and Sum-
for Evaluators." Pp. 59-76 in Using Program mative Evaluation." Pp. 18-64 inEvaluation
Theory in Evaluation (New Directions for and Education: At Quarter Century, 90th
Program Evaluation, vol. 33), edited by Yearbook of the National Society for the
Leonard Bickman. San Francisco: Jossey- Study of Education, edited by M. W.
Bass. McLaughlin and D. C. Phillips. Chicago:
Schon, Donald A. 1987. Educating the Reflec- University of Chicago Press.
tive Practitioner. San Francisco: Jossey-Bass. . 1991b. Evaluation Thesaurus, 4th ed.
. 1983. The Reflective Practitioner. New Newbury Park, CA: Sage.
York: Basic Books. . 1983. "Evaluation Ideologies." Pp.
Schorr, Lisbeth. 1993. "Shifting to Outcome- 229-60 in G. F. Madaus, M. Scriven, and
Based Accountability: A Minimalist Ap- D. L. Stufflebeam, eds., Evaluation Models:
408 • UTILIZATION-FOCUSED EVALUATION

Viewpoints on Educational and Human Ser- gram Evaluation: Theories of Practice. New-
vices Evaluation. Boston: Kluwer-Nijhoff. bury Park, CA: Sage.
. 1980. The Logic of Evaluation. Iver- Shadish, W. R., Jr. and R. Epstein. 1987. "Pat-
ness, CA: Edgepress. terns of Program Evaluation Practice
. 1972a. "Objectivity and Subjectivity in Among Members of the Evaluation Re-
Educational Research." Pp. 94-142 mPhilo- search Society and Evaluation Network."
sophical Redirection of Educational Re- Evaluation Review 11:555-90.
search: The Seventy-First Yearbook of the Shadish, William R., Jr., Dianna L. Newman,
National Society for the Study of Education, Mary Ann Scheirer, and Christopher Wye.
edited by Lawrence G. Thomas. Chicago: 1995. Guiding Principles for Evaluators
University of Chicago Press. (New Directions for Program Evaluation,
. 1972b. "Pros and Cons About Goal- No. 66). San Francisco: Jossey-Bass.
Free Evaluation." Evaluation Comment: Shah, I. 1964. The Sufis. Garden City, NY:
The Journal of Educational Evaluation Doubleday.
(Center for the Study of Evaluation, UCLA) Shapiro, Edna. 1973. "Educational Evaluation:
3 (4): 1-7. Rethinking the Criteria of Competence."
. 1967. "The Methodology of Evalu- School Review, November, pp. 523-49.
ation." Pp. 39-83 in Perspectives of Curricu- Sharp, Colin A. 1994. "What Is Appropriate
lum Evaluation, edited by Ralph W. Tyler Evaluation? Ethics and Standards in Evalu-
et al., AERA Monograph Series on Curricu- ation." Evaluation News and Comment,
lum Evaluation, 1. Chicago: RandMcNally. The Magazine of the Australasian Evalu-
Scriven, Michael and Michael Patton. 1976. "A ation Society, May, pp. 34-41.
Perspective on Evaluation." Videotape in- Sharp, Colin A. and Ann Lindsay. 1992. "An
terview. Minneapolis, MN: Program Evalu- Interim History of Program Evaluation in
ation Resource Center. Australia and New Zealand and the Aus-
Sechrest, Lee. 1992. "Roots: Back to Our First tralasian Evaluation Society." Presented at
Generations." Evaluation Practice 13(1): the International Evaluation Conference of
1-7. the Australasian Evaluation Society, July,
Sechrest, Lee B. and Anne G. Scott, eds. 1993. Melbourne, Australia.
Understanding Causes and Generalizing Sharpe, L. J. 1977. "The Social Scientist and
About Them (New Directions for Program Policymaking: Some Cautionary Thoughts
Evaluation, No. 57). San Francisco: Jossey- and Transatlantic Reflections." Pp. 37-54 in
Bass. Using Social Research for Public Policy Mak-
Senge, Peter M. 1990. The Fifth Disciple: The ing, edited by Carol H. Weiss. Lexington,
Art and Practice of the Learning Organiza- MA: D. C. Heath.
tion. New York: Doubleday. Siegel, Karolynn and Peter Tuckel. 1985. "The
Shadish, William R., Jr. 1987. "Program Mi- Utilization of Evaluation Research: A Case
cro- and Macrotheories: A Guide for Social Analysis." Evaluation Review 9(3):307-28.
Change." Pp. 93-110 in Using Program Sirotnik, Kenneth A., eds. 1990. Evaluation
Theory in Evaluation (New Directions for and Social Justice: Issues in Public Educa-
Program Evaluation, vol. 33), edited by tion (New Directions for Program Evalu-
Leonard Bickman. San Francisco: Jossey- ation, No. 45). San Francisco: Jossey-Bass.
Bass. Smelser, Neil. 1959. Social Change in the In-
Shadish, William R., Jr., Thomas D. Cook, and dustrial Revolution. Chicago: University of
Laura C. Leviton. 1991. Foundations of Pro- Chicago Press.
References • 409

Smith, Doris Shackelford. 1992. "Academic and Organizational Learning, edited by F.


and Staff Attitudes Towards Program Evalu- Leeuw, R. Rist, and R. Sonnichsen. New
ation in Nonformal Educational Systems." Brunswick, NJ: Transaction.
Ph.D. dissertation, University of California, . 1989. "Producing Evaluations That
Berkeley. Make an Impact." Pp. 49-66 in Improving
Smith, John K. 1988. "The Evaluator/ Re- Government Performance: Evaluation Strate-
searcher as Person vs. the Person as Evalu- gies for Strengthening Public Agencies and
ator/Researcher." Educational Researcher Programs, edited by J. S. Wholey and K. E.
17(2):18-23. Newcomer. San Francisco: Jossey-Bass.
Smith, M. F. 1989. Evaluability Assessment. . 1988. "Advocacy Evaluation: A Model
Boston: Kluwer Academic Publishers. for Internal Evaluation Offices." Evaluation
. 1988. "Evaluation Utilization Revis- and Program Planning 11(2): 141-48.
ited." Pp. 7-19 in Evaluation Utilization . 1987. "An Internal Evaluator Re-
(New Directions for Program Evaluation, sponds to Ernest House's Views on Inter-
vol. 39), edited by J. A. McLaughlin, nal Evaluation." Evaluation Practice 8(4):
Larry J. Weber, Robert W. Covert, and 34-36.
Robert B. Ingle. San Francisco: Jossey-Bass. Special Commission on the Social Sciences,
Smith, Mary Lee. 1994. "Qualitative Plus/ National Science Foundation. 1968. Knowl-
Versus Quantitative." Pp. 37-44 in The edge Into Action: Improving the Nation's
Qualitative-Quantitative Debate: New Per- Use of the Social Sciences. Washington, DC:
spectives (New Directions for Program Government Printing Office.
Evaluation, No. 61, Spring), edited by C. S. "Speed." 1995. The New Yorker, March 27,
Reichardt and S. F. Rallis. San Francisco: p. 40.
Jossey-Bass. Stake, Robert E. 1996. "Beyond Responsive
Smith, Nick L., ed. 1992. Varieties of Investiga- Evaluation: Developments in This Decade."
tive Evaluation (New Directions for Pro- Minnesota Evaluation Studies Institute
gram Evaluation, No. 56). San Francisco: presentation, College of Education and Hu-
Jossey-Bass. man Development, University of Minne-
, ed. 1981. Metaphors for Evaluation: sota, June 25.
Sources of New Methods. Beverly Hills, CA: . 1995. The Art of Case Research. New-
Sage. bury Park, CA: Sage.
. 1980. "Studying Evaluation Assump- . 1981. "Case Study Methodology: An
tions." Evaluation Network Newsletter, Epistemological Advocacy." Pp. 31-40 in
Winter, pp. 39-40. Case Study Methodology in Educational
Social Science Research Council, National Evaluation, edited by W. W. Welch. Min-
Academy of Sciences. 1969. The Behavioral neapolis: Minnesota Research and Evalu-
and Social Sciences: Outlook and Need. ation Center.
Englewood Cliffs, NJ: Prentice Hall. . 1978. "Should Educational Evaluation
Sonnichsen, Richard C. 1994. "Evaluators as Be More Objective or More Subjective?"
Change Agents." Pp. 534-48 in Handbook Presented at the annual meeting of the
of Practical Prgram Evaluation, edited by American Educational Research Associa-
J. S. Wholey, H. P. Hatry, and K. E. tion, Toronto.
Newcomer. San Francisco: Jossey-Bass. . 1975. Evaluating the Arts in Educa-
. 1993. "Can Governments Learn?" In tion: A Responsive Approach. Columbus,
Comparative Perspectives on Evaluation OH: Charles E. Merrill.
410 • UTILIZATION-FOCUSED EVALUATION

Stalford, Charles B. 1983. "School Board Use . 1980. "An Interview With Daniel L.
of Evaluation Information." Presented at Stufflebeam." Educational Evaluation and
the joint meeting of the Evaluation Network Policy Analysis 2(4):90-92.
and the Evaluation Research Society, Chi- . 1972. "Should or Can Evaluation Be
cago. Goal-Free?" Evaluation Comment: The
Statewide Study of Education. 1967. Educa- Journal of Educational Evaluation (Center
tional Development for North Dakota, for the Study of Evaluation, UCLA) 3(4):
1967-1975. Grand Forks: University of 7-9.
North Dakota. Stufflebeam, Daniel L., W. J. Foley, W. J.
Stecher, Brian M. and W. Alan Davis. 1987. Gephart, L. R. Hammond, H. O. Merriman,
How to Focus an Evaluation. Newbury and M. M. Provus. 1971. Educational
Evaluation and Decision-Making in Educa-
Park, CA: Sage.
tion. Itasca, IL: Peacock.
Stevens, Carla J. and Micah Dial, eds. 1994.
Stufflebeam, Daniel L. and Egon Guba. 1970.
Preventing the Misuse of Evaluation (New
"Strategies for the Institutionalization of the
Directions for Program Evaluation, No. 64).
CIPP Evaluation Model." Presented at the
San Francisco: Jossey-Bass.
1 lth Annual PDK Symposium on Education
Stockdill, S. H., R. M. Duhon-Sells, R. A. Ol-
Research, Columbus, Ohio.
son, and M. Q. Patton. 1992. "Voices in the
Suchman, Edward A. 1972. "Action for
Design and Evaluation of a Multicultural
What? A Critique of Evaluative Research."
Education Program: A Developmental Ap-
Pp. 42-84 in Evaluating Action Programs,
proach." New Directions in Program Evalu-
edited by Carol H. Weiss. Boston: Allyn &
ation, Spring, 53:17-34.
Bacon.
Stone, Clarence N. 1985. "Efficiency Versus . 1967. Evaluative Research: Principles
Social Learning: A Reconsideration of the and Practice in Public Service and Social
Implementation Process." Policy Studies Re- Action Programs. New York: Russell Sage.
view 4(3):484-96. Taylor, Donald W. 1965. "Decision Making
Strike, Kenneth. 1972. "Explaining and Under- and Problem Solving." Pp. 48-86 in Hand-
standing the Impact of Science on Our Con- book of Organizations, edited by James G.
cept of Man." Pp. 26-46 in Philosophical March. Chicago: Rand McNally.
Redirection of Educational Research: The Terry, Robert W. 1993. Authentic Leadership.
Seventy-First Yearbook of the National Soci- San Francisco: Jossey-Bass.
ety for the Study of Education, edited by Thompson, James D. 1967. Organizations in
Lawrence G. Thomas. Chicago: University Action. New York: McGraw-Hill.
of Chicago Press. Thompson, Mark. 1975. Evaluation for Deci-
Studer, Sharon L. 1978. "A Validity Study of a sion in Social Programmes. Lexington, MA:
Measure of 'Readiness to Accept Program D. C. Heath.
Evaluation.' " Ph.D. dissertation, University Thoreau, Henry D. 1838. Journal, March 14.
of Minnesota. Trend, M. G. 1978. "On Reconciliation of
Stufflebeam, Daniel. 1994. "Empowerment Evaluation, Objectivist Evaluation, and Evaluation Standards: Where the Future of Evaluation Should Not Go and Where It Needs to Go." Evaluation Practice 15(3):321-38.
Qualitative and Quantitative Analysis." Human Organization 37:345-54.
Tripodi, Tony, Phillip Felin, and Irwin Epstein. 1971. Social Program Evaluation Guidelines for Health, Education, and Welfare Administration. Itasca, IL: Peacock.
Trochim, William M. K., ed. 1986. Advances in Quasi-Experimental Design and Analysis (New Directions for Program Evaluation, No. 31). San Francisco: Jossey-Bass.
Tucker, Eugene. 1977. "The Follow Through Planned Variation Experiment: What Is the Pay-Off?" Presented at the annual meeting of the American Educational Research Association, April, New York City, NY.
Turner, Terilyn C. and Stacey H. Stockdill, eds. 1987. The Technology for Literacy Project Evaluation. St. Paul, MN: The Saint Paul Foundation.
Turpin, Robin. 1989. "Winner of the President's Prize on the Problem of Evaluation Politics." Evaluation Practice 10(10):54-57.
Uphoff, Norman. 1991. "A Field Methodology for Participatory Self-Evaluation." Special issue, Evaluation of Social Development Projects, in Community Development Journal 26(4):271-85.
U.S. Department of Health and Human Services. 1983. Compendium of HHS Evaluation Studies. Washington, DC: HHS Evaluation Documentation Center.
U.S. House of Representatives, Committee on Government Operations, Research and Technical Programs Subcommittee. 1967. The Use of Social Research in Federal Domestic Programs. Washington, DC: Government Printing Office.
Vroom, Phyllis I., Marie Columbo, and Neva Nahan. 1994. "Confronting Ideology and Self-Interest: Avoiding Misuse of Evaluation." Pp. 61-68 in Preventing the Misuse of Evaluation (New Directions for Program Evaluation, No. 64), edited by C. J. Stevens and Micah Dial. San Francisco: Jossey-Bass.
Wadsworth, Yoland. 1995. "'Building In' Research and Evaluation to Human Services." Unpublished report to the Winston Churchill Memorial Trust of Australia, Melbourne.
Wadsworth, Yoland. 1993a. "What Is Participatory Action Research?" Melbourne, Australia: Action Research Issues Association, Inc.
Wadsworth, Yoland. 1993b. "How Can Professionals Help Groups Do Their Own Participatory Action Research?" Melbourne, Australia: Action Research Issues Association, Inc.
Wadsworth, Yoland. 1984. Do It Yourself Social Research. Melbourne, Australia: Victorian Council of Social Service and Melbourne Family Care Organisation in association with Allen and Unwin.
Walters, Jonathon. 1996. "Auditor Power!" Governing, April, pp. 25-29.
Ward, David, Gene Kassebaum, and Daniel Wilner. 1971. Prison Treatment and Parole Survival: An Empirical Assessment. New York: John Wiley.
Wargo, Michael J. 1995. "The Impact of Federal Government Reinvention on Federal Evaluation Activity." Evaluation Practice 16(3):227-37.
Wargo, Michael J. 1989. "Characteristics of Successful Program Evaluations." Pp. 71-82 in Improving Government Performance: Evaluation Strategies for Strengthening Public Agencies and Programs, edited by J. S. Wholey and K. E. Newcomer. San Francisco: Jossey-Bass.
Watkins, Karen E. and Victoria J. Marsick. 1993. Sculpting the Learning Organization. San Francisco: Jossey-Bass.
Weber, Max. 1947. The Theory of Social and Economic Organizations. New York: Oxford University Press.
Weidman, Donald R., Pamela Horst, Grace Taher, and Joseph S. Wholey. 1973. Design of an Evaluation System for NIMH, Contract Report 962-7. Washington, DC: Urban Institute.
Weiss, Carol H. 1993. "Where Politics and Evaluation Research Meet." Evaluation Practice 14(1):93-106. (Original work published 1973)
Weiss, Carol H. [1988] 1990. "Evaluation for Decisions: Is Anybody There? Does Anybody Care?" Pp. 171-84 in Debates on Evaluation, edited by Marvin Alkin. Newbury Park, CA: Sage. Reprinted American Evaluation Association keynote address originally published in Evaluation Practice 9(1):5-19.
Weiss, Carol H. 1980. "Knowledge Creep and Decision Accretion." Knowledge: Creation, Diffusion, Utilization 1(3):381-404.
Weiss, Carol H. 1977. "Introduction." Pp. 1-22 in Using Social Research in Public Policy Making, edited by Carol H. Weiss. Lexington, MA: D. C. Heath.
Weiss, Carol H., ed. 1972a. Evaluating Action Programs. Boston: Allyn & Bacon.
Weiss, Carol H. 1972b. Evaluation Research: Methods of Assessing Program Effectiveness. Englewood Cliffs, NJ: Prentice Hall.
Weiss, Carol H. 1972c. "Evaluating Educational and Social Action Programs: A Treeful of Owls." Pp. 3-27 in Evaluating Action Programs, edited by Carol H. Weiss. Boston: Allyn & Bacon.
Weiss, Carol H. 1972d. "Utilization of Evaluation: Toward Comparative Study." Pp. 318-26 in Evaluating Action Programs, edited by Carol H. Weiss. Boston: Allyn & Bacon.
Weiss, Carol H. and Michael Bucuvalas. 1980. "Truth Tests and Utility Tests: Decision Makers' Frame of Reference for Social Science Research." American Sociological Review 45(April):302-13.
Weiss, Heather B. and Jennifer C. Greene. 1992. "An Empowerment Partnership for Family Support and Education Programs and Evaluations." Family Science Review 5(1,2):145-63.
Weiss, Heather B. and F. Jacobs, eds. 1988. Evaluating Family Programs. Hawthorne, NY: Aldine.
Westinghouse Learning Corporation. 1969. The Impact of Head Start: An Evaluation of the Effects of Head Start on Children's Cognitive and Affective Development. Bladensburg, MD: Westinghouse Learning Corporation.
Whitmore, E. 1990. "Focusing on the Process of Evaluation: It's the 'How' That Counts." Presented at the American Evaluation Association annual meeting, Washington, DC.
Whitmore, E. 1988. "Empowerment and Evaluation: A Case Example." Presented at the American Evaluation Association annual meeting, New Orleans.
Whitmore, E. and P. Kerans. 1988. "Participation, Empowerment, and Welfare." Canadian Review of Social Policy 22:51-60.
Wholey, Joseph S. 1994. "Assessing the Feasibility and Likely Usefulness of Evaluation." Pp. 15-39 in Handbook of Practical Program Evaluation, edited by Joseph S. Wholey, Harry P. Hatry, and Kathryn E. Newcomer. San Francisco: Jossey-Bass.
Wholey, Joseph S., Harry P. Hatry, and Kathryn E. Newcomer, eds. 1994. Handbook of Practical Program Evaluation. San Francisco: Jossey-Bass.
Wholey, Joseph S., John W. Scanlon, Hugh G. Duffy, James S. Fukumotu, and Leona M. Vogt. 1970. Federal Evaluation Policy: Analyzing the Effects of Public Programs. Washington, DC: Urban Institute.
Whyte, William F., ed. 1991. Participatory Action Research. Newbury Park, CA: Sage.
Wildavsky, A. 1985. "The Self-Evaluating Organization." Pp. 246-65 in Program Evaluation: Patterns and Directions, edited by E. Chelimsky. Washington, DC: American Society for Public Administration.
Wildman, Paul. 1995. Action Research Case Studies Newsletter 3(1).
Willems, E. P. and H. L. Raush. 1969. Naturalistic Viewpoint in Psychological Research. New York: Holt, Rinehart and Winston.
Williams, David D., ed. 1986. Naturalistic Evaluation (New Directions for Program Evaluation, No. 30). San Francisco: Jossey-Bass.
Williams, H. S., A. Y. Webb, and W. J. Phillips. 1991. Outcome Funding: A New Approach to Targeted Grantmaking. Rensselaerville, NY: The Rensselaerville Institute.
Williams, Jay. 1976. Everyone Knows What a Dragon Looks Like. New York: Four Winds Press.
Williams, Walter. 1976. "Implementation Analysis and Assessment." Pp. 267-92 in Social Program Implementation, edited by W. Williams and R. F. Elmore. New York: Academic Press.
Williams, Walter and Richard F. Elmore. 1976. Social Program Implementation. New York: Academic Press.
Williams, Walter and John W. Evans. 1969. "The Politics of Evaluation: The Case of Head Start." Annals of the American Academy of Political and Social Science 385(September):118-32.
Winberg, A. 1991. "Maximizing the Contribution of Internal Evaluation Units." Evaluation and Program Planning 14:167-72.
Wolf, Robert L. 1975. "Trial by Jury: A New Evaluation Method." Phi Delta Kappan (November).
Wolf, Robert L. and Barbara Tymitz. 1976. "Whatever Happened to the Giant Wombat: An Investigation of the Impact of the Ice Age Mammals and Emergence of Man Exhibit." Mimeograph, National Museum of Natural History, Smithsonian Institutes.
Worley, D. R. 1960. "Amount and Generality of Information-Seeking Behavior in Sequential Decision Making as Dependent on Level of Incentive." Pp. 1-11 in Experiments on Decision Making, Technical Report 6, edited by D. W. Taylor. New Haven, CT: Yale University, Department of Industrial Administration and Psychology.
Worthen, Blaine R. 1994. "Conceptual Challenges Confronting Cluster Evaluation." Presented at the 1994 meeting of the American Evaluation Association, Boston.
Worthen, Blaine R. and James R. Sanders, eds. 1973. Educational Evaluation: Theory and Practice. Worthington, OH: Charles A. Jones.
Wortman, Paul M. 1995. "An Exemplary Evaluation of a Program That Worked: The High/Scope Perry Preschool Project." Evaluation Practice 16(3):257-65.
Wray, L. D. and J. A. Hauer. 1996. "Best Practices Reviews for Local Government." Public Management, January, pp. 7-11.
Wright, William and Thomas Sachse. 1977. "Survey of Hawaii Evaluation Users." Presented at the annual meeting of the American Educational Research Association, New York City, NY.
Wye, Christopher G. and Richard C. Sonnichsen, eds. 1992. Evaluation in the Federal Government: Changes, Trends, and Opportunities (New Directions for Program Evaluation, No. 55). San Francisco: Jossey-Bass.
Yates, Brian T. 1996. Analyzing Costs, Procedures, Processes, and Outcome in Human Services. Thousand Oaks, CA: Sage.
Yin, Robert K. 1994. "Evaluation: A Singular Craft." Pp. 71-84 in The Qualitative-Quantitative Debate: New Perspectives (New Directions for Program Evaluation, No. 61), edited by C. S. Reichardt and S. F. Rallis. San Francisco: Jossey-Bass.
Zadeh, Lotfi A., King-Sun Fu, Kokichi Tanaka, and Masamichi Shimura, eds. 1975. Fuzzy Sets and Their Applications to Cognitive and Decision Processes. New York: Academic Press.
Index

Academic, 25, 53-55, 57, 65, 121-126, 128, Alligators, 202


217, 222, 237, 250, 260, 270, 283, 329, Alternative approaches:
331,346 menu of types, 192-194
Accountability, 12-15, 64-7, 69, 76, 82, 101, American Evaluation Association, 12, 13, 3 3 , 64,
109, 122, 128, 142, 158, 168, 187, 232, 80, 98, 101, 102, 118, 122, 123, 124, 125,
361, 373, 380 259, 266, 282, 2 9 1 , 343, 362, 364
of evaluators, 385 annual meeting themes, 33
example, 341-343 Analysis, 125, 292, 301-338
Accreditation, 56, 76, 142, 192 focusing, 307-310, 379
Accuracy, 17, 2 1 , 143, 232, 236, 249, 250, 328, framework, 307
350,351,380,383 manipulable variables, 325-326
standards, 249 mock, 302-303, 355, 378
Achilles'heel, 145,380-381 paradigm approaches, 299
Action, 24, 32, 70, 8 1 , 126, 135, 199 standards for, 277
action-oriented, 299 task force, 355-356
findings, 336 Appropriateness, 9, 10, 130-131, 137, 180,
knowledge and, 351 181, 189, 205, 212, 247, 249, 250, 267,
political, 347 291-299, 380
recommendations, 307, 325 methods, 275-277, 286, 288, 291-299, 355,
theory-based, 237 379
Action research, 64, 99, 103, 104, 111, 129, 336 problem example, 287-288
example, 368-369 standards for, 277, 333
Active-reactive-adaptive, 134-138, 173, 178, 187, Arranging data, 307-309, 379
205, 212, 225, 243, 359, 380, 382-383 Arrogance, 366, 367
ethics and, 364-365 Art criticism approach, 56
See also Situational responsiveness Arts of evaluation, 123-124, 249, 271
Activist role, 123-126, 384 Assets model, 74, 101
Activities focus, 167, 205, 206, 218 Assumptions, 89, 209, 2 2 1 , 222, 226, 237, 358
Actual vs. ideal, 203-204 affect interpretations, 280
results, 306 futuring, 328
Adaptation of models, 204-205 paradigms compatibility, 296
Adult education example, 108 program, 323
Advocacy-adversary model, 56, 75-77, 332 qualitative, 281
Alexander the Great, 177-178 statistical, 286, 309
Alice in Wonderland, 63 validity assumptions, 225-228, 230-231


Attitude change, 234, 235 Central issues focus, 173, 174


Audience, 20, 4 3 , 54, 137, 337, 354, 365, 375, Chain of objectives, 217-218, 226, 232, 237
382 exhibits, 219, 220
Audit, 67, 76, 121-122, 128 Champions of use, 141, 142, 2 9 1 , 361
of evaluations, 351-352 Change, 98, 101-103, 106, 137, 160, 187-189,
Australasian Evaluation Society, 15, 64, 367 201,351,384
Australia, 66, 367, 368-369 two paradigm views, 286-288, 299
Change agent role, 122, 137, 141, 142
Change agent study, 204-205
Balance, 250, 2 6 1 , 264, 280-283, 299, Chemical dependency examples, 97, 143, 150,
310-312,351,359,380 199, 207
data analysis, 310-312 Claims approach, 321-324
feedback, 366 claims matrix, 322-323
methods, 289 Clarity, 88-89, 9 1 , 103, 180, 250
replacing objectivity, 282, 299 eschewed, 105
reporting results, 310-312 to aid analysis, 305
Baseline, 96 Clean air example, 226
Bear hunt, 147-148, 174 Closed-loop feedback process, 57
Behavioral objectives, 56, 188 CIPP model, 206-207
Behavior change, 234, 235, 322 Client outcomes, 154-167, 211
Beliefs about evaluation, 27, 29, 264 Client perspective, 48, 49, 203
Believability, 2 5 1 , 253-255, 297, 379, 380 Closeness to program, 97, 112, 122, 127, 273,
claims, 321-324 274, 283-284, 299, 357
Best practices, 73 co-optation fears, 357, 362, 365
Breadth versus depth, 257 Cluster evaluation, 74, 78, 84, 101, 129, 192
Bull's-eye, 183, 184 Collaboration, 22, 166, 223, 242, 315, 334,
336, 355
experimenting society, 385
Canada, 30-31, 50, 66, 97, 138, 185 fear of co-optation, 357-359, 362, 365
Canadian Evaluation Society, 15, 64 win/win 356-357
Caribbean Project example, 95-96, 296-297 Collaborative evaluation, 97-111, 121, 129,
Cartoons: 136, 192, 333, 336, 367
bear hunting, 148 example, 368-369
bull's-eye, 184 See also Participatory evaluation
goals wars, 152 Commitment, 22-23, 2 5 , 29, 37, 44, 52, 82, 84,
hard hitting evaluation, 196 100, 111, 130, 167, 191, 303, 338, 353,
indecision, 84 354,358,382,383
meet everyone's needs, 135 Communications/communicating, 9, 49, 123,
parachute, 196 153-154, 200
pair o' dimes debate, 298 reporting, 331
research vs. evaluation 24 Community development example, 107
stake-holder, 42 Comparative evaluation, 68, 192, 208-209
termination squad, 11 Comparisons, 203, 209, 232-233, 276-277,
truth vs. action, 24 314-315
Case studies, 64, 92, 2 7 1 , 273-277, 279, 290 accountability, 373
purposeful sampling, 288-289 caveats, 208-211
Causal focus, 192, 222, 237-238 claims-based, 323
causal connections, 225-231, 237-238 menu, 314
conceptual gaps, 225-231 of programs, 314
theory testing, 210-211, 218 qualitative, 290
treatment specification, 207-211 theories of action, 232-233
Causality, 89, 122, 208, 216-217, 218, 232, Competence, 2 1 , 250, 353
251, 258-259 of users, 366
eschewed, 105 Complacency, 29, 115
methods for, 277-279, 286-288 Compliance, 13, 29, 65, 121, 192
theorizing, 237-238 implementation questions, 213

Component evaluation, 207, 211 Culture, 88, 90, 99, 100, 103, 124, 138, 177,
Comprehensive approach, 20, 73, 78, 143, 183, 212, 271
205 Cynicism, 13, 29, 384
Computer systems use, 205-206
Conceptual use, 70-71, 74, 82, 382
Conclusion-oriented research, 24 Data pool, 338
Conclusions, 89, 250 Debate, 64, 65, 72, 123-126, 137, 184, 218,
framework for, 307 243, 248, 260
Concrete thinking, 88-89 competing paradigms, 273, 299
Conflict, 123-126, 131, 150-152, 2 0 1 , 202, 347 methods paradigms, 265-299
Conflicting purposes, 65 passions of, 272
Conflicts of interest, 362 paradigms compatibility, 295-297
See also Ethics; Standards Decisions, 12, 23, 24, 46, 55, 67, 70, 79-85,
Congress, 9, 47, 122, 150, 259, 351 130, 135, 140, 174, 378-379
See also Legislative use; Politics decision context, 83
Connoisseurship approach, 56, 192 decision focus 83, 184-185, 192
Constructive uses, 23 decision-oriented inquiry, 24, 49, 5 5 , 69, 144
Context, 123, 126, 130, 152, 192, 206, 249, errors in, 199, 280
264, 285, 288, 294, 299, 382 facilitating, 2 1 , 83
context-free, 289, 292 flow chart of, 378-379
societal, 366 function of executive, 260
when comparing, 290 incremental, 55, 79, 82, 227, 261
Context factors, 47, 130
methods, 241-299
Contingency thinking, 117-145
political, 347
Controversies, 110-113, 122-126, 131, 136,
questions about, 83
181-184, 242-244, 244-245, 326-327, 356
targets of opportunity, 227,
Corrections examples, 208-209, 210, 229, 287
use, connection to 84-85
Corruption of evaluation, 112, 360, 363
with inadequate info, 260
guarding against, 365-366
Deductive approach 68, 89, 219-224, 273,
Cost-benefit, 22, 76, 90, 110, 131, 143, 192,
279-280, 299
260
limits of, 347
Cost-effectiveness, 93-94, 192
methods, 279-280
of evaluation, 379
theory menu, 224
reporting, 331
Deficit model 74
Costs, 22, 29, 93, 98, 138, 142, 205, 376, 380,
Definitions, 25, 64, 93, 99, 110, 126, 140-142,
383
271,292,312-313
Courage, 366, 369
Creativity, 17, 97, 131, 177, 191, 243, 288, 299, cautions about, 109
380 clarity about, 312, 313
in reporting, 332 developmental evaluation, 105
Credibility, 3 1 , 84, 97, 112, 122, 124, 131, 137, dilemmas exhibit, 313
138, 142-143, 144, 163, 236, 243, 247, effectiveness, 221
249, 250-251, 255, 259, 260-261, 264, evaluation, 23-25, 212
272, 359, 375 internal-external roles, 140-142
and use, 250-251, 264, 351-352, 359 misevaluation, 360
claims, 321-324 misuse, 360
defenders of, 361 naturalistic inquiry, 278
ethics and, 363-368 paradigms, 267-268
evaluator's, 130, 2 5 1 , 264 , 335, 364, 382, personal factor, 44
383 process use, 90
jeopardizing, 357-358 program evaluation, 23
overall evaluation, 2 5 1 , 355 stakeholders' definitions, 34-37, 126
standard for, 250 theory of action, 221
Criterion-focused evaluation, 192 types of evaluation, 192-194
Critical issues focus, 4, 185 units of analysis, 207
Cross-cultural, 36, 131, 212 use, 79-82, 236

utilization-focused evaluation, 20-22, 194, Ends. See Means-ends hierarchy; Outcomes


234, 236 Energy program example, 222-223
Delphi technique, 150-152 Enlightenment, 72, 79, 357
Depth versus breadth, 257 Environmental factors, 180-181, 211
Descriptive focus, 192, 200, 206, 218, 307 Epilepsy program example, 169-172, 218, 219
Desiderata, 115 Equity, 193, 200, 223
Design decisions, 64, 65, 83, 241-299, 379 Error, 193, 200, 255-257, 316
alternatives, 299 design, 288
imperfect, 242, 243, 261 methods thinking, 280
Developmental evaluation, 22, 65, 103-113, Espoused theory, 221-223, 226, 227, 237
121, 180-181, 193, 229, 286-287, 299, See also Theory; Theory of Action
336,382 Ethics, xii, 17, 66-67, 112-113, 123-126, 131,
defined, 105 136, 354, 361-369, 379, 382
Differential evaluation, 187 clientism, 365
Disabilities example, 308 framework, 362
Discrepancy evaluation, 203 methods, 379
Dissemination, 66, 329, 330, 337, 379, 380 questions, 368
distinct from use, 3 3 1 , 337, 379, 380, 383 trenches perspective, 367, 368-369
Distance from a program, 127 See also Honesty; Integrity; Morality;
Diversity, 2 1 , 37, 5 1 , 64, 103, 105, 107, 125-126, Principles; Propriety standards
192, 207, 354, 383 Ethnographic focus, 193
ethics and, 365, 366 Evaluability assessment, 104, 111, 152-153, 193,
methods, 292 200
paradigm dimension, 289-290 Evidence for claims, 321-324
valuing, 356-357 Executive summaries, 329, 3 3 1 , 332
within evaluation, 294 Exhibits:
within paradigms, 292 accountability hierarchy, 236
Downsize, 22 annual meeting themes, 33
Dropout rate, 246 chain of objectives, 219, 220
Dynamic perspective, 286-288 claims matrix, 322-323
client-focused outcomes, 157
contrasting approaches, 377
Early childhood education examples, 119-121, data analysis, 308
200, 207 data for interpretation, 321
Economics, 309 developmental evaluation, 105
Education, 10. 78, 150, 202-203, 220-221, dominant paradigm, 270
227-231, 232-233, 276-277, 287-288, evaluating evaluation, 236
341-343 factor analysis interpretation, 318-319, 320
interview, 371-375 flow chart, 378-379
Effectiveness, 13-15, 23, 84, 188, 192, 232, 374 formative-summative example, 70-71
defined, 221 from the trenches, 368-369
how to ask about, 356, 358-359 goal rankings by importance, 172
levels of, 303-305 goals-implementation linkage, 211
Efficiency, 55, 93, 192 graphics, 330
claim, 322 hierarchy of measures, 220
Effort evaluation, 193, 205, 211 hierarchy of objectives, 230-231
Emotions, 22, 283 internal evaluator roles, 142
Empathy, 2 7 1 , 283 intervention-oriented evaluation, 98
Empirical perspective, 26, 32, 89, 103, involving users in methods, 243
358-359, 367 knowledge-oriented evaluation, 73
Empowerment evaluation, 9 1 , 101-103, leadership functions, 144
110-113, 121, 124, 129, 193, 336, 367 legislative nonuse, 5
objections to, 110-113, 362, 365-366 levels of evidence, 235, 236
See also Collaborative evaluation; logic and values, 89
Participatory evaluation. mapping stakes, 344

means-ends theory, 219, 220 example, 371-377


methods quality, outdated, 270 External validity, 258-259
mission, goals, objectives, 171 Extrapolations, 76, 258-259, 273, 289, 299, 337
moral discourse, 368-369
numbers and percents, 311
outcome examples, 160 Face validity, 253-255, 264
outcomes development, 166 Facilitation, 20, 2 1 , 52, 9 1 , 98-112, 122, 134,
paradigm dimensions, 273, 299 156, 163, 189,221, 222, 225, 229, 233,
paradigm synthesis, 299 237, 243, 355, 358, 383
participatory evaluation, 100 ethics and, 365-366
percents and numbers, 311 flow chart, 378-379
politics and ethics, 368-369 of analysis, 302-338
power of graphics, 330 of judgments, 365-366
presenting data, 308 of recommendations, 326
principles of evaluation, 21 Fairness, 103, 111, 142, 250, 264, 280-283,
questions for intended users, 83 299, 3 5 1 , 354, 359
readiness for evaluation, 27 replace objectivity, 282-283, 299
reality-testing example, 28 standard for, 282, 362
reasons for involving users, 243 Family program examples, 101-102, 107,
reinventing government, 14 119-121, 197
results-oriented leadership, 144 Farming systems examples, 96, 229
service-focused goals, 157 FBI example, 140-141
situational factors, 132-133 Fear, 22, 29, 38, 88, 158, 335-336, 3 5 1 , 353,
stakeholder groups, 53 357-358, 368-369
standards for evaluation, 17 Feasibility:
standards of desirability, 304-305 questions, 213
task force example, 51 standards, 17, 143
theory of action, 230-231, 235, 236 Final report, 60, 324-325
three data presentations, 308 See also Reporting
treatment environments, 318-319, 320, 321 Findings use, 63-86, 93, 97, 99, 100, 382
types of teacher centers, 233 menu of options, 76, 382
use and misuse, 361 See also Negative findings; Positive findings;
users and methods decisions, 243 Reporting
utilization-focused flow chart, 378-379 Flexibility, 17, 2 1 , 73, 129, 131, 184, 206, 252,
utilization-focused outcomes framework, 164 267, 273, 277, 279, 299, 380
utilization-focused outcomes stages, 166 methodological, 277, 279, 291
utilization-focused questions, 32 paradigms compatibility, 296-297
validity assumption linkages, 230-231 Flow chart:
when not political, 352 utilization-focused evaluation, 378-379
writing outcomes, 165 Focus/focusing 29-32, 78-79, 88-93, 149,
See also menus. 184-191,192-194, 212, 213-214, 225-231,
Expectations, 82 237, 247, 257-261, 264, 3 5 1 , 356-357,
Experiential education, 95 382
Experimental designs, 22, 64, 93-94, 202, 203, analysis, 306, 307-310
208, 269-271, 273, 380 claims, 321-324
compared to naturalistic, 277-280 converge, 357
difficult to do, 293 exercise, 190
mandated, 287-288 flow chart, 378-379
testing change, 286-288 interpretation, 321-324
underuse of, 288 manipulable variables, 325-326
See also Paradigms menu of alternatives, 192-194
Experimenting society, 11-12, 384-385 morality, 367-368
Extensiveness focus, 193 on learning, 335-336, 356-357
External evaluation, 64, 96, 133, 138-139, 183, recommendations, 325
188,193,229 reports, 331-332

task force effort, 355-356 personal factor, 174


theory of action, 225-231 prioritizing, 170-173
Follow Through example, 202-203, 276-277, problems with, 179-184
283,285 reified, 179-180
Food stamp example, 122 singularity, 168
Formative evaluation, 22, 67, 68-70, 76, 78-79, understandable, 168
93, 112, 118, 119, 123, 132, 170, 202, unrealistic, 152-153, 179
204, 218, 237, 257, 286, 382 utilization-focused framework, 158-167
exhibit example, 70-71 values about, 274
feedback, 334 war about, 151-152, 181
implementation questions, 213-214 whose goals, 148-149, 182-183
report, 330 God, 1, 25
Fourth generation evaluation, 124 Golden Fleece Award, 66
Framework for analysis, 307 Golden rule, consulting, 362
Fraud, 67 Grand Canyon, 90, 431
Frontier school example, 30-31, 97 Graphics, 1 3 1 , 3 2 9 , 3 3 0
Funders, 22, 29, 37, 38, 42, 44, 50, 54, 57, 68, Great Society, 7, 10-12
71, 92, 97, 98, 118, 128, 154, Group home study example, 210, 317-321
157-158, 182, 188, 267, 324, 354,
375, 376, 382
Golden rule, 362 Halcolm, xiii, 1, 19, 87, 115, 117, 240, 264,
Future orientation, 14, 55, 184-185, 357 290, 339-340
recommendations, 328-329 Hard data, 249-250, 267, 270, 274
Fuzzy goals, 1 5 3 , 1 7 9 , 1 8 1 machismo, 274
Hawaii example, 75-77
Head Start evaluation, 150, 202, 209
GAO, 7, 8, 9, 49, 66, 72, 73, 259, 293, Health education example, 218, 220
311-312,333,363 Historical context, 10-17
Generalizability/generalizations, 24, 67, 72, 76, Holistic perspective, 7 3 , 273, 284-285, 299
89, 122, 128, 207, 237, 258-259, 273, Home nursing example, 229
283,288-289,299,355,380 Honesty, 2 1 , 131, 362
Generating knowledge menu, 76 Human services shortfall, 198
Goal-based evaluation, 193 Humanistic evaluation, 68
Goal-free evaluation, 56, 181-184, 193, 279 Humor:
See also Needs-based evaluation. See Cartoons; Sufi stories
Goals, 23, 40, 4 1 , 56, 64, 65, 89, 104, 105, 118,
147-175, 178-184, 188, 204, 217, 232,
237, 374 Ideal plans, 200-202, 222
central vs. peripheral, 173 Ideal vs. actual, 203-204
clarity of, 168, 274 results, 306
communicating about, 153-154 Ideology and methods, 280
comparing, 232, 314-315 See also Paradigms debate
conflict about, 150-152, 181 Ignoring intended users, 60
exaggerated, 152 Illuminative evaluation, 272
fuzzy, 153, 179, 181 Illusion of evaluation, 374
guidelines for, 167-169 Impact, 22, 2 3 , 193, 200, 234, 235
horse to water metaphor, 157-158 Impartiality, 2 8 1 , 282
implementation connection, 211-212 Impersonal, 25
importance traditionally, 149 Implementation evaluation, 2 3 , 96, 187-188,
levels, 156, 169-170, 217-220, 230-231, 193,195-214,218,287
235, 236 alternative approaches, 205-211
meaningful and useful, 167-169 barriers 201-202
of utilization-focused evaluation, 236, 241 claims, 323
organizational, 179-180 compliance issues, 213
outcomes, 154-157 data example, 304-305
paradox of, 174-175 feasibility issues, 213

formative evaluation, 213-4 flow chart, 378-379


inputs, 193, 204, 205, 206, 208 reports, 331-334
linkage to goals, 211-212, 229 Interdisciplinary, 96, 134, 291
local variations, 203-205 Internal evaluation 13, 64, 65, 121, 123, 133,
menu of questions, 213-214 138-143, 183, 193, 205, 229-230, 262,
national programs, 204-205 360, 362
nonexistent programs, 197 example, 376, 377
summative evaluation, 214 Internal-external combination, 142-143, 183
theory of action, 222, 229 Internal validity, 258-259, 263, 278
treatment specification, 207-211, 222 International, 12, 15, 99, 131, 234, 266
variations in, 205-214 global experimenting society, 384
Improvement 12, 2 3 , 24, 65, 68-70, 76, 84, Interpretation, 89, 212, 238, 259, 307, 315-321
106, 111, 218, 229, 335-336, 378 criteria needed, 306
menu of findings uses, 76 framework, 307
overall evaluation purpose, 65, 76, 79, 84, involving users, 325-321, 379, 380
378,382 making claims, 321-324
purposeful sampling for, 288 political nature of, 347
users want, 366, 375 task force, 355-356
Improvement-oriented evaluation, 68-70, 84,
Intervention-oriented evaluation, 9 1 , 93-97, 98,
112,204,382
111,120-121,193,3823
menu of findings uses, 76
Interviews, 95, 247, 248, 268, 2 7 1 , 275
Incremental decisions, 55, 79, 82, 2 0 1 , 202,
validity issues, 263
261, 376
Involving intended users, 20-23, 48-49, 52, 75,
Indecision, 84-5, 113
88, 90, 100, 126, 127, 139, 163, 164-167,
Independence of evaluators, 97, 109, 112,
205, 211, 220-221, 234, 236, 241-244,
121-122, 128, 138, 139, 273, 283-284,
248, 253-255, 263, 302, 338, 3 5 1 , 380
299,316,323,335
choosing comparisons, 314-315
ethics and, 365-366
ethics of, 363-368
in reporting, 333, 335
example, 317-321
Indicators, 159-162, 168, 194, 2 1 1 , 237, 253,
flow chart, 378-379
284
in analysis, 302-338, 380
corruption of, 360
increasing interest, 306
Individual style, 136-137, 384
Individualized programs, 290 in reporting, 332
Inductive approach, 68, 89, 206, 219, 2 2 1 , 223, methods choices, 297-299
224, 273, 279-280 outcomes process, 166
methods, 279-280 over time, 262-263, 264, 292
theory menu, 224 power rules, 356-357
Information age, 5-6 quality of, 383
Information avoidance, 350 turnover, 380-381
Information users, 50-51 Irony, 29
Innovations assessment, 203, 204-205 Irrelevant, 24, 372-374
Inputs, 13, 89, 193, 204, 205, 206, 208, 218, IRS example, 311-312, 314
225, 233, 234 Issues, 4, 166, 185, 213, 226, 237, 3 5 1 , 353
hierarchy exhibit, 234
Institutional memory, 232
Instrumental use, 70, 79, 367, 380 Jargon, 34, 109, 242
Integrating data collection, 93-98, 111 Journalism, 282
Integrity, 2 1 , 124, 126, 129, 131, 283, 335, 357, Judgment, 12, 22, 2 3 , 65, 79, 84, 89, 97, 103,
358, 362, 364, 367, 383 122, 123, 136, 187, 193, 307,
of users, 366 315-321, 332-333, 378
Intended use by intended users, 20, 22, 23, 39- analysis option, 307
6 1 , 63, 82, 122, 125, 166, 184, 194, 2 1 1 , facilitation of, 2 1 , 365-366, 380
225, 234, 236, 241-242, 263, 264, 299, framework, 307
331,351,380,382 ethics and, 365-366
ethics of, 363-368 menu of findings uses, 76

overall evaluation purpose, 65, 76, 79, 84, See also Cluster evaluation
193,378,382 Levels of evidence, 233-235
task force, 355-356 exhibits, 235, 236
use findings for, 307, 315-321 Literacy program example, 70-71
Judgment-oriented evaluation, 65-68 Local variations, 203-205
menu of findings uses, 76 Logical framework, 193, 234, 237
reports, 330, 332-333 for analysis, 307
Justice 99, 102-103, 122, 129, 194, 367 Logical positivism, 38, 268
ethics and, 363-365 Logic of evaluation, 68, 88-93, 97, 100, 111,
174, 187, 2 0 1 , 227, 232
limits of, 347
Kalamazoo education example, 341-343 making claims, 323
Knowledge, ii, 2 3 , 32, 37, 65, 70-75, 76, 84, methods choices, 277
109-110, 122, 138, 187, 193, 220, 237, nonlinear, 380
378 paradigms compatibility, 295-297
action and, 351 Longitudinal focus, 193
as power, 347, 348-352 Love, ii, 25, 366, 368
knowledge change, 234, 235
menu of findings uses, 76
overall evaluation purpose, 65, 76, 79, 84, Management information systems, 24, 69,
193, 220, 378, 382 205-206
tacit, 252 Mandated evaluations, 29
Knowledge-oriented evaluation, 70-75, 78, 84, Manipulable variables, 325-326
103, 120, 382 Marketing example, 255
menu of findings uses, 76 Means-ends hierarchy, 217-229, 232, 234, 237
report, 330 exhibits, 219, 220, 230-231, 235, 236
See also Lessons learned Measure/measurement, 89, 9 1 , 9 3 , 149, 154,
Koan, 189, 301-302 161, 179, 210, 2 1 1 , 234, 235, 244-246,
251-258, 299
alternative approaches, 299
Labeling caution, 208-211, 212, 232 instruments, 252, 253-255
Language, 25, 34-38, 72, 103, 109, 142, options, 247, 355, 380
153-154, 180, 181, 208-209, 212, reliability, 255-257, 292, 355
270-271, 375 utilization-focused, 242-243
confusion about terms, 237 See also Error; Paradigms debate;
for objectivity, 282 Quantitative data; Reliability;
of participants, 283 Statistics; Validity
politics and, 345-347 Mental health example, 275, 368-369
recommendations, 326 Menu approach, xiii, 2 1 , 22, 52, 64, 65, 84, 85,
technical, 242 90, 100, 103, 126, 136, 189, 357, 367-
Leadership, 66, 74, 78, 103, 106-107, 122, 126, 368,382
143-145 not on the menu, 112
exhibit, 144 Menus:
reality-testing functions, 144 comparisons, 314
results-oriented functions, 143-144 findings use, 76
Learning approach, 335-336, 356-357, 358-359, focus alternatives, 192-194
366, 367 implementation approaches, 213-214
Learning organization, 68, 76, 99-100, 103-110, morality, 367-368
336, 365 personal factor planning, 54
institutional memory, 232 process uses, 111
Legislative use, 4-5, 229 program comparisons, 314
exhibit on nonuse, 5 program theory approaches, 224
See also Congress; Politics questions, 192-194, 213-214
Legitimation, 373, 375 relationship dimensions, 127
Lessons learned, 50, 7 1 , 72-75, 193, 232 reporting, 332-333
implementation questions, 214 role options and implications, 128-129

special situations and skills, 131 Morality, 123-126, 182, 363-365, 366-369
temptations from user focus, 58 as central focus, 367
theory approaches, 224 basic questions, 368
types of evaluation, 192-194 methods and, 268
using findings, 76 money and, 362, 363-364
using logic and processes, 111 moral discourse, 366-368
See also exhibits. trenches perspective, 367, 368-369
Merit, 2 3 , 65, 66, 68, 79, 84, 103, 110, 122, user-focus and, 363-369
136, 188, 237, 250, 307, 330, 332, 365, See also Ethics; Honesty; Integrity;
382 Principles; Propriety standards;
Meta-evaluation, 142, 193, 333, 351-352 Muddling through, 82
standard for, 333 Multiple methods, 141, 266-267, 275, 277,
Metaphors, 33, 34-38, 69, 137, 145, 157-158, 278-279, 290-299, 380
177-178, 189, 249, 357-358 claims evidence, 323
Methodological eclecticism, 272, 2 9 1 , 293, 294 eclecticism, 272, 2 9 1 , 293
Methodological quality, 46, 242, 243, 248-250, example, 376, 377
259, 2 6 1 , 292, 351-352
sampling approaches, 289, 380
strength, 294 Multicultural approach, 93
values affecting, 274-275 Multivocal approach, 93
threats to, 263
Methods decisions, 241-299, 355, 379
alternatives, 299
National program variations, 204-205
competing dimensions, 273
Naturalistic inquiry, 252, 273, 277-280, 380
dynamics of, 261-263
Needs assessment, 187, 1 8 8 , 1 9 3
eclecticism, 272, 291-294
Needs-based evaluation, 181-184, 193
flow chart of, 378-379
See also goal-free evaluation.
paradigms debate, 265-299
Negative findings, 112, 254, 335-336, 338, 353,
task force, 355
359, 360, 366
values affecting, 274-275
arrogance and, 366
Million Man March, 244-245
courage for, 366
Mission level focus, 9 1 , 104, 111, 169-170, 171,
feedback balance, 366
193, 237
Misevaluation, 360 most evaluations, 335
Misuse, 9, 67, 112, 153, 343, 359-361, 384 See also Positive findings
exhibit, 361 Negotiating/negotiation, 2 1 , 25, 66, 82, 92, 103,
Models of evaluation, 22, 56-57, 111, 187, 122, 124, 125, 126, 136, 166, 187, 225,
206-207 237,243,250,257,316
methods-based, 268-273,297, 299 ethics of, 362
Kipling's questions, 298 final report, 324-325
Models of programs, 70, 84, 95, 98, 104, 105, flow chart, 378-379
111, 1 8 7 , 2 0 2 - 2 0 3 , 2 0 7 - 2 1 1 negotiator role, 122
assets model, 74, 101 politics and, 345-347, 356-357
chain of events, 233-234, 235 Neutrality, 20, 282, 283, 333, 343, 344
claims about, 321-324 New School education example, 227-231
comparing, 233 Nonuse, 8-10, 372-374
deficit model, 74 legislative example, 5
labeling cautions, 208-212 Norm-referenced approach, 194
program theory, 208, 210, 221-238 Novice evaluators, 37
treatment specification, 207-211 Norms for outcomes, 162
See also Theory of Action; Theory of the Null findings, 208
Program
Money, 11, 13, 200, 209, 376, 381
ethics and, 361-368 Objectives, 153-154, 169-170, 171, 187, 188,
Golden rule, 362 237, 374
Monitoring, 24, 69, 193, 205-206, 2 1 1 , 218 Chain of, 217, 219, 220, 225, 226,
Monitoring and tailoring, 69, 205-206 230-231, 235, 236

Objectivity, 25, 125, 138, 139, 181, 268, 273, Parachute story, 195
280-283, 299, 357 Paradox, 175-175, 367
fostering, 283 Paradigms debate, 265-299
vs. subjectivity, 280-283, 299 compatibility, 295-297
Observable, 88-89 competing dimensions, 273
Occam's razor, 309 history of, 290-291
Open school examples, 209, 227-232, 272, 296 synthesis, 2 9 1 , 299
exhibit, 377 withered, 290-297
interview, 371-376 Participant observation, 283-284
Options, 247, 355, 378 Participatory evaluation, 9 1 , 97-102, 111, 121,
in recommendations, 325 129, 194, 336, 367
Oral reporting, 64, 329, 332 See also Collaborative evaluation
Organizational development, 90, 9 1 , 103-110, People skills, 52
111, 141, 1 8 0 , 2 2 1 , 3 8 2 accepting of others, 129
See also Developmental evaluation communication skills, 129
Organizational goals, 179-180 conflict resolution skills, 131
power and, 348-350 cross-cultural sensitivity, 131
reified, 179-180 consensus-building skills, 128
Outcomes, 22, 2 3 , 64, 65, 89, 105, 154-167, enabling skills, 129
189, 194, 199-200, 204, 210, 237, 284, feedback, 366
384 from the trenches, 368-369
claims, 321-324 group facilitation skills, 128
clarity of, 159, 167 interpersonal skills, 128, 366
client-focused, 157, 194, 204 staying focused, 131
chemical dependency, 207 team-building skills, 131
committing to, 157-158 See also Arts of evaluation; Collaboration;
comparisons, 314 Facilitation; Involving intended users;
development process, 164-167 Sensitivity; Training
end results, 235 Perception(s), 26, 37, 40-41, 64, 7 1 , 181, 206,
examples, 160, 164 222,243,251,264,280,281
framework, 164 paradigms and, 280, 281
from processes, 223, 225, 238 politics and, 347
hierarchy of, 219, 220, 230-231, 235, 236 Thomas' theorem, 222
horse to water, 157-158 Perfect conditions, 118, 287
indicators, 159-162, 237 Performance reporting, 5, 65, 144, 374
individualized, 290 Performance targets, 162, 164, 304-305
issues, 166 Peripheral issues, 173, 174
mania, 162 Personal factor, 37, 44-61, 174, 204-205, 382
measuring, 159-162, 252-253 defined, 44, 382
norms, 162 ethics and, 364-365
performance targets, 162, 304-305 human element, 201
problematic samples, 154-157 reports and, 331-334
program comparisons, 314 Personnel evaluation, 194
specifying, 159 Persuasive use, 141
stages, 166 Phenomenology, 268, 271
stakeholder process, 164-167 Philanthropy, 25, 26, 74, 92, 104
target groups for, 158-159 Philosophy of evaluation, 20, 22, 38, 55, 101,
theory of action, 219, 220, 222-223, 225, 295
226,230-231,235,236 Planned use, 54, 57, 64, 88, 380
using results of, 163 Planning, 200-201, 229-230, 264, 286
utilization-focused framework, 158-167 planning vs. evaluation, 140
versions of, 165 See also Strategic planning
Outputs, 13, 204, 205, 218, 237, 278 Plutonium law of evaluation, 335
Ownership, 22, 3 1 , 90, 98-103, 111, 164, 316, Poem for U&I, 369
333,357,361,383 Policy analysis, 48, 58-60, 72, 133-134, 198

Political nature of evaluation, 38, 44, 101-103, Process as outcome, 88, 105
137, 170, 153, 267, 292, 341-360, 382 Process evaluation, 12, 22, 64, 157, 194, 203,
ethics and, 363-369 206-207,211
examples, 341-342, 344-346, 373-375, 377 linked to outcomes, 223
fears about, 358-359 Process use, 22, 3 1 , 56, 87-113, 229, 324, 367,
history, 343-344 378, 382
is and ought, 347-348 defined, 90
making claims, 323 menu for, 111
misuse and, 359-361 theory of action, 229
rules for, 356-357 Product evaluation, 194, 206, 255
sources of, 347 Profession of evaluation, 9, 10-12, 18, 32-33, 88-
standard for, 343 90, 110, 122-126, 291-295, 364, 383
task force approach, 352-356, 381 Program development, 90, 9 1 , 95, 103-110, 111
trenches perspective, 367, 368-369 See also Developmental evaluation
use and, 344, 356-359 Proposals, 92
when not political, 352 Propriety standards, 2 1 , 249, 362, 380
See also Pork barrel assessment See also Ethics; Honesty; Integrity; Morality;
Political environment, 130, 162, 179, 2 5 1 , 259, Principles
260, 352 Protect human subjects, 362, 363
Political sophistication, 17, 122, 130, 2 9 1 , 326, Proud, 188, 200
344, 352, 362, 380, 382 See also Readiness for evaluation
Politics, 5, 9, 24, 26, 57, 65, 83, 112, 126, 141, Pseudoevaluative investigation, 316
145, 167, 179, 197, 2 0 1 , 280, 315 Psychology of use, 22
conservatives and liberals, 315 Public issues, 4
See also Congress; Legislative use; Power Public relations, 26, 138, 139, 141
Pork barrel assessment, 26 avoiding, 358-359
Positive findings, 335-336, 338, 359 Public welfare, 21
feedback balance, 366 Purposes of evaluation, 64-65, 69, 75, 78,
general positive bias, 336, 362 121-134, 136, 189, 192-194, 299
See also Negative findings determining, 78, 299, 355, 378
Positive thinking, 335-336 importance of knowing, 249
Poverty program examples, 101-102, 208 menu of alternatives, 192-194, 299
Power, 38, 57, 2 0 1 , 242, 282, 347 menu of findings uses, 76
game, 356-357 menu of process uses, 111
of evaluation, 347, 348-352
of groups, 353
speaking truth to, 362 Qualitative data/methods, 22, 123, 124, 154,
Practical methods, 379, 380 159-162, 252, 273-277, 380
Pragmatism, 2 9 1 , 292, 294 argument, 267
paradigms compatibility, 295-297 compared to quantitative, 273-277
Premises, 20, 381-383 credibility, 272
program, 323 emergence, 271-272
Pretest/Posttest, 93-94, 164, 263, 273, 276-280, Knowles' discovery, 271
286, 299 paradigms debate 265-299
Principles, 2 1 , 24, 32, 3 3 , 89, 92, 95-97, 99, standard for, 277
100, 102, 249, 250, 2 9 1 , 343, 360, 362, utility, 272
364, 379, 383 validity, 252, 292
See also Ethics; Honesty; Integrity; Morality; Quality, 137, 139, 157, 205, 292, 360
Standards claim of, 322
Prioritizing, 3 1 , 42, 64-65, 89, 9 1 , 97, methodological, 130, 248-250, 259, 2 6 1 ,
150-153, 168, 170-173, 189-190, 263,269-271,292,351-352
192-194, 212, 213-214, 257-261, 357 paradigms debate, 265-299
flow chart, 378-379 participation, 383
recommendations, 325 strength, 294
See also Menus users want, 366
Prison example, 225-226 Quality assurance, 194

Quality control, 76 guidelines, 324-326


Quality enhancement, 68, 76 involving users, 380
Quantitative data/methods, 22, 123, 252, manipulable variables, 325-326
273-277, 380 options, 332-333
compared to qualitative, 273-277 placement, 331
early dominance of, 268-271 task force, 255-356
paradigms debate, 265-299 useful, 324-329
standard for, 277 whose, 333
validity, 252, 253-255 Reflection, 95, 102, 271
See also Experimental designs; Measurement Reflective practice, 95, 108
Questions, 29-32, 42-43, 83, 185-186, 191, Reification of goals, 179-180
198,213-214,233 Reinventing government, 14
as evaluation focus, 30-31, 185-186, 378 Reliability, 249, 250, 255-257, 263, 292, 297,
causal, 237-238 355, 380
ethical, 368 Relevance, 24, 32, 49, 52, 72, 97, 130, 205,
focus alternatives, 192-194 236, 242, 249, 250, 264, 297, 299, 350,
foretellers', 185-186 375, 383
for intended users, 83, 378 Reporting, 93, 141, 250, 264, 282, 301-338
generating, 29-31 balanced feedback, 366
implementation evaluation, 213-214
distinct from use, 383
important, 250, 258, 302
drafts, 334
influence on decisions, 83
executive summaries, 329, 3 3 1 , 332
Kipling's, 298
impartial, 282
Koanic, 301-302
ineffective, 373
logical, 111
informal, 329
methods, 247, 379
menu, 329, 332-333
moral, 367-368
report unread, 373
readiness for evaluation, 27
standard for, 282
touchy, 353
utilization-focused, 329-338
utilization-focused criteria, 32, 378
See also Negative findings; Positive findings
universal, 339-340
Reputation focus, 194
vs. hypothesis-testing, 279
ethics and, 364
Research, 23-24, 74, 92, 121, 128, 137, 208,
Racism program example, 188 211,220,270-271
Rapid reconnaissance, 39, 96 basis for claims, 323
Rationality, 6-7, 55, 174, 2 0 1 , 348 cartoon, 24
Readiness for evaluation, 26, 27, 29, 188, 200 ideological foundations, 280
Realistic expectations, 80, 82 politics and, 344-346
Reality-testing, 26-29, 38, 102, 103, 179, 206, quality, 248, 2 6 1 , 263, 269-271, 292
222, 237, 358, 384 rating, 261-262
heart of evaluation, 222 strength, 294
paradigms and, 281-283, 295 See also Paradigms debate
paradigms compatibility, 295-297 Respect, 2 1 , 84, 122, 129, 137-138, 353, 357,
use of modeling example, 222-223 362, 364, 366, 369, 383
Real world conditions, 118 See also People skills
Reasonable estimates, 217 Respectability, 266-267
Recidivism, 208-209, 210, 284, 304, 312, 314 Responsibility, 2 1 , 343, 383
measuring, 252-253 balanced feedback, 366
Recommendations, 50, 138, 141, 234, 236, 307, judgment and, 366
324-329, 332-333, 374 morality and, 363-364
case against, 327 Responsive evaluation, 54, 68, 185, 194, 271
controversy about, 326-327 Responsiveness, 117-145, 247, 296, 302
examples, 374 ethics of, 362-369
framework, 307 in reporting, 331-334
futuring, 328-329 Results-focused, 13, 91-92, 143-144, 158

Rigor, 24, 9 1 , 123, 200, 249, 252, 266, 278, See also Active-reactive-adaptive
280, 292 Situational variables, 131, 132-133, 239-240,
making claims, 3 2 1 , 322-323 249, 382
perception of, 261 Skepticism, 13, 29, 314, 384-385
related to use, 2 9 1 , 297, 383 Skills, 52, 128-129, 131, 136, 337
situational, 250, 267 feedback, 366
Roles, xii, 12, 17, 67, 103-113, 117-145, 358 skills change, 234, 235
academic vs. service, 122-126 teaching analysis, 307
controversies about, 110-113, 122-126 See also People skills
data analysis options, 316-317 Social construction of reality, 222, 281-282
developmental evaluation, 105, 229 See also Reality-testing
ethics and, 361-369 Social indicators, 194, 253
futurist, 328-329 See also Indicators
historian, 329 Social justice. See Justice
internal evaluator, 142, 229-230 Soft data, 249-250, 267, 270, 271
leadership, 144 Specificity, 88-89, 9 1 , 103, 170, 180
menu of options, 128-129, 299 eschewed, 105
qualitative, 274 Square wheels test, 225
reporting options, 316-317, 380 Staff development, 90
special situations, 131 See also Developmental evaluation;
stances, 299 Learning organization
task force, 354-355 Stage models of evaluation, 187-189, 205
technical adviser, 242-243 Stakeholders, 41-43, 48-60, 66, 7 5 , 83, 123,
theory clarifier, 229 145, 247, 248, 254, 283, 2 9 1 , 292, 333,
See also Facilitation; Independence; Purposes 338, 344, 382
Russian example, 199-200 cartoon, 42
diversity of, 5 3 , 382
ethics and, 362-369
Sampling, 242, 247, 255, 273, 288-289, 355, evaluators as, 364, 383
380 fair to all, 283
alternative logics, 288-289 mapping, 343, 344
credibility example, 255 power rules, 356-357
Satisfaction, 234, 235, 304 questions for, 83
Scenarios of use, 302-303, 305, 328-329, 378 selectivity, 364-365
futuring, 328-329 starting point, 378
task force, 355, 356 surprising, avoid, 334
Sciences of evaluation, 123 task force of, 5 1 , 352-356, 381
Self-evaluation, 99, 100, 101, 111 teaching analysis, 307, 315-321
See also Empowerment evaluation temptations from focus, 58
Sensitivity, 37, 52, 103, 130, 131, 138, 178, theory of action with, 222
206, 229, 326, 344, 358, 363, 366, 380, turnover of, 380-381
382 utilization-focused deliberations, 317-321
insensitive evaluators, 366 See also Audience; Intended use by intended
See also People skills users; Involving intended users;
Service orientation, 122-124 Responsive evaluation
Sex, defining abnormal, 313 Standardization:
Sexism program example, 188 paradigm dimension, 289-290
Shared understandings, 91-93, 111, 120, 355, Standardized tests, 78, 256-257, 272, 276-277,
356-357, 382 289-290, 373-375, 377
Simplicity, 65, 88 mandated, 287-288
in presenting data, 307-308, 309-310 Standards, 15-17, 2 1 , 32, 3 3 , 54-55, 66, 143,
Simplifying reality, 65, 232, 242, 281 153, 247, 249, 250, 277, 282, 2 9 1 , 333,
Sincerity, 25-26 343, 3 5 1 , 360, 364, 379, 383
Situational responsiveness, 17, 2 1 , 22, 126-137, ethics and, 362, 363
145, 204-205, 2 4 1 , 264, 267, 352, 359, exhibit, 17
380,382 methods, 277, 291

See also Ethics; Morality; Principles Target group (client-focused), 154-167


Standards of desirability, 303-306 Targeting audiences: See Audiences
making judgments, 307 Targets of opportunity, 227, 228, 229
Statistical thinking, 315 Task force approach, 50, 5 1 , 186, 228, 298,
probabilistic thinking, 316 352-356, 376, 377
Statistics, 244-246, 2 5 1 , 252, 267, 268, 273, exhibit example, 51
284,286,307,313 See also Team approach
critics of, 289 Teacher center example, 232-233, 304-305
exhibit, 308 Teaching. See Training.
interpreting, 316 Team approach, 106, 122, 129, 131, 186, 277
validity, 252 See also Developmental evaluation; Task force
values about, 274-275, 375 Technical quality, 137, 242-243, 259, 2 6 1 ,
Stories. See Sufi-type stories 269-271, 292, 293, 351-352
Strategic planning, 36, 103 strength, 294
See also Mission level; Planning threats to, 263, 278
Strategic thinking, 117-145, 268, 382 Technology issues, 205-206, 237
Strength of an evaluation, 294 Teenage pregnancies, 6, 157
Subjectivity, 182, 273, 280-283, 299, 348 Test effects, 93-94, 263
Test scores, 161, 179, 256-257, 276-277, 284,
vs. objectivity, 280-283, 299
Sufi-type stories: 373, 377
Textbook conditions, 118, 241
defined, xi, 113
Theory, 2 1 , 24, 38, 45, 7 1 , 72, 76, 123, 208,
deluded about causality, 238
210-211,217-238
disguised blessings, 337
and practice, 33, 221-222, 232-233, 259, 299
donkey smuggling, 87
Angels quote, 215
dragon, 280-281
instrumentalist, 367
elephant's parts, 285
luxury, 237
going bananas, 145
of politics, 341
hare and tortoise, 117-118
paradigm contrasts, 299
hunting bears, 147-148
utility of, 237-238
paradise lost, 1
See also next four main entries
pigs, wolf blowhard, 239-240
Theory-based approaches:
quants and quals, 265-266
menu, 224
those who know teach, xi paradigm contrasts, 299
tightrope walking, 313 Theory-driven evaluation, 72, 194, 218, 237,
search in light, 19 279
useful temple guidance, 212 Theory of action, 194, 215, 221-238
valuable coins, 381 defined, 221
warm and cold hands, 216 levels of evidence, 233-237
why, 384 making comparisons, 232-233
world's smartest person, 195 utilization-focused, 221-222, 234, 236
yogurt making, 385 Theory of the program, 208, 210, 218
Summative evaluation, 22, 65, 67, 68, 70, 76, chain of events, 233-234, 235
78-79, 82, 84, 93, 97, 105-106, 112, 118, deductive, 219, 220, 2 2 1 , 223, 224
119-120, 123, 132, 143, 170, 188, 194, espoused, 221-223, 226, 227, 237
200, 257, 336, 382 inductive, 219, 2 2 1 , 223, 224
exhibit example, 70-71 menu of approaches, 224
implementation questions, 214 three approaches, 219, 223, 224
menu of findings use, 76 theories in use, 221-223, 237
report, 330, 332, 334 user-focused, 219, 221-222
Surprises, avoid, 334 See also Models of programs
Sustainable evaluation, 93 Theses on evaluation, 50
Synthesis evaluation, 72-73, 76, 84, 129, 299 Thomas' theorem, 222
Synthesis of paradigms, 2 9 1 , 297-298, 299 Time, 29, 190, 191, 236, 326
Systematic inquiry, 2 1 , 2 3 , 25, 250, 384 analyzing data, 3 2 1 , 324
Systems analysis, 56, 273, 285 as a constraint, 242, 243, 293, 355

evaluating over, 229, 232 Use:


generating recommendations, 325, 326 challenges, 3-6, 384
last minute reports, 375 credibility and, 250-251
use requires, 383, 384 definition, 79-82
Time lines, 259, 326, 351 evaluate yourself, 385
Timeliness, 2 6 1 , 350 findings use, 63-86
Time management, 6, 189 focusing on, 189-191,382
Timing, 84, 127, 130, 160, 227, 368 goal of evaluation, 299
Total quality management, 68, 69 menus of, 76, 111
Trade-offs, 242, 243, 249, 250, 257-261, 383 overriding concern, 380
breadth vs. depth, 257-258 process use, 87-113
ethical issues, 362 related to methods, 9
internal vs. external validity, 258-259 utility standard, 54-55
truth and utility, 259-261 validity and, 383
Training, 22, 52, 97, 100-101, 103, 106, 123, See also Utilization; Utilization-focused
266,291,292,315,355,383 evaluation
narrowness of, 267, 268-269 User-oriented, 9, 56, 58-60, 221-222, 223, 224
users, 350-351, 353-355, 358, 383 ethics of, 364-365
Transaction model, 56 Utility standards, 17, 18, 247, 249
Transportation example, 246 Utility tests, 250, 259-261
Treatment environments, 210-211 Utility, threats to, 263-264, 383
Treatment specification, 207-211, 222, 284 Utilization:
Treatments, 202-203, 207-211 crisis, 4-10, 17, 291
Trust, 84, 97, 122, 128, 2 5 1 , 369 factors affecting, 9, 43-50, 126-134, 243,
in evaluation, 251-255, 264 259-261
Trustworthiness, 351 study of, 43-47, 60-61, 63-64, 71-72, 74-75,
Truth, 24, 37, 9 3 , 124, 2 2 1 , 259-261, 299, 357, 80-82, 126-127, 260, 334-335, 336
365 planning for, 57
cartoon, vs. action, 24 threats to, 263-264, 380
Lily Tomlin, 265 See also Use and next main entry
multiple, 280-282 Utilization-Focused Evaluation, 20-22, 2 3 , 194,
objectivity and, 280-283 371-385
paradigms compatibility, 295-297 acceptance of, xii
speaking to power, 362 accountability hierarchy, 236
truth tests, 250, 259, 260 Achilles' Heel of, 380-381
Turnover of primary users, 380-381 analysis example, 317-321
20-80 rule, 190 choices, 297-299
20-50-30 rule, 351 comprehensive example, 119-121
Types of evaluation: definition, 23
menu of alternatives, 192-194 deliberations, 317-321
menu of findings uses, 76 design approach, 241
menu of process uses, 111 development of, xii, 20
See also Models of evaluation; Roles; driving force, 250, 382
Utilization-focused evaluation elements of, 17
engendering commitment, 22-23, 353-354
essence, 60
Unanticipated consequences, 2 3 , 206, 280 ethics of, 363-368
Uncertainty reduction, 7 1 , 82, 180, 225, 259, example, 376-377
260 fear of co-optation, 357-359, 362-366
power and, 347, 348-352 feedback, 366
Understandable data, 253-255, 264, 309, 350, focusing, 351, 355, 382
379, 380 fundamental value, 367
Underuse, 7 1 , 82, 180, 225 goals framework, 158-167
Uniqueness, 289-290 flow chart, 378-379
Unit of analysis, 207, 355 levels, 236
Universal evaluation, 339-340 menu-oriented, 64, 329-330, 357

methods, 297-298, 299 threats to, 111-112, 278, 380


negotiating options, 316, 355, 382 utility and, 383
no guarantees, 384-385 Validity assumptions, 225-228, 230-231
outcomes process, 166 Value-free, 21
overriding concern, 380 Values, 2 1 , 2 3 , 65, 66, 88-89, 99, 102-103,
overview, 20-22, 381-385 122-126, 129, 136, 137, 174, 203, 222,
paradigm of choices, 297-299 233, 248, 2 7 1 , 323
paradigms compatibility, 295-297 about methods, 2 7 1 , 274-275
paradigms debate, 268, 270 being explicit, 358
paradigms synthesis, 297-298, 299 being true to, 367-369
power guidelines, 356-357 comparing, 233
premises, 381-383 ethics and, 362
questions, 32, 194, 198, 236 evaluator's, 364
recommendations, 324-329 making judgments, 307
reporting options, 316, 329-334 morality as focus, 367-368
reporting principles, 330-337 politics and, 347
steps in, 4 1 , 60, 166, 186, 302, 355, questions, 368
378-379 shape evaluation, 282
task force, 352-356 Valuing evaluation, 26, 27
theory of action, 221-222, 225, 234, 236 Variables and Wholes, 284-286
training users, 350-351, 353-355, 383 Verstehen, 271
values basis, 367 Vision, 162, 384-385
See also Intended use by intended users; Voice, 93, 139
Involving intended users;
Personal factor
Welfare example, 197
Wilderness program example, 95, 108-109
Validity, 95, 137, 142, 243, 249, 250, 251-253, Win/win, 356-357
258-259, 263, 297, 355 Wholes and variables, 284-286
evidence for claims, 321-324 Workshop evaluation, 94-95
face, 253-255, 264 Worth, 2 3 , 65, 66, 68, 79, 84, 103, 110, 122,
internal and external, 258-259, 278 136, 188, 237, 307, 330, 332, 365, 382
of instruments, 252
interview, 263
overall evaluation, 251-253, 355, 379 Zen and evaluation, 301-302
About the Author

Michael Quinn Patton lives in St. Paul, Minnesota, where he founded and directs an organizational development consulting business: Utilization-Focused Information and Training. He is also a professor with the Union Institute Graduate School, a national, nontraditional, and nonresidential university offering doctoral degrees in interdisciplinary and applied fields. In addition to Utilization-Focused Evaluation, he has authored four other Sage books: Qualitative Evaluation and Research Methods (1990); How to Use Qualitative Methods in Evaluation (1987); Creative Evaluation (1987); and Practical Evaluation (1982). He also edited Volume 25 of New Directions for Evaluation on Culture and Evaluation.

Dr. Patton has served as President of the American Evaluation Association and received the Alva and Gunnar Myrdal Award from the Evaluation Research Society for "outstanding contributions to evaluation use and practice." He was on the faculty of the University of Minnesota for 18 years, including 5 years as Director of the Minnesota Center for Social Research. He received the University's Morse-Amoco Award for outstanding teaching and was winner of the 1985 Storytelling Competition at the University. His doctorate is in Organizational Development and Sociology from the University of Wisconsin.

In his consulting practice, he brings an evaluation perspective to work in organizational development, strategic planning, policy analysis, futuring, board development, management consulting, and systems analysis. As an interdisciplinary evaluation generalist, his evaluations have included projects in education, health, criminal justice, agriculture, energy conservation, community development, corporate planning, human services, poverty programs, leadership development, wilderness experiences, housing, staff training, mental health, and foundation giving. He has worked on local, county, state, national, and international projects. His heavy schedule of speaking engagements before professional groups helps him stay up-to-date on the issues people are struggling with in attempting to conduct useful evaluations.

Evaluating wilderness education programs in the Southwest introduced him to the wonders of the Grand Canyon, where he backpacks at least once a year. The influence of these experiences can be found in this book by those, he says, "who know the Canyon."

