
[A Report on the Future of Statistics]: Comment

Author(s): David Madigan and Werner Stuetzle


Source: Statistical Science, Vol. 19, No. 3 (Aug., 2004), pp. 408-409
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/4144388
Accessed: 15/01/2011 17:09



B. G. LINDSAY, J. KETTENRING AND D. O. SIEGMUND

acknowledge the fact that these ideas are circulating
everywhere and being tackled by all the mathematical
sciences (mathematics, operations research, computer
science), not to mention subject matter areas such as
physics, which have a tradition of stochastic models
as old if not older than ours. Everyone is interested in
learning from data. Let us not try so hard to distinguish
ourselves from other fields, but just do!

Comment
David Madigan and Werner Stuetzle
GRADUATE STATISTICS EDUCATION
The inexorable rise of computing and large-scale
data storage has impacted most academic disciplines,
sometimes in profound ways. Biology, for example,
has become an information science where the tools of
data analysis are as commonplace as the microscope.
In astronomy, the study and analysis of vast stellar
databases takes center stage. In the business arena,
financial markets generate rivers of intensely scrutinized
data, and all major global-scale retailers store and
analyze vast quantities of customer and transaction data.
The trend is universal and unstoppable.
Extraordinary opportunities for statistical ideas and
for statisticians now present themselves. However, to
take advantage of the opportunities, statistics has to
change the way in which it recruits and trains students.
Statistics has primarily focused on squeezing the
maximum amount of information out of limited data. This
paradigm is rapidly diminishing in importance and
statistics education finds itself out of step with reality. The
problems begin at the high school and undergraduate
levels, where the standard course includes a narrow set
of pre-computing-era topics. At the graduate level, the
typical statistics program suffers from the same
problem. Most programs focus primarily on problems of
estimation and testing, where mathematics brilliantly
finesses a paucity of computing power. The demand for
graduates of such programs is real and possibly
growing. However, students emerging from our programs
are ill-prepared to engage in cutting-edge research and
collaboration in the burgeoning information-rich arenas.


Statistics as a discipline exists to develop tools for
analyzing data. As such, statistics is an engineering
discipline and methodology is its core. We should prepare
graduate students for methodological research, and
note that in methodological research computer science
plays a role that is comparable to the role of
mathematics. Hence we should try to attract students who are not
only mathematically adept, but who also have a
background and interest in computing and an inclination
toward collaborative research. Unfortunately, however,
the current situation is that the computing skills of
our incoming and outgoing graduate students are often
woeful, and their experience with meaningful
collaborative research is nonexistent. "Computing skills"
here does not refer to the ability to write a SAS or
C program. Rather it refers to the ability to design and
evaluate algorithms for computationally challenging
statistical methodology.
We examined the statistics Ph.D. programs at 12 major
U.S. universities. Almost all of these programs
included core courses in statistical estimation and
testing, generalized linear models, probability theory and
applied statistics. Most research statisticians would
probably recognize these as the courses they took in
graduate school. However, we contend that a
statistician trained only in this manner lacks the skills needed
to tackle the kinds of challenges that now present
themselves. In particular, most standard programs eschew
exposure to and practical experience with the
following topics:

* Predictive modeling beyond the classical linear
model.
* High-dimension, low sample size statistical analysis.
* Analysis of data that are not in spreadsheet form,
such as text data, relational databases and streaming
data.
* Bayesian data analysis.
* Hierarchical and multilevel modeling.
* Causal analysis.
* Design and analysis of algorithms and data structures.

David Madigan is Professor, Department of Statistics,
Rutgers University, Piscataway, NJ 08854-8019, USA
(e-mail: madigan@stat.rutgers.edu). Werner Stuetzle
is Professor, Department of Statistics and Department
of Computer Science and Engineering, University of
Washington, Seattle, WA 98195-4322, USA (e-mail:
wxs@stat.washington.edu).

The notion of a common set of core courses for all
graduate students is no longer tenable. A strong
program will include courses or course sequences in some
or all of the topics we list above, in addition to the
usual sequences in probability, theoretical statistics and
applied statistics. Students would choose among these
sequences according to their research interests and
talents.
The issues we raise above have nothing to do with
the old distinction between applied statistics and
theoretical statistics. The traditional viewpoint equates
statistical theory with mathematics and thence with
intellectual depth and rigor, but this misrepresents the
notion of theory. We agree with the viewpoint that
David Cox expressed at the 2002 NSF Workshop on
the Future of Statistics that "theory is primarily
conceptual," rather than mathematical. For example, recent
outstanding texts such as The Elements of Statistical
Learning (Hastie, Tibshirani and Friedman, 2001) or
Learning with Kernels (Schölkopf and Smola, 2002)
are not mathematics texts per se, yet they present
primarily theoretical content.
REFERENCES
HASTIE, T., TIBSHIRANI, R. and FRIEDMAN, J. (2001). The Elements
of Statistical Learning. Springer, New York.
SCHÖLKOPF, B. and SMOLA, A. J. (2002). Learning with Kernels:
Support Vector Machines, Regularization, Optimization
and Beyond. MIT Press, Cambridge, MA.

Comment
Marianthi Markatou and Bruce Levin
We would like to congratulate the editors of this
report for coherently and succinctly summarizing the
challenges and opportunities for the field of statistics
as we enter the twenty-first century. The report is the
culmination of discussions that took place during the
workshop held in May 2002 at the National Science
Foundation (NSF). The purpose of this workshop was
to assess the current status of the field of statistics, to
identify the challenges and opportunities that statistics
faces and to develop a strategy for how to position the
field to meet its current and future demands. This report
also clarifies the often misunderstood role of the
statistical sciences and illustrates its position in, and impact
on, the scientific enterprise.
Three main themes are addressed in the report:
(1) a wealth of interesting and difficult research
problems generated by the interaction of statistics with
other subject-matter areas, (2) education and
(3) resource requirements to meet research needs and
educational demands. We will briefly comment on each one
of these aspects.

Marianthi Markatou is Associate Professor,
Department of Biostatistics, Columbia University,
New York, NY 10032, USA (e-mail:
mm168@columbia.edu). Bruce Levin is Professor
and Chair, Department of Biostatistics, Columbia
University, New York, NY 10032, USA (e-mail:
bruce.levin@biostat.columbia.edu).

The report makes it clear that this is an exciting time
for statistical research. Many important and interesting
areas of interdisciplinary work are described in the
report. Under the heading of information technology we
would here like to add biomedical informatics, with its
subareas of clinical informatics and public health
informatics. Clinical informatics is the science of
effectively extracting and using information in patient care,
clinical research and medical education with the
ultimate goal of improving quality of care and reducing
costs. Public health informatics is the application of
information and technology to public health research and
practice (Friede et al., 1995).
What connects biomedical informatics with the field
of statistics is, among other things, a set of challenges
posed by the analysis of data collected both on healthy
people and on patients. The problem of contamination
(robustness) is very important in this context. If 5% of
a large data set amounts to another large data set, the
behavior of these different points needs to be explained
and addressed. In other words, the outliers may be of
medical significance and their behavior needs to be
understood. Modeling, especially the creation of predic-
