Big Data and U.S. Public Policy


Roger Stough
School of Public Policy, George Mason University
Dennis McBride
Office of Research and Economic Development, George Mason University
Abstract
This paper examines the growing recognition of the phenomenon called Big Data and the policy
implications it poses. It is argued that a core policy issue is personal and organizational privacy. At the
same time, there is a belief that analysis of Big Data has the potential to provide public sector policy
makers with extensive new information that would inform policy at unprecedented levels of detail.
Despite this potential to improve the policy-making process, the data often contain individually identifiable
information whose use would undermine core American values such as privacy. This makes the use of
these data almost impossible. The paper recognizes that there may be a way to strip individual data from
Big Data sets, thereby making their analysis more useful for policy. This approach is not yet technically
feasible, but research is ongoing.
KEY WORDS: comparative governance, e-governance, governance, national governance, regional
governance, Big Data, policy, personal privacy

Big Data is a concept or term that has emerged in concert with the view that
defines the postmillennium era as the knowledge age. A working definition
adopted for this paper, and one that has informally evolved in practical as well as
scholarly circles, is: Big Data may be viewed as databases that are too large to
be adequately handled by current spreadsheet technologies. As such, Big Data
sets are viewed by some as having limited use for public policy and other
analytical purposes because of their irregular and heterogeneous properties
(Schintler, 2013). As a consequence, they tend to be inherently biased, which leads to
the conclusion that contemporary statistical analysis routines are inadequate to
examine them. Furthermore, they cannot be adequately visualized, and it is often
difficult to understand what the analyst is working with. In short, there is a
signal-to-noise problem: Big Data sets are inherently noisy, and thus their signal quality
is poor.
While the usefulness of Big Data for informing policy processes may have
shortcomings, having access to more information than ever before about those who are
affected by policy options is a prospect of enormous potential. Assuming that the problem of handling the large and heterogeneous resources that Big Data offers becomes manageable in the near future, the
other large problem concerns privacy. Privacy advocates and stakeholders
are already voicing concern. Fourth Amendment lobbying groups include an interesting
fusion of normally opposing forces, with the American Civil Liberties
Union on the left and organizations like the Eagle Forum on the right now joining
in a common cause. Other groups or stakeholders, such as the academic community, are generally concerned as well.
Review of Policy Research, Volume 31, Number 4 (2014) 10.1111/ropr.12083
© 2014 by The Policy Studies Organization. All rights reserved.
Although the Fourth Amendment restricts U.S. federal government intrusion into privacy, it has no direct regulatory effect on actions at the state or local level, nor does it restrict intrusion by private citizens. As a result, the U.S. government is placed in the position of trying to obey the law and stay out of our private affairs
(erring on the side of caution) while at the same time trying to formulate and/or
execute policy that would benefit most from deep knowledge of the populations that such policy would serve (see Note 1). The U.S. government thus faces a huge
dilemma of appropriate and legal action, while many other individuals and organizations in U.S. society freely exploit otherwise private information. We
now illustrate this problem with an example.
The U.S. military is very carefully avoiding the use of social media so as to avoid
misusing the personally identifiable information (PII) of U.S. persons, and not
just U.S. citizens. A U.S. person is defined under Regulation S (promulgated under the Securities
Act of 1933), Section 902(k)(1), as "any natural person resident in the United States, any partnership,
corporation," and so on; see also 5 U.S.C. 552a(a)(2). The U.S. military's Northern Command (NORTHCOM) does not allow itself
or its subordinates to collect or analyze social media data even in the case of a
Katrina-like disaster, despite the fact that its job in that case would be to support civil
authorities (local, state, and federal) in coping with the effects of such a natural
disaster. As a result, efforts are underway to encourage
the Defense Advanced Research Projects Agency (DARPA) to build and demonstrate
technology that would strip PII from social media. It is envisioned that this
new "PII Stripping" technology would then be available to the rest of the U.S. government, so that it could reasonably and innocently use any data conveyed via Web 2.0 while
eliminating or at least minimizing the potential abuse of private information.
DARPA has a strong track record of changing public policy in the IT arena through its
contribution to the invention and development of the Internet. So it is not
unreasonable to recognize, once again, its potential for changing public policy
making by providing technology that would protect U.S. persons against
intrusions into their privacy. Such a new technology could fundamentally transform
government, because private citizens (i.e., all U.S. persons) could reasonably relax
about providing personal information to the government. For example, PII
Stripping could offset one of the huge impediments to implementing the Affordable Care Act: the fear many people have about submitting their
private medical information to the U.S. government portal.
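To make the idea of stripping PII from social media text concrete, the following is a deliberately simple sketch and not the technology envisioned above. It assumes that a few direct identifiers (e-mail addresses, U.S.-style phone numbers, and account handles) can be found by pattern matching; the pattern set, the labels, and the function name strip_pii are all hypothetical choices made for illustration.

```python
import re

# Hypothetical patterns for three kinds of direct identifiers.
# Real PII is far more varied than these patterns can capture.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "HANDLE": re.compile(r"(?<!\w)@\w+"),
}


def strip_pii(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder label."""
    # E-mail addresses are handled first so that the handle pattern
    # never sees the "@" inside an address.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


post = ("Flooded on Elm St, call 555-123-4567 or email "
        "jane.doe@example.com, via @stormwatcher")
print(strip_pii(post))
```

Note that the street name in the sample post passes through untouched. Pattern matching of this kind removes only the identifiers it was written to anticipate, which is one reason a dependable PII stripping capability remains a research problem.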
The NORTHCOM case is only an illustrative example. Other agencies,
including the Department of Homeland Security (2007), the Office of
Management and Budget, and the National Institute of Standards and Technology,
weigh in heavily on the PII problem (McBride, personal communication). The recent
National Security Agency (NSA) leaks relate to exactly these issues of
government intrusion into privacy, and they are greatly amplifying the problems
of Big Data and Fourth Amendment sanctity (Gellman, Blake, & Miller, 2013).
Privacy issues include potential intrusions into our medical as well as our financial
privacy (Consumer Financial Protection Bureau, 2010), inter alia.
The technical, engineering distinctions between data and information are important here,
particularly in the Big Data realm. Data, per se, are sets of numbers or numerals,
often arranged in some logical order, perhaps in a spreadsheet or other storage
system. The important point is that data, by themselves, do not inform, as they are merely
numbers. Information, on the other hand, is technically understood to mean data that,
when properly interrogated, reduce a specific uncertainty. Thus, "one" or "two"
means nothing in particular unless context is applied. With context,
however, "one if by land, two if by sea" changes "one" and "two" from mere data to
information. This is important in cases of Big Data storage of personal and largely
private information. By definition, if the data are merely data, they are not informative, and thus in essence could be viewed as nonintrusive. This is true, however, if and
only if the data are not transformed into information. There are many techniques
that can be used to prevent the conversion of mere data into potentially
harmful information, not the least of which is encryption. Although a full treatment is beyond the scope of this
paper, the distinctions among data, meta-data, and information must be addressed
directly and technically in the context of public policy, lest politics cloud the
issues (McBride, personal communication).
Thus, Big Data also implies Big Meta-Data, which further compounds privacy
issues and problems. For example, if one knows a person's ZIP code, exact date of
birth, and sex, then that person can be identified with a probability of roughly 0.9 or higher.
Knowing any one of these pieces of information is not an
invasion of privacy, but knowing all three together is. With these three information
elements, one can find out, for example, what kind of cancer a person has been
diagnosed with. The practice of doing this is called "doxing," a short term
for "document sourcing."
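The way individually harmless attributes combine into an identifier can be illustrated with a small sketch. The Python below uses entirely made-up records and hypothetical field names (zip, dob, sex); the function unique_fraction is our own illustrative helper, not an established method. It computes the share of records whose values on a chosen set of attributes occur exactly once in the data set.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Share of records whose combination of `keys` values occurs exactly once."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Entirely made-up toy records (hypothetical field names and values).
records = [
    {"zip": "22030", "dob": "1970-01-15", "sex": "F"},
    {"zip": "22030", "dob": "1970-01-15", "sex": "M"},
    {"zip": "22030", "dob": "1982-06-02", "sex": "F"},
    {"zip": "22031", "dob": "1982-06-02", "sex": "F"},
    {"zip": "22031", "dob": "1970-01-15", "sex": "M"},
    {"zip": "22030", "dob": "1970-01-15", "sex": "F"},
]

for keys in (["zip"], ["dob"], ["sex"], ["zip", "dob", "sex"]):
    print(keys, round(unique_fraction(records, keys), 2))
```

In this toy data no record is unique on ZIP code, date of birth, or sex alone, while four of the six records are unique on the three attributes combined. The numbers say nothing about real populations; they only show the mechanism by which a combination of attributes can single out a person when no single attribute does.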
One of the greatest benefits of Big Data is that it can provide more information
about the populations served by public policy. At the same time, paradoxically, the
biggest impediment to using Big Data or information about people is that its use
potentially intrudes into, or supports the perception that it might intrude into and
compromise, our constitutionally guaranteed protection of privacy from government exploitation. Big Data also might mean that government employees can
rummage around in huge databases with impunity. This is not at all trivial. There
are cases where U.S. government officials have abruptly and somewhat ceremoniously
removed themselves physically from rooms where personally identifiable
information was likely to be encountered.
There are several messages that this short paper has aimed to convey. First,
Big Data is a new and fuzzy term that appears to hold different meanings for
different groups and stakeholders. We try to avoid confusion in this paper by
defining it in a rather simple but operational way, as data and information that are too
big to be analyzed and/or managed with spreadsheet technologies. Second, we note
that the size, diversity, heterogeneity, and discontinuous nature that
most often characterize Big Data limit our ability to analyze it. Third, we
have highlighted the importance of technical, definitional distinctions between data
and information, and thus between Big Data, big meta-data, and so on. The terms
should be crisply defined, as they are in engineering terminology, so that policy
decisions can be formulated without the otherwise overwhelming influence of
political hijacking. That said, as our abilities and methodologies evolve to work
around these problems, Big Data will offer huge potential benefits in the
form of much improved policy making and impact. Yet even with the elimination of
the obstacles to analytical processing of Big Data, there will still be another major problem in
reaping the potential benefits: the very real possibility that use of such data
by the government would easily encroach on individual privacy. A possible way to
manage privacy invasion problems could be to strip all Big Data records or files of
personally identifiable information before making them available for analysis.
Research aimed at identifying a technology for stripping personally identifiable
information is underway in several U.S. government agencies and universities.
However, it is unclear how far in the future a viable application lies. Such a solution
is needed to enable the use of Big Data to facilitate improved public policy.
Note
1 One of course might argue that this generalization does not apply to security agencies in the U.S.
federal government such as the NSA, Central Intelligence Agency, and others.

About the Authors


Roger Stough is University Professor and Associate Dean for Research, School of Public
Policy, George Mason University.
Dennis McBride is Associate Vice President for Research, Office of Research and Economic
Development, George Mason University.

References
Consumer Financial Protection Bureau. (2010). Privacy policy for non-U.S. persons. Retrieved from http://www.consumerfinance.gov/privacy-office/privacy-policy-for-non-us-persons/
Department of Homeland Security. (2007). Privacy policy guide. Retrieved from http://www.dhs.gov/xlibrary/assets/privacy/privac_policyguide_2007-1.pdf
Gellman, B., Blake, A., & Miller, G. (2013, June 9). Edward Snowden comes forward as source of NSA leaks. The Washington Post.
Schintler, L. (2013). The potential risk of big data for public policy. Presentation at George Mason University, November 6, 2013. Retrieved from http://policy.gmu.edu/the-potential-and-risk-of-big-data-for-public-policy/