Big Data is a concept or term that has emerged in concert with the view that
defines the postmillennium era as the knowledge age. A working definition
adopted for this paper and one that has informally evolved in practical as well as
scholarly circles is: Big Data may be viewed as databases that are too large to
be adequately handled by current spreadsheet technologies. As such, Big Data
sets are viewed by some as having limited use for public policy and other
analytical purposes because of their irregularity and heterogeneity
(Schintler, 2013). As a consequence, they tend to be inherently biased, leading to
the conclusion that contemporary statistical analysis routines are inadequate to
examine them. Furthermore, they cannot be adequately visualized, and it is often
difficult to understand what the analyst is working with. In short, there is a signal-to-noise
problem: Big Data sets are inherently noisy, and thus their signal quality
is poor.
While the usefulness of Big Data for informing policy processes may have
shortcomings, having access to more information about those who are
affected by policy options than was ever imagined in the past is a prospect
of enormous potential. Assuming that the problem of handling the large and heterogeneous resources that Big Data offers becomes manageable in the near future, the
other large problem concerns privacy. Privacy advocates and stakeholders
are already voicing concern. Fourth Amendment lobbying groups include an interesting
fusion of normally opposing forces, with the American Civil Liberties
Union on the left and organizations like the Eagle Forum on the right now joining
in a common cause. Other groups and stakeholders, such as the academic community, are generally concerned as well.
Review of Policy Research, Volume 31, Number 4 (2014) 10.1111/ropr.12083
2014 by The Policy Studies Organization. All rights reserved.
Although the Fourth Amendment restricts U.S. federal government intrusion into privacy, it has no direct regulatory effect on actions at the state or local
level, nor does it restrict intrusion by private citizens. As a result, the U.S. government is placed in the position of trying to obey the law and stay out of our private affairs
(erring on the side of caution) while at the same time trying to formulate and/or
execute policy that would benefit most from deep knowledge of the populations that such policy would serve.¹ The U.S. government thus faces a huge
dilemma over appropriate and legal action, while many other individuals and organizations in U.S. society freely exploit otherwise private information. We
now illustrate this problem with an example.
The U.S. military is very carefully avoiding the use of social media so as to avoid
misusing the personally identifiable information (PII) of U.S. persons, and not
just U.S. citizens, as defined under Regulation S (promulgated under the Securities
Act of 1933) in Section 902(k)(1). Also see 5 U.S.C. 552a(a)(2), which defines a U.S.
person as any natural person resident in the United States, any partnership,
corporation, and so on. The U.S. Northern Command (NORTHCOM) does not allow itself
or its subordinates to collect or analyze social media data even in the case of a
Katrina-like disaster, despite the fact that its job in that case would be to support civil
authorities (local, state, and federal) in coping with the effects of such a natural
disaster. As a result, efforts are underway to develop an approach that encourages
the Defense Advanced Research Projects Agency (DARPA) to build and demonstrate
technology that would strip PII from social media. It is envisioned that this
new PII Stripping technology would then be available to the rest of the U.S. government, so that it could reasonably and innocently use any data conveyed via Web 2.0 while
eliminating or at least minimizing the potential abuse of private information.
DARPA has a strong track record in the IT arena, having changed public policy through
its invention of, and contribution to the development of, the Internet. So it is not
unreasonable to recognize, once again, its potential for changing public policy
making by providing technology that would immunize U.S. persons against
intrusions into their privacy. Such a new technology could fundamentally transform
government, because private citizens (i.e., all U.S. persons) could reasonably be more relaxed
about providing personal information to the government. For example, PII
Stripping could offset one of the huge impediments to implementing the Affordable Care Act: the fear many people have about submitting their
private medical information to the U.S. government portal.
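To make the idea of PII Stripping concrete, the following is a minimal sketch in Python of how a few common identifiers might be removed from a social media post before analysis. It is an illustration under stated assumptions, not a description of any DARPA or government system: the pattern set, the labels, and the function name strip_pii are all hypothetical, and pattern matching of this kind would miss many forms of PII (names and street addresses, for instance).

```python
import re

# Hypothetical patterns for a few U.S.-style identifiers (an assumption for
# illustration). EMAIL is listed before HANDLE so that addresses are replaced
# before the "@name" pattern is tried.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "HANDLE": re.compile(r"(?<!\w)@\w+"),
}


def strip_pii(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


post = "Stranded near the levee, call 555-123-4567 or mail jane.doe@example.com, thanks @rescue_team"
print(strip_pii(post))
```

In this sketch the content of the message (someone is stranded near a levee) survives for the analyst, while the contact details that identify a particular person are replaced by placeholders.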
The NORTHCOM case is only an illustrative example. Other departments and agencies,
including the Department of Homeland Security (2007), the Office of
Management and Budget, and the National Institute of Standards and Technology,
weigh in heavily on the PII problem (McBride, personal communication). The recently
revealed National Security Agency (NSA) leaks relate to exactly these issues of
government intrusion into privacy, and they greatly amplify the problems
with Big Data and Fourth Amendment sanctity (Gellman, Blake, & Miller, 2013).
Privacy issues include, inter alia, potential intrusions into our medical as well as our financial
privacy (Consumer Financial Protection Bureau, 2010). The technical,
engineering distinction between data and information is important here,
particularly in the Big Data realm. Data, per se, are sets of numbers or numerals,
often arranged in some logical order, perhaps in a spreadsheet or other storage
system. The important point is that data, by themselves, do not inform, as they are merely
numbers. Information, on the other hand, is technically understood to mean data that,
manage privacy invasion problems could be to strip all Big Data records or files of
personally identifiable information before making them available for analysis.
Research aimed at identifying a technology for stripping personally identifiable
information is underway in several U.S. government agencies and universities.
However, it is unclear how far off a viable application is. Such a solution
is needed to enable use of Big Data to facilitate improved public policy.
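For structured records, the stripping step described above might look like the following minimal Python sketch. The field names, the person_id key, and the function strip_record are hypothetical, chosen only for illustration. The sketch drops fields that identify a person directly and replaces the record key with a salted hash, so that records belonging to the same person can still be linked for analysis.

```python
import hashlib

# Hypothetical field names; which fields count as direct identifiers is an
# assumption made for this illustration.
DIRECT_IDENTIFIERS = {"name", "email", "ssn", "street_address"}


def strip_record(record: dict, salt: str) -> dict:
    """Drop direct identifiers and replace person_id with a salted hash."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "person_id"
    }
    digest = hashlib.sha256((salt + str(record["person_id"])).encode("utf-8"))
    cleaned["pseudonym"] = digest.hexdigest()[:16]
    return cleaned


record = {
    "person_id": 42,
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "zip3": "201",
    "age_band": "30-39",
}
print(strip_record(record, salt="keep-this-secret"))
```

Removing direct identifiers in this way does not by itself rule out re-identification, since combinations of the remaining fields (here the hypothetical zip3 and age_band) can still narrow a record down to a small group of people.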
Note
1 One of course might argue that this generalization does not apply to security agencies in the U.S.
federal government such as the NSA, Central Intelligence Agency, and others.
References
Consumer Financial Protection Bureau. (2010). Privacy policy for non-U.S. persons. Retrieved from http://www.consumerfinance.gov/privacy-office/privacy-policy-for-non-us-persons/
Department of Homeland Security. (2007). Privacy Policyguide. Retrieved from http://www.dhs.gov/xlibrary/assets/privacy/privac_policyguide_2007-1.pdf
Gellman, B., Blake, A., & Miller, G. (2013, June 9). Edward Snowden comes forward as source of NSA leaks. The Washington Post.
Schintler, L. (2013). The potential risk of big data for public policy, a presentation at George Mason University, November 6, 2013. Retrieved from http://policy.gmu.edu/the-potential-and-risk-of-big-data-for-public-policy/