Open Data – the Dark Side

Alan Patrick

@freecloud
January 2014

broadsight

(Dis)Contents

• Original Sins
• Whose Data is it anyway?
• Open Data from a hacker’s point of view
• Spear Phishing, and other things Bad Guys will do
• The Politics of Data
• Some Solutions

Copyright Broadsight Ltd

(Dis)Claimer

Open Data usage is like any new technology applied to our lives – it can be used for good or ill. History shows us that in the early days of any new online technology’s life, over-optimism about benefits is always rife. History also shows us that the Dark Side is nearly always underestimated. My aim today is to show that the Dark Side of Open Data is real, serious, and underestimated - and could cause a major backlash.


(Dis)Course

“Those who cannot remember the past are condemned to repeat it”
(George Santayana)


(Dis)Missed

“History is a pack of lies about events that never happened told by people who weren't there”
(George Santayana)


The Original Sin of the Internet

The Original Sin of the Internet was to assume all the Bad Guys would be on the Outside
“…the possibility that we may do bad things with computer code was simply not considered. Thus, from the very beginning, the world of computing and the Internet was based on imperfections, flaws and sometimes poorly understood processes” (Cybercrime & warfare, Warren & Streeter)


The Original Sin of Open Data?

There is a worrying assumption that Open Data will only be used by well intentioned people to deliver helpful services
“…the possibility that we may do bad things with Open Data was simply not considered. Thus, from the very beginning, the world of Open Data over the Internet was based on imperfections, flaws and sometimes poorly understood processes” (Open Data crime & warfare, Broadsight Review, 2020)


“imperfections, flaws and sometimes poorly understood processes”
A realistic look at Open Data
• Provenance – Much “Open Data” is taken from sources far removed in purpose, context and time from its eventual re-use. Few [data] were created with open public usage in mind.
• Practices – In order to use data accurately, one needs to understand the practices that created that data
• Propriety – use of the data can destroy public trust as it is removed from the shared social experience it originated in
• Processes – Substantial problems for use cannot be avoided….(TBD)
(Center for Technology in Government, SUNY, Albany, 2012)

Whose data is it, anyway?


Wider sharing of medical data has large benefits – but it also has large risks – and glossing over that loses trust

How to guarantee losing the good will of all your data suppliers:
• Go over the heads of the data suppliers - take people’s very private data and try and open it up without asking them first
• Argue that the ends justify the means without showing any understanding of the asymmetric risks your data suppliers are facing with “the means”
• Dissemble about the commercial arrangements, and constraints to control or penalise malpractice
• Finally give in and consult people only when many campaigning groups are mobilising


If you scan the Social Media, there is now a high degree of scepticism

What do most people think is going to happen?
• The benefits will be private, the losses public.
• Records will not be responsibly used or carefully looked after by private companies – will they really put privacy before profit?
• People will not be compensated for collateral damage from any data leakage, abuse or errors


A large majority of people say they are going to opt out….
Many still believe that even opted out data will be sold, stolen or “accidentally” lost on a train, or dumped onto the internet


A Hacker’s point of view

History tells us any potential goldmine will be mined….
• Triangulation of Open Data sources
• Buy other data for triangulation
• Open Data has arrived together with Big Computing
• Which side are all the sharpest knives on?
• It’s a read/write game.


The Eternal Triangulation - data finds data, and then it finds you

How Latanya Sweeney, then a graduate student, de-anonymised “anonymised” health data released by the Massachusetts GIC in 1997:
• Governor Weld resided in Cambridge, Massachusetts, a city of 54,000 residents and seven ZIP codes.
• $20 bought the complete voter rolls of Cambridge, Mass. - a database containing, among other things, the name, address, ZIP code, birth date, and sex of every voter.
• Only six people in Cambridge shared his birth date, only three of them were men, and of those three, only he lived in his ZIP code.

In a theatrical flourish, Dr. Sweeney sent the Governor’s health records (which included diagnoses and prescriptions) to his office.
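
The linkage described above can be sketched in a few lines of Python. This is an illustration only, not the method actually used in 1997: every record below is invented, and the field names (zip, birth_date, sex, diagnosis) and the helper names key_of and link are assumptions made for the example. The point is how little is needed: two tables that share three innocuous-looking columns.

```python
from collections import defaultdict

# Synthetic, made-up records for illustration; none of this is real data.
health_records = [  # "anonymised": names removed, quasi-identifiers kept
    {"zip": "02138", "birth_date": "1950-01-15", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "birth_date": "1962-07-04", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02138", "birth_date": "1962-07-04", "sex": "F", "diagnosis": "condition C"},
]
voter_roll = [  # public: names present, same three columns
    {"name": "Voter One", "zip": "02138", "birth_date": "1950-01-15", "sex": "M"},
    {"name": "Voter Two", "zip": "02139", "birth_date": "1962-07-04", "sex": "F"},
    {"name": "Voter Three", "zip": "02139", "birth_date": "1962-07-04", "sex": "F"},
    {"name": "Voter Four", "zip": "02140", "birth_date": "1971-03-09", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def key_of(record):
    """Build the join key from the columns both datasets share."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymised, identified):
    """Return (name, diagnosis) pairs where exactly one named person matches."""
    names_by_key = defaultdict(list)
    for person in identified:
        names_by_key[key_of(person)].append(person["name"])

    matches = []
    for record in anonymised:
        names = names_by_key.get(key_of(record), [])
        if len(names) == 1:  # an unambiguous match is a re-identification
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link(health_records, voter_roll)
print(reidentified)
```

In this toy data only the first health record is re-identified: the second matches two voters, so it stays ambiguous, and the third matches nobody on the roll.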


The Eternal Triangulation – is eternal

In 2000, Dr Sweeney showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex.
Little has changed….if anything, it’s worse now
….this anonymization process is an illusion. Precisely because there are now so many different public datasets to cross-reference, any set of records with a nontrivial amount of information on someone’s actions has a good chance of matching identifiable public records.
(Pete Warden, O’Reilly Strata, 2011, quoting Arvind Narayanan, professor of computer science at Princeton.)
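
A figure like the 87 percent above is, in essence, the share of records that are the only one with their combination of values. A minimal sketch of that measurement follows, assuming a list of dictionaries with hypothetical field names; the sample rows are invented and the function name unique_fraction is an assumption for the example, not taken from any published study.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Share of records whose combination of `fields` occurs exactly once."""
    if not records:
        return 0.0
    keys = [tuple(record[field] for field in fields) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Invented sample rows, for illustration only.
sample = [
    {"zip": "02138", "birth_date": "1950-01-15", "sex": "M"},
    {"zip": "02138", "birth_date": "1950-01-15", "sex": "M"},
    {"zip": "02139", "birth_date": "1962-07-04", "sex": "F"},
    {"zip": "02140", "birth_date": "1971-03-09", "sex": "M"},
]

share_all = unique_fraction(sample, ("zip", "birth_date", "sex"))
share_sex = unique_fraction(sample, ("sex",))
print(share_all, share_sex)
```

Each added column can only keep the groups the same size or split them further, which is why combining datasets tends to push this share up.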


Spear Phishing and other things Bad Guys do

“People with bad intentions are going to send you incredibly attractive offers” (Jeff Jonas, Chief Scientist, IBM Entity Analytics)


They are going to triangulate you from various data sources and send you very believable scripts based on that very personal data:
• Hobbies
• Location
• Lifestyle
• Worries
• Friends and acquaintances
• People you trust

Hi Mr Patrick. This is the Doctor’s Surgery. Re your examination last week for Man Flu, we thought you might like to read this: www.innocentwebname.org


GAME OVER!


It’s not just Bad Guys….

Cocktail Party, 2020, 20/20 vision


…..all you need is people who collect lots of data, and share it as a business model
Google Glasses, 2020 Vision
• Charged for fraud last year, “not guilty” but the community is dubious...
• Medical records show her husband is infertile. No fertility treatment program recorded
• iPhone at hotel Tue. last week
• iPhone at same hotel last week, same time
• Her shopping data shows she has 80% chance of being pregnant
• Social Media stream says 89% probability he’s gay …and BNP


Far fetched?

• “The infidelity App map: How iPhone can secretly keep track on love cheats” (Daily Mail, 2011 – Researchers found that they could get stored location data out of iPhones if they knew the phone numbers)
• “How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did” (Forbes, 2012 – Target’s algorithms identified a pregnant girl)
• “Gay? Conservative? High IQ? Your Facebook 'likes' can reveal traits” (NBC, 2013 – University of Cambridge's Psychometrics Centre algorithms)
• BNP member? Beware the Man in the Middle with a Mission! (Guardian, 2009 - BNP membership list appears on Wikileaks)

The other 2 cases are hypothetical, but could come from Government data already in the frame for being opened up


Even Good Guys cause problems! The road to hell is always paved with good intentions - and bad business cases
“We are making a strong case for the release of an Open National Address Dataset. We know that many of you within the data, business and public sector communities support this call, as do many individual citizens” (data.gov.uk)

As one commentator on the blog post pointed out: “There is no analysis of the disbenefits to the householders of have every fly by night marketing companies having their addresses or it use by fraudsters or identity theft”. ….dodgy grammar, but spot on analysis. The Original Sin writ large.


The politics of Data

“…the infographic will be the new stump speech, questioning the data will be the new rebuttal” (Alistair Croll, O’Reilly Data blog)


The politics of Open Data

“…the more informed that strong political partisans were (about global warming), the less they agreed with each other” (Nate Silver, “The Signal and the Noise”, quoting a paper from Nature)

3 main politically driven forces:
• Available data is seldom the whole story, but pressure groups will treat it like the lamp-post you search under for lost car keys
• Some of the data will fuel politically contested issues
• The data itself will become political (Who collected it? How accurate is it? Whose agenda does it serve?) and debased


The politics of Open Crime Data

Crime Mapping - what you measure, gets undone…
• Inaccurate data… In December 2011, Surrey Street in Portsmouth was reported as having 136 crimes, when in fact it had just two.
• …breeds inaccurate data… Direct Line Insurance in the same year found that 11% of respondents claim to have seen but not reported an incident because they feared it would make it more difficult to rent or sell their house.
• …and politically unacceptable data… A service called "Ghetto Tracker" appeared online at the beginning of this week (USA, Sep 2013) and quickly drew criticism for its racist and classist overtones….but the service, renamed, remains
• …and ultimately: Some communities in the US are starting to resist using crime mapping owing to the above dynamics.

The realpolitik of Open Data (I)

The influential are gaining more influence.
A recent study of who uses the British mySociety TheyWorkForYou.com open government initiative found that: "people above the age of 54 tend to be over-represented, while those younger than 45 are under-represented in comparison to the Internet population. In terms of demographics there is a strong male bias and a strong overrepresentation of people with a university degree that also translates into strong participation from high income groups…."


The realpolitik of Open Data (II)

Open (Government) Data is “what modern deregulation looks like”
“The current ‘transparency agenda’ [of the UK government, supported by prominent Open data advocates] should be recognised as an initiative that also aims to enable the marketisation of public services, and this is something that is not readily apparent to the general observer. Further, whilst democratic ends are claimed in the desire to enable ‘the public’ to hold ‘the state’ to account via these measures, there is an issue in utilising a dichotomy between the state and a notion of ‘the public’ which does not differentiate between citizens and commercial interests…” (Jo Bates, This is what modern deregulation looks like, 2012, Manchester Metropolitan University)


Next Steps


The Downside Case

“Get the data out, we will deal with the problems later”
1. The combination of enthusiasts who see no problems, and commercial interests who intend to make money from the causes of the problems, will ensure data will get out without adequate protections
2. The people who experience “the problems” will have little redress initially, but resistance will increase via social media channels
3. There will be scandals, “lessons will be learned”, but little will be done…
4. …until there is one scandal too many, and too many people will have been damaged, and the pressure to Do Something will be unavoidable.
5. Finally there will be (over) regulation, an OfData will be formed, and it will all settle down to business as usual


Working for an Upside Case

1. Accept there is an Original Sin problem – design for “Bad Guys in the Architecture” in the systems, regulations and economics of Open Data.
2. Take strong steps to prevent hacking – highly secure reference databases, strong anti-hacking capability, screen data for triangulation issues.
3. Know whose data it is – seek permission from data owners for its use, and ensure the taxpayer is not funding private profits, nor on the hook for losses.
4. Toll booths on the roads paved with good intentions - Streamlining legal action against those whose data misuse caused the damage would force planning for hacking and misuse into the service fabric from the get-go
5. Governance of Open Data – Oversight by publicly accountable bodies, and regulation of commercial data practices before Pandora’s Box is opened. There is a case for an OfDat sooner rather than later.
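
One way to picture “screen data for triangulation issues” in step 2 is a pre-release check that no combination of identifying columns singles out fewer than k people, an idea usually called k-anonymity. The deck does not name a technique, so this is a swapped-in illustration: the field names, the coarsening rules (3-digit ZIP prefix, year of birth) and the threshold k=2 are all assumptions, and the rows are invented.

```python
from collections import Counter


def exact(record):
    """Quasi-identifiers as published, with no coarsening."""
    return (record["zip"], record["birth_date"], record["sex"])


def generalise(record):
    """Coarsened quasi-identifiers: 3-digit ZIP prefix and year of birth only."""
    return (record["zip"][:3], record["birth_date"][:4], record["sex"])


def smallest_group(records, key_fn):
    """Size of the smallest set of records sharing the same key."""
    counts = Counter(key_fn(record) for record in records)
    return min(counts.values()) if counts else 0


def passes_screen(records, k, key_fn=generalise):
    """True if every key is shared by at least k records."""
    return smallest_group(records, key_fn) >= k


# Invented rows, for illustration only.
candidate_release = [
    {"zip": "02138", "birth_date": "1950-01-15", "sex": "M"},
    {"zip": "02139", "birth_date": "1950-11-02", "sex": "M"},
    {"zip": "02140", "birth_date": "1962-07-04", "sex": "F"},
    {"zip": "02141", "birth_date": "1962-02-20", "sex": "F"},
]

raw_ok = passes_screen(candidate_release, k=2, key_fn=exact)
coarse_ok = passes_screen(candidate_release, k=2)
print(raw_ok, coarse_ok)
```

Passing such a screen is a minimum bar rather than a guarantee: it says nothing about what other datasets an attacker can buy or find to narrow the groups again.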

Appropriate Technologies

Different rules for different Tiers

Tier 1: Data with no public interest implications

Tier 2: Data with public interest implications

Tier 3: Data with public interest implications that includes personal information

(Final report on Open Data Dialogue - Research Councils UK)
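
The three tiers can be read as a simple classification rule. The sketch below is an assumption-laden illustration, not something taken from the Research Councils UK report: the names Tier and classify are hypothetical, and the slide does not say where personal data without a public interest angle belongs, so the code conservatively places it in Tier 3.

```python
from enum import Enum


class Tier(Enum):
    TIER_1 = 1  # no public interest implications
    TIER_2 = 2  # public interest implications
    TIER_3 = 3  # public interest implications, includes personal information


def classify(public_interest: bool, personal_information: bool) -> Tier:
    """Map two yes/no questions about a dataset to a tier."""
    if personal_information:
        # Assumption: any personal information gets the strictest tier,
        # even where the slide leaves the case undefined.
        return Tier.TIER_3
    if public_interest:
        return Tier.TIER_2
    return Tier.TIER_1


print(classify(public_interest=True, personal_information=False))
```

Whatever rules are attached to each tier, deciding the tier before release rather than after is what makes different rules enforceable.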


What can you do as individuals?

1. Be Vigilant – The pressure to release private data will be across the board, Tier 3 data is the gold everyone wants.
2. Be Prepared – It will be good, responsible citizens who will bear the brunt of the mistakes and misdemeanours as they are easier to hack and have assets. Good people will need to start to generate bad data.
3. Opt Out - where you have a choice, and demand a choice where you can’t
4. Agitate – Take action against plans that look unwise or downright foolhardy, use Social Media especially to do so.
5. Get involved in pushing for a good outcome - organisations are springing up to lobby for the citizen’s digital rights in the UK and Globally.
