Alan Patrick
@freecloud
January 2014
broadsight
(Dis)Contents
Original Sins
Whose Data is it anyway?
Open Data from a hacker's point of view
(Dis)Claimer
Open Data usage is like any new technology applied to our lives: it can be used for good or ill.
History shows us that in the early days of any new online technology's life, over-optimism about benefits is always rife.
History also shows us that the Dark Side is nearly always underestimated.
My aim today is to show that the Dark Side of Open Data is real, serious, and underestimated - and could cause a major backlash.
(Dis)Course
(Dis)Missed
History is a pack of lies about events that never happened told by people who weren't there
(George Santayana)
The Original Sin of the Internet was to assume all the Bad Guys would be on the Outside
the possibility that we may do bad things with computer code was simply not considered. Thus, from the very beginning, the world of computing and the Internet was based on imperfections, flaws and sometimes poorly understood processes (Cybercrime & warfare, Warren & Streeter)
There is a worrying assumption that Open Data will only be used by well-intentioned people to deliver helpful services
the possibility that we may do bad things with Open Data was simply not considered. Thus, from the very beginning, the world of Open Data over the Internet was based on imperfections, flaws and sometimes poorly understood processes (Open Data crime & warfare, Broadsight Review, 2020)
Wider sharing of medical data has large benefits, but it also has large risks, and glossing over them loses trust
How to guarantee losing the good will of all your data suppliers:
Go over the heads of the data suppliers - take people's very private data and try to open it up without asking them first
Argue that the ends justify the means, without showing any understanding of the asymmetric risks the means impose on your data suppliers
Dissemble about the commercial arrangements, and about the constraints to control or penalise malpractice
Finally give in and consult people only when many campaigning groups are mobilising
If you scan Social Media, there is now a high degree of scepticism
The Eternal Triangulation - data finds data, and then it finds you
How a graduate student de-anonymised supposedly anonymised health data released by the Massachusetts GIC in 1997:
Governor Weld resided in Cambridge, Massachusetts, a city of 54,000 residents and seven ZIP codes.
$20 bought the complete voter rolls of Cambridge, Mass. - a database containing, among other things, the name, address, ZIP code, birth date, and sex of every voter.
Only six people in Cambridge shared his birth date, only three of them were men, and of those, only he lived in his ZIP code.
In a theatrical flourish, Dr. Sweeney sent the Governor's health records (which included diagnoses and prescriptions) to his office.
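The linkage described above can be sketched in a few lines of Python. This is only an illustration of the general technique (joining a named dataset to an "anonymised" one on shared quasi-identifiers), not a reconstruction of Dr. Sweeney's actual method. Every record, field name and function name below is invented for the example.

```python
from collections import defaultdict

# Hypothetical toy data: all names, dates and diagnoses are invented.
voter_roll = [
    {"name": "A. Example", "zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "B. Sample", "zip": "02139", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "C. Person", "zip": "02138", "birth_date": "1950-01-01", "sex": "F"},
    {"name": "D. Other", "zip": "02140", "birth_date": "1962-05-17", "sex": "F"},
]

# "Anonymised" health records: names removed, quasi-identifiers left in.
health_records = [
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "condition X"},
    {"zip": "02140", "birth_date": "1962-05-17", "sex": "F", "diagnosis": "condition Y"},
    {"zip": "02141", "birth_date": "1970-03-02", "sex": "M", "diagnosis": "condition Z"},
]

# Assumption: these three fields appear in both datasets.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def quasi_key(record):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(named, anonymised):
    """Map name -> diagnosis wherever exactly one named person matches."""
    names_by_key = defaultdict(list)
    for person in named:
        names_by_key[quasi_key(person)].append(person["name"])

    matches = {}
    for record in anonymised:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # a unique match means re-identification
            matches[candidates[0]] = record["diagnosis"]
    return matches


reidentified = link(voter_roll, health_records)
```

In this toy data two of the three "anonymous" health records match exactly one voter, so removing the name column protected nobody whose combination of ZIP code, birth date and sex was unique.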
In 2000, Dr Sweeney showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex.
Little has changed... if anything, it's worse now... this anonymization process is an illusion. Precisely because there are now so many different public datasets to cross-reference, any set of records with a nontrivial amount of information on someone's actions has a good chance of matching identifiable public records. (Pete Warden, O'Reilly Strata, 2011, quoting Arvind Narayanan, professor of computer science at Princeton.)
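A publisher can estimate this kind of exposure before releasing a dataset. The sketch below is a minimal illustration with invented data and hypothetical function names: it measures what fraction of records are unique on a chosen set of fields, and the size of the smallest group sharing the same values (the "k" in k-anonymity).

```python
from collections import Counter

# Hypothetical toy population; every value is invented.
people = [
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "F"},
    {"zip": "02139", "birth_date": "1962-05-17", "sex": "F"},
    {"zip": "02140", "birth_date": "1971-11-30", "sex": "M"},
]


def group_sizes(records, fields):
    """Count how many records share each combination of field values."""
    return Counter(tuple(r[f] for f in fields) for r in records)


def unique_fraction(records, fields):
    """Fraction of records whose field combination appears exactly once."""
    sizes = group_sizes(records, fields)
    unique = sum(1 for r in records if sizes[tuple(r[f] for f in fields)] == 1)
    return unique / len(records)


def smallest_group(records, fields):
    """Size of the smallest group: the k for which the data is k-anonymous."""
    return min(group_sizes(records, fields).values())


all_three = unique_fraction(people, ("zip", "birth_date", "sex"))
zip_only = unique_fraction(people, ("zip",))
sex_only = unique_fraction(people, ("sex",))
k = smallest_group(people, ("zip", "birth_date", "sex"))
```

Here three of the five people are unique on all three fields combined, two on ZIP code alone, and none on sex alone. Each extra field released shrinks the crowd a person can hide in, which is the triangulation effect in miniature.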
People with bad intentions are going to send you incredibly attractive offers (Jeff Jonas, Chief Scientist, IBM Entity Analytics)
They are going to triangulate you from various data sources and send you very believable scripts based on that very personal data:
Hobbies
Location
Lifestyle
Worries
Friends and acquaintances
People you trust
"Hi Mr Patrick. This is the Doctor's Surgery. Re your examination last week for Man Flu, we thought you might like to read this: www.innocentwebname.org"
GAME OVER!
...all you need is people who collect lots of data, and share it as a business model
Google Glasses, 2020 Vision
Charged with fraud last year, found not guilty, but the community is dubious...
Medical records show her husband is infertile. No fertility treatment program recorded
iPhone at hotel Tue. last week
iPhone at same hotel last week, same time
Her shopping data shows she has 80% chance of being pregnant
Social Media stream says 89% probability he's gay and BNP
Far fetched?
The infidelity App map: How iPhone can secretly keep track on love cheats (Daily Mail, 2011. Researchers found that they could get stored location data out of iPhones if they knew the phone numbers)
How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did (Forbes, 2012. Target stores' algorithms identify a pregnant girl)
Gay? Conservative? High IQ? Your Facebook 'likes' can reveal traits (NBC, 2013. University of Cambridge's Psychometrics Centre algorithms)
BNP member? Beware the Man in the Middle with a Mission! (Guardian, 2009. BNP membership list appears on Wikileaks)
The other 2 cases are hypothetical, but could come from Government data already in the frame for being opened up
Even Good Guys cause problems! The road to hell is always paved with good intentions - and bad business cases
We are making a strong case for the release of an Open National Address Dataset. We know that many of you within the data, business and public sector communities support this call, as do many individual citizens (data.gov.uk)
As one commentator on the blog post pointed out: "There is no analysis of the disbenefits to the householders of have every fly by night marketing companies having their addresses or it use by fraudsters or identity theft."
...dodgy grammar, but spot-on analysis. The Original Sin writ large.
the infographic will be the new stump speech, questioning the data will be the new rebuttal (Alistair Croll, O'Reilly Data blog)
the more informed that strong political partisans were (about global warming), the less they agreed with each other (Nate Silver, The Signal and the Noise, quoting a paper from Nature)
3 main politically driven forces:
Available data is seldom the whole story, but pressure groups will treat it like the lamp-post you search under for lost car keys, looking only where the light happens to fall
Some of the data will fuel politically contested issues
The data itself will become political (Who collected it? How accurate is it? Whose agenda does it serve?) and debased
breeds inaccurate data
Direct Line Insurance in the same year found that 11% of respondents claim to have seen but not reported an incident because they feared it would make it more difficult to rent or sell their house.
and politically unacceptable data
A service called "Ghetto Tracker" appeared online at the beginning of this week (USA, Sep 2013) and quickly drew criticism for its racist and classist overtones... but the service, renamed, remains
and ultimately:
Some communities in the US are starting to resist using crime mapping owing to the above dynamics.
Copyright Broadsight Ltd
Next Steps
Get the data out, we will deal with the problems later
1. The combination of enthusiasts who see no problems, and commercial interests who intend to make money from the causes of the problems, will ensure data will get out without adequate protections
2. The people who experience the problems will have little redress initially, but resistance will increase via social media channels
3. There will be scandals, lessons will be learned, but little will be done
4. until there is one scandal too many, and too many people will have been damaged, and the pressure to Do Something will be unavoidable.
5. Finally there will be (over) regulation, an OfData will be formed, and it will all settle down to business as usual
1. Accept there is an Original Sin problem - design for Bad Guys in the architecture: in the systems, regulations and economics of Open Data.
2. Take strong steps to prevent hacking - highly secure reference databases, strong anti-hacking capability, screen data for triangulation issues.
3. Know whose data it is - seek permission from data owners for its use, and ensure the taxpayer is not funding private profits, nor on the hook for losses.
4. Toll booths on the roads paved with good intentions - streamlining legal action against those whose data misuse caused the damage would force planning for hacking and misuse into the service fabric from the get-go.
5. Governance of Open Data - oversight by publicly accountable bodies, and regulation of commercial data practices before Pandora's Box is opened. There is a case for an OfData sooner rather than later.
Appropriate Technologies
Tier 3: Data with public interest implications that includes personal information
1. Be Vigilant - the pressure to release private data will be across the board; Tier 3 data is the gold everyone wants.
2. Be Prepared - it will be good, responsible citizens who bear the brunt of the mistakes and misdemeanours, as they are easier to hack and have assets. Good people will need to start to generate bad data.
3. Opt Out - where you have a choice, and demand a choice where you can't.
4. Agitate - take action against plans that look unwise or downright foolhardy, and use Social Media especially to do so.
5. Get Involved - in pushing for a good outcome. Organisations are springing up to lobby for citizens' digital rights in the UK and globally.