Timandra Harkness, Radio 4 regular and author of a recent book on big data, joined Nottingham Skeptics in the Pub at The Canalhouse to talk about Big Data – GAV SQUIRES went along to find out – Does Size Matter?
30,000 years ago, in an ice-age cave, the shin bone of a young wolf was used as a tally stick. This is the oldest piece of “digital” data (digital referring to something that you could count on your digits, if you had enough fingers). We don’t know how it was used or why. The bone has 57 marks in it, which translates to 111001 in binary: six bits, or just under a byte of data. These days most computer hard drives are in the region of a terabyte, equivalent to a million million wolf shin bones.
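The arithmetic here can be checked in a few lines of Python. This is only a sketch: it assumes the decimal definition of a terabyte (10^12 bytes) and, as the comparison above does, treats each bone as roughly one byte. The variable names are illustrative.

```python
# Number of tally marks on the wolf bone, as given in the talk.
marks = 57

# Binary representation of 57 and the number of bits needed to store it.
binary = format(marks, "b")
bits = marks.bit_length()

# Assumption: a terabyte taken as 10**12 bytes (the decimal definition).
terabyte_bytes = 10**12

print(f"{marks} in binary is {binary} ({bits} bits, under the 8 bits in a byte)")
print(f"One terabyte is {terabyte_bytes:,} bytes, a million million")
```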
But what is big data? One definition states that it’s slightly too much for the computer that you have to handle! Just one of the experiments at CERN’s Large Hadron Collider (the Compact Muon Solenoid) generated 40 million megabytes per second. It has now been upgraded and produces a gigabyte of data every second. And that is just one of the experiments. How about if we imagine DATA as an acronym…
D – Different datasets
You can combine different types of data. A brain scan can be just another source of data but it can be combined with things such as medical records, post codes and weather reports from those areas to examine the impact of hours of sunlight on the progression of multiple sclerosis (MS). Combining different types of data and then asking questions that the gatherers didn’t think of is what makes it big data.
A – Automatic
It’s easier to collect it than not collect it. For example, Strava automatically tracks where you go and how fast. You can then use this to make pictures or you can just make use of the results.
T – Time
You can find patterns and predict. You can also take action to change the future.
A – Artificial Intelligence
If you were shown a series of pictures and asked to identify which were cats and which were not cats it would be easy. We have learnt this through trial and error over the course of our lives. Writing down a list of rules to identify “a cat” from “not a cat” would be tricky. Machine learning is similar to how we learnt but then the computer makes its own rules.
Obviously one of the big concerns about big data is privacy. If you put together all of this data, no matter how well you anonymise it, you could still trace it back to an individual. For example, on average 20 people live in an individual post code in the UK so it wouldn’t need much extra data to narrow it down to an individual. Having said that, just because it feels like a database tells you everything doesn’t mean that it does – big data is not necessarily all of the data.
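A toy sketch of how this narrowing works is below. Everything in it is invented for illustration: the records, the field names, the `matches` helper and the “ZZ1” post codes (chosen because they are not real). It simply counts how many “anonymised” records are left as each extra piece of known information is applied.

```python
# Hypothetical "anonymised" records: no names, only a few attributes.
records = [
    {"postcode": "ZZ1 1AA", "birth_year": 1980, "sex": "F"},
    {"postcode": "ZZ1 1AA", "birth_year": 1980, "sex": "M"},
    {"postcode": "ZZ1 1AA", "birth_year": 1975, "sex": "F"},
    {"postcode": "ZZ1 1AA", "birth_year": 1992, "sex": "M"},
    {"postcode": "ZZ1 1AB", "birth_year": 1980, "sex": "F"},
    {"postcode": "ZZ1 1AB", "birth_year": 1964, "sex": "M"},
]

def matches(rows, **known):
    """Return the rows that agree with every known attribute."""
    return [r for r in rows if all(r[k] == v for k, v in known.items())]

# Each extra fact shrinks the pool of possible individuals.
by_postcode = matches(records, postcode="ZZ1 1AA")
by_two = matches(records, postcode="ZZ1 1AA", birth_year=1980)
by_three = matches(records, postcode="ZZ1 1AA", birth_year=1980, sex="F")

print(len(by_postcode), len(by_two), len(by_three))
```

In this made-up example the post code alone leaves four candidates, adding a birth year leaves two, and adding sex leaves one. Real datasets are far larger, but the same narrowing applies.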
There are also concerns over profiling – the surveying of populations and using the results to make judgements about us. In the US they are using the COMPAS questionnaire to help judges work out the risk of re-offending and so what sentences should be. The questionnaire asks people how far they agree with statements such as, “a hungry person has the right to steal”. The results are derived from nothing more than statistical correlation. While it is supposed to be unbiased, African Americans are still more likely to be incorrectly assigned a high-risk score, even though there are no questions about race in the questionnaire. It just decides that “people like you are more likely to re-offend”. Similar algorithms are also being used to decide who to give job interviews to. There is an issue here with abdicating responsibility.
What is a person? What can be measured? If an archaeologist digs up someone’s bones then it is possible to infer things about that person but it doesn’t tell you about that person’s subjective experiences of the world. Similarly, data can tell you about a person but not what it is like to be that person – there are things that data can’t capture.
The old model of consent doesn’t work for big data because it is impossible to say what the data might possibly be used for. There needs to be a model where someone can say that their data can be used in principle for this but not for that. An alternative model could use technology to give people ongoing control over how their data is used. In the end, you have to trust that people who have your data will act in your best interest. For example, data from wearable devices could be helpful when it comes to personal care but it could also be used against you if you haven’t been very active.
There are those that say that if you have nothing to hide then you have nothing to fear, but everybody has something to hide, whether it is from their friends, their family or their work colleagues. In China, they have introduced Sesame Credit, where they automatically collect data to track who is being a good citizen. It also looks at who your friends are. Similarly, in the UK, Wonga ask people applying for loans to sign up to their Facebook app. From there, they can see who your friends are and check their credit rating, as that’s a good indicator as to whether you’ll be able to pay.
Where are the lines between all of the people that have your data? They all buy and sell it between each other, and the police and the security services can get hold of it if they need to. Big data can see patterns in populations but these do not always filter down to an individual. Sometimes, big data can be a poor guide – it should be a tool but people need to use their own judgement. Where can big data do more? There are certainly a lot of applications in the world of science and technology, for example at CERN or in treating infections. There are definitely better uses than just putting smart meters in people’s houses so that they turn their heating down.
Nottingham Skeptics in the Pub returns to The Canalhouse on the 9th of May at 7:30pm where Victoria Stiles will talk on “What are the lessons of history (and will we ever actually learn them)?” For more information, visit the SitP website: http://nottingham.skepticsinthepub.org/
Words & photo by Gav Squires