By Katie Metzler, Head of Methods Innovation at SAGE Publishing

Big data headlines are appearing daily across our newspapers and magazines. A few weeks ago, Facebook, Google and Twitter appeared before a congressional committee in Washington to answer questions about Russia’s attempts to influence last year’s US presidential election by spreading misinformation online. In the UK, an article in the Observer on Brexit describes “a shadowy global operation involving big data and billionaire friends of Trump” that used micro-targeted political advertising to suppress particular segments of voters and influence the outcome of the referendum. Recently, a Fortune headline asked “is big data killing democracy?”, and the November 4th cover of the Economist showed a smoking gun in the shape of the Facebook “f”.

And as if that weren’t scary enough, it seems it isn’t just democracy that is under threat. In her recent book, Weapons of Math Destruction, Cathy O’Neil gives examples of how big data and proprietary predictive algorithms are being used to maximise profits and reduce costs for businesses, with damaging effects for whole swathes of the population, especially for already disadvantaged groups.

Against this backdrop of media stories warning us about big data and its dangers, SAGE Publishing hosted a panel debate entitled “Putting big data to good use” at the British Academy in London, as part of the ESRC Festival of Social Science. The aim of the event was to discuss whether the news about big data and its potential uses is all bad, and how we can effectively harness the power of big data in the social sciences.

I chaired the event and was joined by four panelists:

Dr. Maria Fasli, Professor of Computer Science and Director of the Institute for Analytics and Data Science at the University of Essex
Dr. Slava Mikhaylov, Professor of Public Policy and Data Science at the University of Essex, holding a joint appointment in the Department of Government and the Institute for Analytics and Data Science
Dr. Jonathan Gray, Lecturer in Critical Infrastructure Studies at the Department of Digital Humanities, King's College London and co-founder of the Public Data Lab
Ian Mulvany, Head of Product Innovation at SAGE Publishing

Big Data for Not So Good Use

To kick us off, I set the scene by sharing examples from Cathy’s book of how big data is being used in ways that most of us would find objectionable. One is the predatory targeted advertising carried out by for-profit universities in the US, which has left thousands of vulnerable students with mountains of debt. Another example, which has also made the news, is the use of big data and reoffending risk algorithms in the US penal system, which have been written in a way that “guarantees black defendants will be inaccurately identified as future criminals more often than their white counterparts.”

Common Criticisms

Several common criticisms of the way big data is being collected and used emerge from these news stories and from Cathy’s book.

Firstly, there is the issue of consent. Though we have all ticked a box to agree to Facebook’s Terms and Conditions, did any of us read them all the way through? Or know that Cambridge Analytica would come along and use our Facebook data to build a model that helped the Trump campaign micro-target political ads?

There is also the problem that algorithms are perceived as scientific and objective, even though human biases can be, and often are, baked into their design. Cathy O’Neil and many others have warned about the danger of “black box algorithms”, whose inner workings cannot be inspected, and of models that do not update or self-correct when new information becomes available.

And who is regulating these algorithms to ensure, for example, that they aren’t racist or sexist? In many cases it is nearly impossible for individuals to fight back if they are unfairly scored by a company’s algorithm. Cathy’s book tells of teachers fired because of algorithmic scoring, credit denied and jobs not offered; the consequences can be widespread and destructive, especially for those who are already disadvantaged.

On The Other Hand…

But, and this is a very important but, big data and mathematical models aren’t inherently bad. Big data has the potential to do wonderful things. It is being used to support election monitoring in the Global South, to tackle epidemics and cure diseases, and to target humanitarian aid more effectively at those who need it most. Organisations like DataKind UK bring charities and data scientists together and host hackathon-like events called “DataDives” that use data science to find solutions to charities’ problems; SAGE sponsored one such DataDive last year.

Big data is neither inherently good nor inherently bad; it depends on how it is used, by whom, and in service of what outcomes. For most of us, the issue is whom we trust with our data and what outcomes the use of that data brings about.

At the ESRC Festival of Social Science event, our panel of experts took us through some examples of how big data is being used for social good, and some of the challenges facing academics who are striving to put big data to better use in ways that reduce inequality and improve outcomes for society.

UNESCO Data Science and Analytics

Professor Fasli explained the terms big data, data science and analytics, and spoke about her work as UNESCO Chair in Analytics and Data Science.

“One of the key objectives of the UNESCO Chair in Analytics and Data Science team is to highlight the critical role that data plays in promoting equality, sustainable development and how it can enhance people’s lives. The more people have access to data, the more you increase transparency within a country… Working with our international collaborators, we will support the development of a research base and skills specifically focusing on developing and transitioning countries. In addressing this skills gap, through targeted scholarship and training programmes, we will be improving people’s data literacy, meaning they will be upskilled to have the tools to positively contribute to and participate in public life, increasing their ability to make informed decisions, and to hold organisations and institutions to account.”

UNESCO Chair in Analytics and Data Science (University of Essex)

Natural Language Processing, Climate Change and Public Health

Professor Mikhaylov spoke about his recent work on climate change and public health, which has been published in the Lancet.

“Climate change is already affecting the health and wellbeing of millions globally. There has been a noticeable increase in efforts to tackle this challenge, but more can be done, particularly, around accelerating the policy response for health implications of climate change.”

Professor Mikhaylov, a natural language processing expert, worked on the public and political engagement strand of the research, which focused on the United Nations.

“We collected speeches by heads of state and government or their representatives in the UN General Assembly and by applying natural language processing we assessed the level of engagement with the intersection of climate change and public health in their statements,” he explained. “We found different levels of engagement across countries depending on their exposure to climate change (eg Western Pacific nations) and levels of political contention over the issue (eg North America).

“We also identified that the levels of engagement increased in the run-up to major climate change summits but sharply decreased immediately thereafter, highlighting the short attention span that is still prevalent among high-level policymakers.”
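To make the idea of measuring “engagement” more concrete, here is a minimal Python sketch of one simple way such a score could be computed. It is not the method used in the Lancet study, which is not described here; it assumes a basic dictionary approach in which a sentence counts as engaging with the intersection of climate change and health if it contains at least one term from each of two hypothetical keyword lists, and a speech’s score is the share of its sentences that do so.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists chosen for illustration only; they are NOT
# the dictionaries or models used in the study discussed above.
CLIMATE_TERMS = {"climate", "warming", "emissions", "carbon"}
HEALTH_TERMS = {"health", "disease", "malaria", "mortality"}


def sentences(text):
    """Naively split text into non-empty sentences on ., ! and ?."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def tokens(sentence):
    """Lower-case the sentence and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def engagement_score(speech):
    """Share of sentences mentioning at least one climate AND one health term."""
    sents = sentences(speech)
    if not sents:
        return 0.0
    hits = 0
    for s in sents:
        words = tokens(s)
        if words & CLIMATE_TERMS and words & HEALTH_TERMS:
            hits += 1
    return hits / len(sents)


def mean_score_by_year(speeches):
    """Take (year, text) pairs and return {year: mean engagement score}."""
    by_year = defaultdict(list)
    for year, text in speeches:
        by_year[year].append(engagement_score(text))
    return {year: sum(scores) / len(scores) for year, scores in sorted(by_year.items())}
```

Averaging scores by year (or by month around a summit date) is one way the rise and fall in attention described above could show up in the data. A real analysis would need carefully validated dictionaries or trained models rather than a handful of hand-picked words.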

Professor Mikhaylov’s research suggests that by linking climate change and public health together, and timing important messages around climate change summits, there is an opportunity to increase engagement among policymakers (for at least a short amount of time!).

Fake News and the Public Data Lab

Dr Jonathan Gray told us about the work of the Public Data Lab at the University of Bath, and their recent publication of A Field Guide to Fake News.

A Field Guide to Fake News explores the use of digital methods to trace the production, circulation and reception of fake news online.

The Public Data Lab, founded in partnership with SAGE Publishing, aims to facilitate social science research, teaching and public engagement activities around the future of the data society. It mobilises an interdisciplinary network of researchers, practitioners and organisations in order to develop and disseminate innovative research, teaching, design and participation formats for the creation and use of public data. It aspires to support deliberation and knowledge exchange around the creation and use of digital public data in the service of social research, policymaking, advocacy, journalism and public engagement around current and future global challenges – from climate change to tax base erosion, migration to automation. Dr Jonathan Gray explained:

“Through the field guide we’d like to contribute towards richer public debate and democratic deliberation about what fake news is and how to address it. In particular we’d like to facilitate exploration of not just the content and claims of fake news items, but a better understanding of how they circulate and what they mean to different publics.”

Big data, big barriers

Finally, Ian Mulvany from SAGE talked about some of the barriers academics face when attempting to engage in big data research. In 2016, SAGE conducted a survey of more than 9,000 social scientists to learn more about researchers who are engaged in research using big data and the challenges they face, as well as the barriers to entry for those looking to do this kind of research in the future.

Thirty-two percent of respondents currently engaged in big data research reported that getting access to commercial or proprietary data was a “big problem” for them:

Figure: Challenges facing big data researchers (n = 2273). Source: Who Is Doing Computational Social Science?

There is also a skills gap holding social science back: the level of quantitative and programming skills that big data research requires makes it hard for educators to introduce it into traditional social science degree courses, as teaching faculty have little time or relevant expertise. SAGE has responded by launching SAGE Campus, a new series of online courses that teach data science skills to social scientists.

We’re all optimists here

To conclude the event, we asked each panelist whether they were optimistic about the future of data science and big data for social good. And, unfortunately for those looking for a heated debate, all of the panelists were hopeful about the vast opportunities for academic research afforded by big data. Clearly there is work to be done to maximise benefits and minimise harms, and to ensure that social science researchers are equipped with the skills and tools they need to engage effectively, but we’re all optimists here.

Watch the full panel discussion below:

Full discussion: Putting big data to “good” use
