Last month, Dr Taha Yasseri of the Oxford Internet Institute, author of our Research Design in Social Data Science online course, hosted a webinar on natural experiments in social data science. In this blog post, we share a free recording of the webinar and Taha’s answers to the questions we didn’t have time to cover.

About the webinar

Dr Taha Yasseri introduces the emerging field of social data science and explains the role of natural experiments as a tool in the big data-driven approach to research. He showcases examples of successful natural experiment designs in his own research and beyond.

About Dr Taha Yasseri

Taha is the course instructor on SAGE Campus' online course: Research Design in Social Data Science. He is a Senior Research Fellow in Computational Social Science at the Oxford Internet Institute, University of Oxford, an Alan Turing Fellow at the Alan Turing Institute for Data Science, and a Research Fellow in Humanities and Social Sciences at Wolfson College, University of Oxford.

Q&A

What is the best technical skill to acquire for social data science (i.e. coding skills to collect the data)?
In terms of programming skills, any language you can learn is helpful, but basic Python and R are the most relevant. Network analysis software such as Gephi can be useful too. To become a computational social scientist, though, the most important skill is being able to use other people's code and software (of course with their permission!).
How do you get permission to access the large amounts of data in social data science studies?
It depends on the case. Some of the projects are based on publicly available data. For example, Wikipedia editorial activity and viewership data are publicly available without any barriers. In other cases, we crawled the website ourselves using a computer program, for example to collect the data from the petitioning website. To be able to do that, it is useful to learn about APIs.
In some cases, we had to ask a private organization to share the data with us, and sometimes they agree! Generally speaking, we never had any privilege that other students or researchers do not have. If you ask, you’ll get!
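As an illustration of what working with an API can look like, here is a minimal Python sketch that requests daily view counts for one Wikipedia article from the public Wikimedia pageviews REST endpoint. It is not the code used in the research discussed in the webinar. The function names, the example article and the User-Agent string are placeholders chosen for this example.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Public Wikimedia REST endpoint for per-article pageview counts.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia"):
    """Build the request URL for daily views of one article (dates as YYYYMMDD)."""
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/all-access/user/{title}/daily/{start}/{end}"


def parse_pageviews(payload):
    """Turn the decoded JSON response into a list of (date, views) pairs."""
    return [(item["timestamp"][:8], item["views"]) for item in payload["items"]]


def fetch_pageviews(article, start, end):
    """Download and parse daily view counts (needs network access)."""
    # Placeholder User-Agent: replace with something that identifies your project.
    headers = {"User-Agent": "research-demo/0.1 (you@example.org)"}
    request = Request(pageviews_url(article, start, end), headers=headers)
    with urlopen(request, timeout=30) as response:
        return parse_pageviews(json.load(response))


if __name__ == "__main__":
    for day, views in fetch_pageviews("Natural experiment", "20200101", "20200107"):
        print(day, views)
```

Keeping the URL building and the parsing in separate functions means you can check both without touching the network, which is handy when you are still learning how an API behaves.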
Do you have any more references on the time series design and analytic methods used in time series design?
This is a very good book: Box-Steffensmeier, J. M., Freeman, J. R., Hitt, M. P., & Pevehouse, J. C. (2014). Time series analysis for the social sciences. Cambridge University Press. But if you want to see an interesting research example, see: Aral, S., & Nicolaides, C. (2017). Exercise contagion in a global social network. Nature communications, 8(1), 1-8.
In the petition example you provided, can you please tell us which petitions received more support? Were those the ones that were trending?
A list of the most successful petitions in the UK is available here.
We only found evidence that the petition in the top-left position receives more signatures (it’s the first place most people look). Please read more here:
Hale, S. A., John, P., Margetts, H., & Yasseri, T. (2018). How digital design shapes political participation: A natural experiment with social information. PloS one, 13(4).
In the last study you covered in the webinar, you used time-shifted cross-correlation. If A happens before B (with high correlation between the two), does that necessarily mean the two have a causal relation? What are good ways to rule out confounding variables between the two?
Not necessarily, in the absence of any other evidence. In this case we also had some web analytics data that strongly suggest there is a connection between social media and petitioning website traffic. The cross-correlation analysis allows us to determine the direction of the causality when we are sure there is a causal relationship.
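To make the idea of time-shifted cross-correlation concrete, here is a minimal Python sketch using NumPy. It is not the analysis code from the study. The two series are synthetic, and the names `tweets` and `signatures` are only illustrative: the second series is built as a noisy copy of the first, delayed by three time steps.

```python
import numpy as np


def lagged_correlation(a, b, max_lag):
    """Pearson correlation between a[t] and b[t + lag] for each lag in [-max_lag, max_lag]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # Pair each value of a with the value of b that comes `lag` steps later.
            x, y = a[:len(a) - lag], b[lag:]
        else:
            # Negative lag: pair each value of a with an earlier value of b.
            x, y = a[-lag:], b[:len(b) + lag]
        result[lag] = float(np.corrcoef(x, y)[0, 1])
    return result


# Synthetic data (an assumption for this example, not real traffic data).
rng = np.random.default_rng(0)
tweets = rng.poisson(20, size=200)
signatures = np.roll(tweets, 3) + rng.normal(0, 1, size=200)

correlations = lagged_correlation(tweets, signatures, max_lag=10)
best_lag = max(correlations, key=correlations.get)
print("lag with the strongest correlation:", best_lag)
```

A peak at a positive lag says that changes in the first series tend to precede changes in the second. On its own that is a statement about timing, not proof of causation, which is why the extra evidence mentioned above matters.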
To answer the second question: ideally, a controlled experiment! In the absence of a controlled experiment, there are some statistical techniques. See this article, which also provides some code to control for some types of underlying confounding variables:
Lerman, K. (2018). Computational social scientist beware: Simpson’s paradox in behavioral data. Journal of Computational Social Science, 1(1), 49-58.
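For readers who have not met Simpson’s paradox before, here is a small Python sketch with made-up numbers. It is not the code from the article above. It only shows how a trend in pooled data can reverse once the data are split by a hidden grouping variable; the group labels and values are invented for the illustration.

```python
import numpy as np

# Invented data: within each group, y falls as x rises,
# but group B has both larger x and larger y than group A.
groups = {
    "A": (np.array([1, 2, 3, 4]), np.array([8, 7, 6, 5])),
    "B": (np.array([6, 7, 8, 9]), np.array([13, 12, 11, 10])),
}

# Correlation computed separately inside each group.
within = {name: float(np.corrcoef(x, y)[0, 1]) for name, (x, y) in groups.items()}

# Correlation computed after pooling both groups and ignoring the labels.
all_x = np.concatenate([x for x, _ in groups.values()])
all_y = np.concatenate([y for _, y in groups.values()])
pooled = float(np.corrcoef(all_x, all_y)[0, 1])

print("within-group correlations:", within)
print("pooled correlation:", pooled)
```

The pooled correlation is positive even though the relationship inside each group is negative, so checking results within meaningful subgroups is a cheap first test for this kind of confounding.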

Webinar recording and Q&A: Natural experiments - An empowering tool for social data science

Find out how to get SAGE Campus for your institution here.