
This post is a guest blog by Dr Phillip Brooker, instructor on our Introduction to Python and Intermediate Python Skills online courses. Phillip is a lecturer in Sociology at the University of Liverpool, with interdisciplinary research interests in and around ethnomethodology and conversation analysis, science and technology studies, computer-supported cooperative work, and human-computer interaction.


Digital data

As I'm typing this into a word processor, every keystroke is generating and storing digital data. When I've finished, I'll use those same keys to email it over to my contact at SAGE, creating more, different, digital data. And from there, these data will be transformed into yet more digital data: words on the pages of a website. Various social media platforms - SAGE's, mine, the departmental account at Liverpool, maybe others too - will then signpost it, generating more digital data in more different formats. And so it goes. Totally mundane stuff like this happens all day every day, and in its wake are left traces of human interaction. It's those traces that have become particularly captivating for researchers seeking to understand how social life works with regard to the technology and (online) environments that form at least a few vertebrae of its backbone.

The proliferation of digital data (and the internet generally) has given social science a vastly expanded topical remit, and also a vastly expanded palette of methodological tools with which to investigate it. The widespread adoption of Twitter as almost a default site of interest in social media analytics is a case in point - there is already a wealth of research on any number of topics taking social interaction on Twitter as a primary focus. And there are now also well-established methodological practices for using existing tools (e.g. Chorus, Mozdeh, NodeXL, Pulsar) to support such research, which is deeply attentive to how Twitter makes social life visible.

Programming-as-social-science (PaSS)

However, folding digital data into the practice of doing social science on the basis of pre-existing tools can potentially come at the cost of methodological innovation and reflexivity - how are we to figure out new questions to ask of these data if our practices are shaped by tools which are designed and developed on the basis of requirements derived prior to our research? Thus far, collecting and visualising data has been the chief mode of engagement between social scientists and digital data, yet this activity is rather limited in scope considering the functionally infinite range of applications that programming can generate (e.g. chatbots, artificial intelligence, games, physical devices, etc). Hence, to tap into these possibilities, it's worth considering how we can get under the hood of these tools and processes.


It's on this basis that I've been advocating the uptake of computer programming skills as a core research skill for social science students and practitioners - I tentatively call this Programming-as-Social-Science (PaSS) - through publishing research in the area and developing teaching materials and courses to support learners in building these skills. But what is computer programming, what is PaSS, and what kind of skills are these? And how might we move to incorporate them into the forms of social science training we already carry out?

Programming is, in essence, the means by which a human can instruct a computer to carry out operations using a programming language (e.g. Python) to formulate those instructions.

As such, it may be tempting to think that learning to program is merely a matter of knowing the commands to type and the technicalities of the grammar and syntax to glue commands together. There is some (small) truth in this, but as I suspect anyone who has tried to learn programming from materials organised around generalised and generic purposes might agree, once you get beyond the very basics, it is hard to then put them to use unless you have domain-specific things to apply the basics to. In this sense, while it might be tempting to think that form can be separated from content (and certainly there are plenty of awful "Learn to Program in a Day" books that sell themselves on this!), practically, learning to program as a social scientist involves building in social scientific thinking from the very start. For us at least - the social scientists who might start from a point of seeking to critique and correct applications of content-ignorant programming that cause harm - the why of programming is as important as the how.

Form and content - the ‘Tay’ affair

Take for example Microsoft's Twitter chatbot, "Tay" - released on 23 March 2016, Tay was intended to demonstrate innovations in artificial intelligence, by learning how to converse with tweeters in ways which would, as the bot learned from more and more interactions with humans, be increasingly indistinguishable from human-to-human interactions. In terms of form, the chatbot worked - it did, in fact, respond to tweeters' conversational prompts in grammatically legible ways, and (presumably, though the code has never been made public, for good reason as we shall see) it did, in fact, refine the accuracy with which it did so with every new interaction.

However, in terms of content considerations (in the sense of thinking through how code fits into the social world and the effects it may have there), Tay was severely lacking. Its designers at Microsoft had failed to consider that malicious tweeters would collectively troll the bot by force-feeding it interactions containing extremely hateful content. Tay had no means of filtering out the inputs of large numbers of users instigating alt-right-influenced conversations on the topics of genocide and misogyny, to the extent that Tay eventually "learned" to amplify these extreme views and repeat them itself - amongst the most problematic statements propagated by Tay were "Hitler did nothing wrong!" and "I fucking hate feminists and they should all die and burn in hell". Any (digital) sociologist with a knowledge of what happens on Twitter would have been able to anticipate this; the problem was that sociological thinking was not incorporated into Tay's design. As such, Tay ended up being form without content.
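To make the point concrete, here is a minimal sketch in Python of the kind of input check that was missing. This is not Tay's code (which, as noted, has never been made public); the blocklist, the function names and the sample messages are all hypothetical, and keyword matching alone would be far too crude for a real system. The sketch only shows where a sociologically-informed decision about what counts as acceptable input could sit in a program.

```python
# Hypothetical blocklist for illustration only; deciding what belongs here
# is a social question as much as a technical one.
BLOCKED_TERMS = {"genocide", "hitler"}


def is_acceptable(message, blocked_terms=BLOCKED_TERMS):
    """Return True if none of the message's words appear in the blocklist."""
    # Lowercase each word and strip surrounding punctuation before comparing.
    words = {word.strip(".,!?\"'").lower() for word in message.split()}
    return words.isdisjoint(blocked_terms)


def filter_training_inputs(messages):
    """Keep only the messages a bot would be allowed to learn from."""
    return [message for message in messages if is_acceptable(message)]


# Invented example messages, not real tweets.
incoming = [
    "Hello bot, how are you?",
    "Tell me about genocide",
    "What's the weather like?",
]
kept = filter_training_inputs(incoming)
print(kept)
```

Even in a toy like this, the interesting work is in choosing what goes into the blocklist and anticipating how people will try to get around it, which is exactly where knowledge of what happens on Twitter comes in.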

Teaching PaSS with Python


Hence, I argue, a smart move from here is to situate PaSS amongst the core research practices of social scientists in a general way. PaSS is not so much a distinct method of enquiry that sits alongside others (e.g. surveying, interviewing), but something more foundational - acquiring the skills to read and write computer software, to think with a "programming mindset", might be considered fundamental to sociological encounters with a world increasingly characterised by digital interactions and the production, proliferation and use/misuse of digital data. But the issue remains that it is not easy to insert something so potentially transformative of our practices of doing social science into existing frameworks - for instance, how best to teach PaSS to students and practitioners?

As noted, the basics of how to use a programming language like Python have to be taken on trust to some degree. When we're learning to write we have to concentrate our efforts on controlling the pen with our hands to form the shapes of the letters of the alphabet; only once we've gone far enough with this can we begin to use the form of writing to create meaningful content (i.e. words, sentences, stories).

With Python, we might start by learning how to formulate variables, lists, dictionaries, functions etc in a similar way - as a basic palette of skills that we can progressively see how to shape into meaningful critical applications.
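As a rough illustration of that basic palette, the short sketch below uses a variable, a list, a dictionary and two functions to ask a simple question of some text data. The tweets are invented for the example and the function names are my own; nothing here is tied to a particular tool or dataset.

```python
# A list of invented example tweets (not real data).
tweets = [
    "Loving the new sociology module!",
    "Python for social science is great",
    "Is sociology a science? Discuss.",
]

# A variable holding the term we are interested in.
keyword = "sociology"


def count_mentions(texts, term):
    """Return how many texts contain the term, ignoring case."""
    return sum(1 for text in texts if term.lower() in text.lower())


def word_frequencies(texts):
    """Return a dictionary mapping each lowercased word to its count."""
    counts = {}
    for text in texts:
        for word in text.lower().split():
            word = word.strip(".,!?")  # drop surrounding punctuation
            if word:
                counts[word] = counts.get(word, 0) + 1
    return counts


mentions = count_mentions(tweets, keyword)
frequencies = word_frequencies(tweets)
print(mentions)
print(frequencies[keyword])
```

Counting words is about as simple as analysis gets, but it already raises questions a social scientist would want to ask: which words should count, whose tweets are included, and what a frequency does or does not tell us.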

Cautionary tales such as the Tay affair (and there are countless others too) can be built in alongside this more discursively, to guide our hand further and keep an eye on the grander applications that we might apply our thinking to - both those that we could use knowledge of programming to read and analyse (if we could see Tay's code, could we identify just where the problems live in it?) and those that we might design and build ourselves as sociologically-informed interventions into the world.


If this approach tempts you to pick up PaSS yourself, there is a burgeoning collection of tools and events that can help you do so. SAGE Campus offers online courses in Python for social science, and my book, Programming with Python for Social Scientists, aims to provide a beginner-to-intermediate trajectory along these lines too. PaSS is something that I hope we'll see gathering momentum continually, to the point where programming features as part of the backdrop of general research skills in the teaching and practice of social science departments universally. And I for one am excited to see where today's and tomorrow's social scientists go with it.


Faculty can assign our online courses to their students and researchers to equip them with the knowledge they need. Find out about our Introduction to Python and Intermediate Python Skills courses and sign up to our demo hub to try a free module.

Libraries can get a full 30-day institution-wide trial to SAGE Campus. Recommend us to your library or request a trial if you are an administrator via this form.