By Jonathan Slapin, Professor in the Department of Government at the University of Essex, Director of the Essex Summer School in Social Science Data Analysis, and course instructor on Fundamentals of Quantitative Text Analysis for Social Scientists.



The digital age has made huge amounts of data available for analysis in the form of newspapers, blogs, social media feeds, government documents, and more. In this post we consider some of the challenges of working with such vast amounts of data and the role that quantitative text analysis (QTA) plays.

Text analysis has a long history in the social sciences and has been commonly used to analyze media coverage. With the rapid increase in the volume and availability of text data, it has become even more relevant. Historically, it relied on human coding of text, which is slow and can vary from one coder to the next. As the technology to automate the analysis and coding of texts has become more available, we are able to go beyond this and treat text as quantifiable data.


What are the challenges of more data?

Text documents, such as parliamentary archives, contain valuable information about what society and organizations think and do. We need tools to analyze this text because doing so manually is not viable, for two key reasons:

  1. Volume: There is simply far too much data to read and categorize

  2. Subjectivity: Different people may interpret and code the text differently based on their own biases



Where does QTA come in? 

Quantifying the text allows us to create more objective measures. Of course, these measures are subject to the assumptions made when conducting the analysis. But those assumptions can be communicated clearly, so that someone else can take the same data and methodology, apply the same assumptions, and evaluate the analysis for themselves.

A common objection to QTA is that numbers cannot possibly capture the sophistication and nuance contained in language; however, this is not the goal of QTA. With quantitative text analysis: 

  • We can summarize language using data analytics and statistical procedures. This makes our work reproducible by any researcher (as long as assumptions and methods are provided)

  • We can’t make any subjective analysis or discuss subjective meaning beyond what is written in the text and what a computer can read as data.

If you are interested in learning new ways to analyze your texts, you could consider the open source statistical software R. R provides several libraries and functions to efficiently extract useful information from data. 

If you think R could meet your analysis needs but you aren't a confident programmer, take a look at Fundamentals of Quantitative Text Analysis for Social Scientists, which teaches you how to use R for text analysis. You'll learn the theoretical basis for QTA, survey methods for systematically extracting quantitative information from text, how to identify texts and units of texts for analysis, and how to convert texts into matrices and analyze them.
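To give a flavour of what converting texts into a matrix means, here is a minimal sketch. It is written in Python using only the standard library rather than in R, and it is not taken from the course materials. The two example sentences, the document names, and the helper functions `tokenize` and `document_term_matrix` are all made up for illustration. The sketch assumes the simplest possible choices: lowercase everything, keep only alphabetic words, and count how often each word appears in each document.

```python
import re
from collections import Counter

# Two made-up sentences standing in for real documents (hypothetical data).
documents = {
    "speech_a": "We must cut taxes and cut spending.",
    "speech_b": "We must protect public spending on health.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return the sorted vocabulary and one row of word counts per document."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    vocabulary = sorted(set().union(*counts.values()))
    rows = {name: [c[word] for word in vocabulary] for name, c in counts.items()}
    return vocabulary, rows


vocabulary, rows = document_term_matrix(documents)
print(vocabulary)
for name, row in rows.items():
    print(name, row)
```

Each row is a document and each column is a word, so "cut" gets a count of 2 for the first sentence and 0 for the second. Every choice made here (lowercasing, dropping punctuation, ignoring word order) is one of the assumptions that should be reported alongside the analysis, so that another researcher can reproduce the same matrix from the same texts.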

If you'd like to read another of Jon's blog posts about QTA, take a look at The 3 Basic Steps of Quantitative Text Analysis.