The changing tide: R you ready?
Statistics, data analysis and data visualisation are crucial aspects of social science degrees - across disciplines from psychology to political science and across levels, from undergraduate to postgraduate. For higher education institutions, the most popular statistical software package for teaching these aspects of the social sciences has traditionally been IBM’s Statistical Package for the Social Sciences (SPSS).
However, in recent years, digitalization and the explosion of big data have led to demand for data science skills and big data software. We’ve seen new statistical software packages arrive on the block, while others have improved or grown in popularity. Consequently, higher education institutions are realising that SPSS isn’t the only fish in the sea.
In 2016, Robert A. Muenchen undertook an analysis of data science software use and found that while SPSS is the leading software, it’s on the decline - and R is on the rise. R is a programming language and free software environment for statistical computing and graphics.
The rise of R is widely documented: it ranked 15th in RedMonk’s Programming Language Ranking in June 2019. While R might not rank as high as JavaScript or Java, it is making more of an impact in academia and the social sciences. A 2010 article from Stack Overflow noted that “R is most visited from universities, where it’s a common choice for academic research, especially in the social sciences and biology”.
So, why exactly are higher education institutions switching to R when teaching social science?
5 reasons institutions are making the switch to R
1. R is free - saving institutions from expensive software licences and students from unnecessary costs
First things first: R is free - and RStudio (a recommended interface for working in R) also has a free, open-source edition. SPSS and other statistical software packages have expensive licences - and institutions with large social science departments or a number of computer labs may need an institution-wide licence. Switching to R saves on these costs.
Not only does R save institutions from annual software licence fees, but it also saves social science students and graduates from buying licences for their personal computers. On IBM’s website, pricing starts at $99 per user per month, though students and faculty can get a ‘GradPack’ or ‘Faculty Pack’ for discounted prices from partner providers. A switch to R saves students these costs - which can be crucial for students trying to budget!
2. R gives students career-ready data science skills
As mentioned earlier in the blog, the explosion of big data has created demand for data science skills. In job descriptions, “programming skills required” is now included where “proficiency in Excel and Word” once stood. And this demand is not only in industry but in academia too, as computational social science research becomes increasingly important. A survey conducted by SAGE Campus found that 74% of social researchers reported they want or need to learn data science skills, with R being the third most sought-after skill.
By switching to teaching social science in R, institutions give students a competitive advantage in the job market by equipping them with in-demand programming skills - making them more employable.
In fact, Ceyhun Ozgur, Michelle Kleckner and Yang Li discuss in their 2015 article how the software taught in colleges and universities could address the shortage of data analysts proficient with big data and help meet the demand.
3. R makes reproducible research the norm
Reproducibility and transparency are of utmost importance in research. This is particularly pertinent in social science research, where various studies have failed to replicate the results of published experiments and there’s heated debate about a ‘reproducibility crisis’.
R makes reproducible and transparent research the norm. Analyses conducted in R are done in code, and the usual workflow is to ‘script’ (save files of) your code. That means everything conducted in R, from importing and exploring data, to running the analysis, to creating the visualizations and preparing reports, is recorded. This transparency about what was done to the data enables research to be rerun and repeated - by the researchers themselves and by others.
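To make the idea of a scripted analysis a little more concrete, here is a minimal sketch of what such a script might look like. The data are simulated and the variable names (hours_studied, exam_score) are invented for illustration; they don’t come from any real study.

```r
# analysis.R: a hypothetical, self-contained analysis script.
# All data below are simulated purely for illustration.

set.seed(2019)  # fix the random seed so the simulated data are the same on every run

# 1. "Import" the data (simulated here; a real script would read a file instead)
n <- 100
survey <- data.frame(
  hours_studied = round(runif(n, min = 0, max = 20), 1)
)
survey$exam_score <- 40 + 2 * survey$hours_studied + rnorm(n, sd = 8)

# 2. Explore the data
print(summary(survey))

# 3. Analyse: a simple linear regression of exam score on hours studied
model <- lm(exam_score ~ hours_studied, data = survey)
print(summary(model))

# 4. Record the software environment alongside the results
print(sessionInfo())
```

Because every step lives in the file, anyone with the script (and, in a real project, the data) can rerun it from top to bottom and check that they get the same results.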
4. R lets you get visual
As data literacy becomes more important, so does data visualization. One key benefit of switching to R is its excellent data visualization capabilities. Students and researchers are able to create data visualizations with a few lines of code.
R currently has over 14,000 packages (collections of R functions, data, and code in a defined format) in its CRAN repository that can be downloaded. That means there are pre-created packages to download for a huge variety of data visualization tasks - meaning you don’t have to reinvent the wheel. For example, the lattice package can be used to visualize multivariate data, and the Leaflet package can be used to build interactive maps.
Rather than tell you, we’ll show you. Check out some R visualizations in this gallery.
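As a small illustration of “a few lines of code”, the sketch below uses the lattice package mentioned above to draw a multi-panel scatterplot. It relies on mtcars, a small example dataset built into R, simply because it needs no download; the labels and title are our own choices.

```r
library(lattice)  # ships with standard R installations as a "recommended" package

# mtcars is a small example dataset that comes with R
p <- xyplot(
  mpg ~ wt | factor(cyl),   # fuel economy vs weight, one panel per cylinder count
  data = mtcars,
  xlab = "Weight (1000 lbs)",
  ylab = "Miles per gallon",
  main = "Fuel economy by weight and number of cylinders"
)

print(p)  # draws the plot on the active graphics device
```

The same formula-style call works on any data frame, so swapping in survey or experimental data is mostly a matter of changing the variable names.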
5. R is super compatible and can be used for all types of data (including big)
Another reason institutions are switching to R is its compatibility. R can be run on several operating systems including Windows, Mac and Linux - meaning students won’t struggle to install it on their personal machines and researchers can easily collaborate on a project.
R can also import data in various file types, be it SPSS, Excel, Stata or more. That means institutions can still use existing data they are sitting on and researchers can combine data from various sources. Lastly, R can be used for big data research. By default, R runs on data that fits in your computer’s memory - which can make people think it can’t handle big data. However, that’s not true. There are tonnes of ways R can be used for big data research. This article from R Views outlines a few.
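The import-and-combine step described above might look something like the sketch below. To keep it runnable with base R alone, it writes a tiny made-up CSV file to a temporary location and reads it back; the column names and values are invented. SPSS, Stata and Excel files are usually read with add-on packages, which are only named in the comments here.

```r
# Write a small, made-up CSV file to a temporary location so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,age,country",
  "1,23,UK",
  "2,35,US",
  "3,41,UK"
), csv_path)

# Import the file with base R
respondents <- read.csv(csv_path, stringsAsFactors = FALSE)

# A second, equally made-up source, e.g. scores collected elsewhere
scores <- data.frame(id = c(1, 2, 3), score = c(67, 72, 58))

# Combine the two sources on their shared id column
combined <- merge(respondents, scores, by = "id")
print(combined)

# For other formats, add-on packages are the usual route, for example:
#   haven::read_sav("file.sav")      # SPSS
#   haven::read_dta("file.dta")      # Stata
#   readxl::read_excel("file.xlsx")  # Excel
# (the file names above are placeholders)
```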
Overcoming challenges: helping faculty and students learn R
Switching to R can seem overwhelming for higher education institutions. A common concern is whether their social science students, who aren’t from a technical background, will be able to learn R.
Don’t panic about programming: R is considered one of the simpler programming languages to learn and one of the best starting points for social scientists with no prior programming experience. For example, a blog from DataHowler covers how one set of psychology undergraduates preferred R to SPSS.
Another challenge can be that while some forward-thinking faculty members already have the skills and desire to incorporate R into their teaching, others may be resistant or need upskilling themselves.
Upskilling faculty is a crucial aspect of keeping up with the latest computational methods. There are plenty of resources out there that Heads of Department and Deans can use to upskill faculty. A key to overcoming resistance is knowing that big data and computational methods are transforming social science - and they’re here to stay.
Want to find out more about how to get SAGE Campus for your institution?
Get more information about institution-wide access for SAGE Campus’ Introduction to R for Social Scientists online course by visiting this page.
Teaching in R? Check out these R textbooks that could support your class
Statistics with R by Robert Stinerock
Quantitative Social Science Data with R by Brian J. Fogarty
Discovering Statistics Using R by Andy Field, Jeremy Miles and Zoë Field
An Introduction to R for Spatial Analysis and Mapping (2nd edition) by Chris Brunsdon and Lex Comber
Join SAGE’s community of institutions switching from SPSS to R
If you’re considering switching from SPSS to R in your department or your own teaching, get in touch.