You’ve probably heard of R, the statistical programming language and software environment, but are you aware of all its benefits? I’m going to briefly outline the main advantages of R, with a focus on how it can help you clean up and sort messy data. Left alone, that data can disrupt your research project (and give you a major headache!).

R is so hugely popular because:

  • It is free and open source, a major advantage over commercial stats packages such as SPSS
  • There is a very active and engaged community of users all over the world, meaning help and troubleshooting information is never more than a click away
  • It is general enough to allow you to work with lots of different types of data

But different types of data bring different types of challenges. Multiple datasets, messy data in long and wide formats, spreadsheets with missing values, and text data with spelling mistakes are all realities researchers have to deal with. Happily, R can help with those too!

In our course, Practical Data Management with R, instructor Matt Denny takes you step by step through how to use R to clean and sort your data, so that your conclusions rest on data you can trust. As Matt says,

“The new and exciting sources of data on social phenomenon are increasingly coming from electronic sources, in formats such as text, networks, and spatial data. And you need the programming firepower provided by R in order to collect and work with these new data sources.”

Through a combination of screencasts, knowledge checks, assignments, and content tailored to the social sciences, Matt will cover the following:

Unit One - Introduction to R and RStudio

As well as introducing you to the course, this unit will teach you:

  • How to install R and RStudio
  • Basic R programming skills, such as how to write commands in an R script
  • How to understand the core data structures you need to manage a huge variety of data

Unit Two - R Programming Fundamentals

This unit will teach you about:

  • Data I/O and packages so you can extend the functionality of R
  • Looping and conditional statements so you can automate wildly complex tasks
  • Functions so you don’t have to write the same code over and over again for similar tasks

Unit Three - Data Management in R

This unit will teach you how to:

  • Manage multiple datasets, working through examples
  • Convert long and wide format data
  • Deal with poorly formatted data and/or missing data
  • Automate tasks using functions
  • Work with and manipulate text data

Unit Four - Automated Data Collection

This unit will give you:

  • An overview of web/text scraping and the related legal considerations
  • A basic web scraping example so you can learn how to treat a webpage as a messy text document
  • An understanding of how to scrape data from Twitter

Unit Five - Performance and Scalability

This unit will focus on:

  • Giving you an overview of big data and high-performance computing (HPC)
  • Teaching you about performant programming
  • Wrapping up with a range of next steps and ways to extend your skills

Find out when the next cohort starts on the Practical Data Management with R course page!