Practical Data Management with R for Social Scientists 

Next course runs from 9 October - 5 November 2017



Course description

Data management - the art and science of collecting, cleaning, and manipulating data - is one of the most essential research skills for social scientists.

This course will seek to demystify these 'dark arts', and provide you with the foundation to tackle a wide range of practical data management tasks using the R programming language.

By the end of this course, you will possess the programming skills and experience to manage data in a wide variety of formats, in a reproducible manner, at scale.

Learning outcomes

This course will give you:

  • Basic R commands and data structures for manipulating data
  • The ability to read data into R, and write it back out, in multiple formats
  • Proficiency using loops, conditional statements, and functions to automate common data management tasks
  • Familiarity with R’s package system for extending its functionality
  • The skills to clean and manage multiple complex datasets
  • The ability to clean and manipulate textual data
  • An understanding of basic web scraping techniques, for both standard web pages and the Twitter API
  • An overview of the techniques and hardware necessary to manage large datasets efficiently


For a bulk order of 5 or more learners on any of our courses, you can claim a 50% discount. Contact us for more information.

Duration: 12-20 hours, divided into 4 units. The course is self-paced, but we recommend taking the units over a 4-week period.
Prerequisites: Basic computer skills and some experience with statistical analysis software (Excel, SAS, Stata, SPSS) will be helpful.
Instructor: Matthew Denny

Course Instructors


How it works

This course is broken up into four units, starting with basic R programming and working up through more advanced data management, web scraping, and finally big data/HPC techniques and issues. Each unit is broken up into a number of discrete topics, with video lectures and supporting materials associated with each topic. At the end of each unit, participants will be given a homework assignment that synthesizes what they have learned in that unit.
Each unit should take about a week, with the earlier units requiring a bit more time. Each topic builds on the previous ones, so the course is designed to be completed in order, unless you have previous experience with a topic. In the lectures, Matt provides as much context and explanation for each new concept as possible. If you put in the work and try out the code yourself, you should have a strong foundation to start working on serious data management tasks in R by the end of the course.




Unit One

Programming and Data Structures in R

As well as introducing you to the course, this unit will teach you:

  • How to install R and RStudio
  • Basic R programming skills, such as how to write commands in an R script
  • The core data structures you need to manage a huge variety of data
  • Data input and output (I/O), and how to use packages to extend the functionality of R
  • Looping and conditional statements so you can automate wildly complex tasks
  • Functions so you don’t have to write the same code over and over again for similar tasks
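
To give a flavour of the looping, conditional, and function topics listed above, here is a minimal sketch in base R. The `ages` vector and the `age_group` function are hypothetical, invented for illustration; the course materials may use different examples.

```r
# Hypothetical respondent ages, invented for illustration (NA = missing)
ages <- c(23, 41, NA, 67, 35)

# A function wraps the logic once so it can be reused for similar tasks
age_group <- function(age) {
  if (is.na(age)) {
    "missing"
  } else if (age < 40) {
    "under 40"
  } else {
    "40 and over"
  }
}

# A loop applies the function to each element in turn
groups <- character(length(ages))
for (i in seq_along(ages)) {
  groups[i] <- age_group(ages[i])
}

print(groups)
```

Writing the rule once as a function means a change to the cut-off only has to be made in one place.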

Unit Two

Data Management in R

This unit will teach you how to:

  • Manage multiple datasets by example
  • Convert data between long and wide formats
  • Deal with poorly formatted data and/or missing data
  • Automate tasks using functions
  • Work with and manipulate text data
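
As a rough illustration of two of the tasks above, converting wide data to long format and handling missing values, the following sketch uses only base R. The `wide` data frame and its column names are hypothetical, and the course itself may teach other functions or packages for the same job.

```r
# Hypothetical wide-format data: one row per person, one column per year
wide <- data.frame(
  id = 1:3,
  income_2015 = c(30, NA, 52),
  income_2016 = c(32, 45, 55)
)

# Convert to long format: one row per person-year
long <- reshape(
  wide,
  direction = "long",
  varying = c("income_2015", "income_2016"),
  v.names = "income",
  timevar = "year",
  times = c(2015, 2016),
  idvar = "id"
)

# Count missing values, then keep only the complete rows
n_missing <- sum(is.na(long$income))
complete <- long[!is.na(long$income), ]

print(long)
print(n_missing)
```

Long format is often easier to filter and summarise, because every observation sits in its own row.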

Unit Three


Automated Data Collection

This unit will give you:

  • An overview of web/text scraping and the related legal considerations
  • A basic web scraping example so you can learn how to treat a webpage as a messy text document
  • An understanding of scraping Twitter


Unit Four

Performance and Scalability

This unit will focus on:

  • Giving you an overview of big data and high-performance computing (HPC)
  • Teaching you about performant programming
  • Wrapping up with a range of next steps and ways to extend your skills




Frequently Asked Questions

Please see below for answers to some of the most frequent questions we get about this course.

How long will I have access to the course for?

The course will be run over 4 weeks, during which you will have access to learning support provided by the course instructor. After the 4 weeks, you will still have access to the course materials for another 2 months, but you will not be able to receive learning support from the instructor, and if there is a course forum, you will not be able to ask any questions.

Do learners get a certificate?

All of our courses offer a certificate of completion signed by your instructor. You will be able to download this certificate from the Learning Platform when you complete the course.