You’ve probably heard of R, the statistical software package, but are you aware of all its benefits? I’m going to briefly outline the main advantages of R, with a focus on how it can help you clean up and sort all that messy data that threatens to disrupt your research project if not dealt with properly (as well as give you a major headache!).
R is so hugely popular because:
It is free and open-source, a major advantage over other stats packages such as SPSS
There is a very active and engaged community of users all over the world, meaning help and troubleshooting information is never more than a click away
It is general enough to allow you to work with lots of different types of data
But different types of data bring different types of challenges. Multiple datasets, messy data in long and wide formats, spreadsheets with missing values, and text data with spelling mistakes are all realities researchers have to deal with. Happily, R can help with those too!
In our course, Practical Data Management with R, the instructor Matt Denny takes you step by step through how to use R to clean and sort your data, helping you make sure your conclusions are robust. As Matt says,
“The new and exciting sources of data on social phenomenon are increasingly coming from electronic sources, in formats such as text, networks, and spatial data. And you need the programming firepower provided by R in order to collect and work with these new data sources.”
Through a combination of screencasts, knowledge checks, assignments, and content tailored to the social sciences, Matt will cover the following:
Unit One - Introduction to R and RStudio
As well as introducing you to the course, this unit will teach you:
How to install R and RStudio
Basic R programming skills, such as how to write commands in an R script
How to understand the core data structures you need to manage a huge variety of data
Unit Two - R Programming Fundamentals
This unit will teach you about:
Data I/O and packages so you can extend the functionality of R
Looping and conditional statements so you can automate wildly complex tasks
Functions so you don’t have to write the same code over and over again for similar tasks
Unit Three - Data Management in R
This unit will teach you how to:
Manage multiple datasets by example
Convert long and wide format data
Deal with poorly formatted data and/or missing data
Automate tasks using functions
Work with and manipulate text data
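To give a flavour of what this looks like in practice, here is a minimal sketch in base R of two of the tasks above: converting long format data to wide format, and dealing with a missing value. It is not taken from the course materials. The data frame and its column names (id, year, score) are made up for illustration, and dropping incomplete rows is only one of several ways to handle missing data.

```r
# Hypothetical survey data in long format: one row per respondent per year.
# All names and values here are invented for illustration.
long <- data.frame(
  id    = c(1, 1, 2, 2, 3, 3),
  year  = c(2019, 2020, 2019, 2020, 2019, 2020),
  score = c(4, 5, NA, 3, 2, 4)
)

# Count the missing values before deciding how to handle them.
n_missing <- sum(is.na(long$score))

# Long to wide: one row per respondent, one score column per year.
wide <- reshape(
  long,
  idvar     = "id",
  timevar   = "year",
  direction = "wide"
)

# One simple option: keep only respondents with a score in every year.
complete <- wide[complete.cases(wide), ]

print(wide)
print(complete)
```

Here reshape() names the new columns by joining the measured variable and the year, giving score.2019 and score.2020. Whether to drop, flag, or impute incomplete rows depends on your research design, so treat the last step as a placeholder for that decision.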
Unit Four - Automated Data Collection
This unit will give you:
An overview of web/text scraping and the related legal considerations
A basic web scraping example so you can learn how to treat a webpage as a messy text document
An understanding of scraping Twitter
Unit Five - Performance and Scalability
This unit will focus on:
Giving you an overview of big data and high-performance computing (HPC)
Teaching you about performant programming
Wrapping up with a range of next steps and ways to extend your skills