WELCOME TO YOUR BEGINNER'S GUIDE TO R
What you need to know about this guide
All of the information you're going to find here has been extracted from the online course Introduction to R for Social Scientists. We've carefully picked resources that we think you might find useful, but these are by no means exhaustive. Each topic covered gives you a taster of what you can find in the first module of the online course. We hope this guide is a great starting point for learning about R!
On Introduction to R for Social Scientists you will gain the skills you need to use this flexible and multi-purpose platform for your own research. You will learn how to perform a wide range of data management tasks, with a focus on solving day-to-day conundrums that we all face as social scientists. You will also be able to use R to perform some of the most common statistical techniques used in the social sciences, namely a dimension reduction technique and OLS regression with interactions.
You can either make use of the guide here in your browser, or you can download it and keep it for regular use.
Please note that the discount code noted in this guide has expired.
Let's get started!
Hi! I'm Andreea
I am the course instructor on the online course Introduction to R, and a research fellow at Exeter University.
Before I started learning R, I was convinced that my lack of computer science knowledge would make it almost impossible to get to grips with. I started out studying Sociology and then Survey Methodology at university. Neither of these degrees teaches you about programming, although I did benefit from solid training in social statistics. Not long after I started my PhD in Sociology, I decided to challenge myself and discovered a whole new world of statistics and amazing graphics that I hadn't been exposed to before.
As it turned out, programming skills are not needed for the most common types of data exploration, analysis, and visualization that social scientists like myself are interested in performing in R. The fantastic flexibility of R, which opens the door to a plethora of analytical and graphical possibilities, and the sense of community amongst R users, have won me over and I have never gone back. Once an R user, always an R user!
WHAT IS R?
Let’s start by talking about what R is and the myriad reasons to make this your go-to statistics software.
R is possibly the most comprehensive platform for data management, exploration, analysis and visualization. You can do just about any type of analysis you can think of in R!
Great news: You don’t need to spend thousands on software anymore. R is a free, open-source program that is available for Windows, Mac OS X, and Linux. You can use a portable version of R if you don’t have administrator rights to install software on the computer you intend to use. The user-friendly graphical interface of RStudio makes R look and feel more familiar if you have already worked with other statistical software such as MATLAB.
On top of that, there are numerous user-contributed packages that you can download to extend the functionality of R. The documentation for the functions provided in these packages is easily accessible. You can even contact the package author if you feel something is unclear. These additional packages contain useful code that allows you to perform specific tasks faster and in fewer steps. They also give you access to state-of-the-art statistical techniques and visualizations that may not yet be available in any other software.
Speaking of graphics...
R is the single most powerful software for displaying data. The interactivity afforded by R is also unparalleled.
Most analyses that we conduct in the social sciences are complex and require multiple steps. With R, you can store results from one analysis and use them as input in another.
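As a rough illustration of storing a result and reusing it, the sketch below fits a simple regression and then passes the stored model into a second step. The data set (mtcars, which ships with R) and the object names fit, coefs, new_cars and predicted_mpg are choices made for this example only; they do not come from the course.

```r
# Step 1: fit a simple regression on the built-in mtcars data and store it
fit <- lm(mpg ~ wt, data = mtcars)

# Step 2: reuse the stored result, first to pull out the coefficients ...
coefs <- coef(fit)

# ... and then as input to a second analysis step (prediction for new cases)
new_cars <- data.frame(wt = c(2.5, 3.5))  # made-up weights for illustration
predicted_mpg <- predict(fit, newdata = new_cars)

print(coefs)
print(predicted_mpg)
```

Because the model is kept in fit, it does not need to be estimated again each time a later step uses it.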
Moreover, as a social scientist, you probably encounter data in a multitude of formats, which can pose a serious challenge. R can both import data from and export data to a wide variety of sources.
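Plain CSV files are one format that R reads and writes with built-in functions. As a minimal sketch, assuming a tiny made-up data set and a temporary file (the names survey, csv_path and survey_again are invented for this example), a round trip could look like this:

```r
# A tiny made-up data set, for illustration only
survey <- data.frame(id = 1:3, age = c(34, 51, 27))

# Export: write the data to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(survey, csv_path, row.names = FALSE)

# Import: read the same file back into a new object
survey_again <- read.csv(csv_path)

print(survey_again)
```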
THE R LANGUAGE SIMPLIFIED
Fear not the terminology...
To be able to take advantage of the exciting capabilities of R, first we need to speak its language.
Let's dig a little deeper into some R terminology...
Objects in R programming are data structures. Think about it this way: we need to somehow access the data in our computer’s memory… Objects are R’s way to allow you to do that. We can load an object into R or we can create one ourselves.
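Every time we create an object, we use two symbols combined, '<' and '-', which together form the assignment operator <-. We type the name we want to give the object, then <-, so we would type the following into the Console:
object_name <-
followed by some code that produces the value we want to store.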
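Here is a small sketch of that pattern. The object names my_number and doubled are made up for this example.

```r
# Create an object called my_number that stores the value 42
my_number <- 42

# Create a second object from the first one
doubled <- my_number * 2

# Printing an object shows what it contains
print(doubled)
```

Once an object exists, typing its name in the Console is enough to refer to it; the <- symbol is only needed when creating it or giving it a new value.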
Function is probably one of the most important terms in your R vocabulary. We will use functions over and over again, and I tend to think of functions as the gateway to doing amazing things in R. This is how you tell R what to do. If you have already used other statistical software, functions are similar to syntax in SPSS or commands in Stata.
In other words, functions are simply code.
When we create an object, the function comes after the object name and the <- symbol, and is followed by one or more arguments. Arguments to a function are the code in parentheses after the function's name, where you specify required inputs, such as the data you want to work with. Arguments are also where you change the default options and include any additional parameters you might want to use.
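To make this concrete, the sketch below calls the built-in mean() function twice: once with only the data, and once with a default option changed. The ages and the object names are made up for illustration.

```r
# Made-up ages; NA marks a missing value
ages <- c(23, 35, 47, NA)

# Only the required argument (the data): the result is NA,
# because by default mean() does not ignore missing values
mean_default <- mean(ages)

# Changing a default option: na.rm = TRUE drops missing values first
mean_complete <- mean(ages, na.rm = TRUE)

print(mean_default)
print(mean_complete)
```

Typing ?mean in the Console opens the help page listing the arguments the function accepts and their defaults.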
Vectors, factors and lists are all different ways that R stores data. You will carry out several types of data manipulation and analysis throughout this course, so it's important that you understand how R stores our data.
Vectors are types of variable you can create yourself in R, or that you may find in data sets you load.
The three types of vectors we are interested in are:
numeric vectors, which are just like continuous variables
character vectors, which are like categorical variables that have text-form values
logical vectors, which are similar to binary variables. Instead of 0 and 1, they use TRUE and FALSE, e.g. to record whether someone has a characteristic or not.
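The three vector types listed above can be sketched as follows. The variable names and values are invented for this example.

```r
# Numeric vector: like a continuous variable
income <- c(21000, 35500, 48250)

# Character vector: like a categorical variable with text values
region <- c("North", "South", "South")

# Logical vector: TRUE/FALSE instead of 1/0
employed <- c(TRUE, FALSE, TRUE)

# class() reports how R is storing each vector
print(class(income))
print(class(region))
print(class(employed))

# TRUE counts as 1 in arithmetic, so sum() gives the number employed
n_employed <- sum(employed)
print(n_employed)
```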
Lists are super important! A lot of the new R-specific terms you will learn are essentially types of lists, such as R’s famous data frames. Data frames are tables with rows and columns, which is how data sets are stored in R. Understanding these basic data structures is an important step to understanding how R stores big data sets.
A list can contain anything, including other lists. They can be extremely useful for stapling together different types of results from multiple analyses in a single object. You can then return different elements of the object as you need.
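A minimal sketch of these ideas, using made-up data and hypothetical object names (respondents, results), might look like this:

```r
# A small data frame: a table with rows and columns (values are made up)
respondents <- data.frame(
  age = c(34, 51, 27),
  employed = c(TRUE, FALSE, TRUE)
)

# A list that staples together different kinds of results,
# including a data frame and another list
results <- list(
  data = respondents,
  mean_age = mean(respondents$age),
  notes = list(source = "made-up example", wave = 1)
)

# Return individual elements as needed with $
print(results$mean_age)
print(results$notes$source)

# A data frame is itself a type of list (one element per column)
print(is.list(respondents))
```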
USEFUL RESOURCES AND FURTHER READING
The Origins of R: Find out about the origins and ethos of R here
GitHub is a platform for developers that hosts code, coding projects/reproducible examples and is also used as a repository for packages. It's free to sign up to GitHub. You can store your own projects there and create multiple versions of the same project that you may or may not want to share with the rest of the community. The great thing about GitHub is that you can find great code (in all coding languages) written by other users which can be adapted to solve your own statistical puzzles. Signing up is simple and definitely recommended! We can’t dedicate too much time to the subject (there is a lot to learn), but please take the time to explore the wonderful world of GitHub if you can.
The R Journal: Another great resource I would like to share with you is the R Journal. It is the open-access, refereed journal of the R project. You can find great examples of R applications, as well as articles on R packages and more.
Definitions: Want to delve deeper into Objects? You will find a great list at r-project.org
Ready to learn how to use R? You are eligible for a 25% discount on the online course, Introduction to R for Social Scientists. Use discount code BEGINR before September 17th to claim your discount.
By the end of the course you will:
Have a good understanding of how R works
Be able to perform a wide range of data management tasks, with a focus on solving day-to-day conundrums that we all face as social scientists
Have the knowledge and skills to apply an extensive set of data exploratory and visualization techniques
Be able to use R to perform some of the most common statistical techniques used in the social sciences, namely a dimension reduction technique and OLS regression with interactions