Dr. Maja Založnik answers your questions on getting started in R | Sage Campus

In March, we hosted a free Introduction to R webinar, with Dr. Maja Založnik, Research Fellow at the Oxford Institute of Population Ageing, to demystify the wonderful world of the R programming language.

Below, Maja answers listeners’ questions that we didn’t have time to cover in the webinar.


Is R better than Python or STATA for sociological analysis?

Both R and Python have their benefits, and both are popular among data analysts and scientists. Check out this blog on R vs. Python for the long answer! Compared with STATA, the most obvious differences are that R is free while STATA is not, and that R requires some programming skills. All three should be powerful enough for most applications of sociological analysis, but R’s flexibility is what makes it worth it!

Are Python skills transferable to R?

They are: specific habits take some tweaking, but programming skills are very transferable regardless of the language. Additionally, the open-source environment and community will feel familiar, since Python boasts them too, as will the focus both languages place on reproducible research.

Is R a good tool for web scraping or are other programmes (e.g. Python) better?

Although I don't personally have experience with Python for web scraping, my guess is both have very similar functionality, so you should stick with the one you are more comfortable with, and not switch for this reason alone.

How do I get R in order to install it on my computer?

You can download R for free here. Answers to technical questions, such as compatibility with different devices, can be found on the R project's FAQ pages.

Based on your experience, how long does it take to learn to use R?

You can pick up the basic programming skills fairly quickly. For example, the Campus Introduction to R course takes 12 hours to complete – and by the end of that you’ll be able to carry out common statistical analyses. However, it takes continued practice to master R.

How do you learn to code in R or is there a data set stored in R?

The best way to learn to code is to do just that: code. There are numerous online resources, tutorials, videos and courses for you to choose from and follow. And yes, R already comes with some classic datasets that are commonly used in learning environments. Whichever course you decide to follow, I would recommend that you also try to apply what you are learning to a dataset of your own, not just the one provided.
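As a minimal sketch of what working with one of those built-in datasets looks like, the lines below use mtcars, which ships with R. The choice of dataset and of the summary is only an illustration, not the example used in the webinar.

```r
# mtcars is one of the classic datasets shipped with R (in the 'datasets' package)
data(mtcars)

# Look at the first few rows and the structure of the data
head(mtcars)
str(mtcars)

# A simple summary: mean miles per gallon by number of cylinders
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(mpg_by_cyl)
```

Typing data() on its own in the console lists the datasets that are currently available.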

How did you learn the language for all the instructions (the notations)?

The answer is simply: learning by doing. There aren't really any shortcuts, but you'll become most proficient in R (as in any programming language) by putting what you learn into practice on your own data.

How can first-timers familiarise themselves with the code, keywords, and formulas? For example, is there a list of all the commands and are all available codes accessed through ‘help’?

This is a very good question and touches on one of the frustrations of many new R users: getting to grips with the myriad of functions that exist in the R environment.

When you search through the Help tab in RStudio, you will only get results for packages you have loaded, not for ones you haven't. This may seem unfortunate, but given the tens of thousands of packages and even more functions, often with overlapping functionality, it would be quite overwhelming to get all of those results every time.
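The same kind of lookup can also be done from the console. The sketch below uses only functions that come with R; the search pattern is just an example.

```r
# Names of objects on the current search path that match a pattern
csv_functions <- apropos("^read\\.csv")
print(csv_functions)

# In an interactive session you would typically also use:
#   ?read.csv                # help page for a function in a loaded package
#   ??"linear model"         # search the help pages of all installed packages
#   help(package = "stats")  # index of everything a package provides
```

Note that ?? looks through every package installed on your computer, whether or not it is loaded, so it is a handy middle ground between the loaded packages and everything that exists online.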

For a new user, I recommend working through examples like those on the Campus course, where you can see approaches worked out in practice before applying them to your own data. And remember, there is never only one way to do something in R, and often the choice of package or function is just a matter of style, personal preference or path dependence. If you're a proficient ‘Google-r’ you will often be able to find blog posts comparing different functions and approaches, with detailed explanations of how they differ and when each is recommended.

What’s the best R package for beginners to conduct data visualization?

I would probably recommend you go straight to ggplot2 for data visualization. It is not my go-to package, because I “grew up” on R base graphics and always found ggplot a bit difficult to get into, but I recognize that it is now the most powerful graphing package. Furthermore, it plays very nicely with the tidyverse family of packages, which have become indispensable. I myself am slowly but surely migrating to ggplot as well.
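To give a flavour of the ggplot2 style, here is a minimal sketch of a scatter plot. It assumes the ggplot2 package is already installed, and the dataset (the built-in mtcars) and the variables plotted are illustrative choices only.

```r
# Assumes ggplot2 is installed, e.g. via install.packages("ggplot2")
library(ggplot2)

# Scatter plot of car weight against fuel efficiency, coloured by cylinder count
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Printing the plot object draws it
print(p)
```

The plot is built up in layers joined by +, which is the main idea to get used to when coming from base graphics.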

Are the packages created by individual users or does R come with built-in packages?

Both! First of all, R comes with enough built-in packages to allow for a more than reasonable amount of data analysis and visualization out of the box. Additional packages are written by individual users, usually when they have written their own functions for additional functionality and wish to re-use them: so why not make them available to others?

These packages come in two general flavors: officially sanctioned ones that live in the CRAN repository – these have complete documentation and have passed onerous tests to become official R packages – and other packages that users make available online (usually on GitHub), with the idea that someone else might find them useful.
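As a rough sketch, this is how you can see which packages came with your R installation and how a CRAN package is added. The package name readxl is only an example, and the install line is commented out because it needs an internet connection.

```r
# Packages that ship with every R installation
base_packages <- rownames(installed.packages(priority = "base"))
print(base_packages)

# Installing an additional package from CRAN (run once per computer):
#   install.packages("readxl")

# Once installed, a package is loaded for the current session with library()
library(stats)  # stats is loaded by default; shown only to illustrate the call
```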

Do you have to share your data (i.e. open source) when using R?

No, there is no obligation to share your data, so you are completely free to work on proprietary data.

How do you save the analysis you showed us in the webinar?

The whole analysis, including the code for importing data and loading packages, the analysis and data visualization commands, and all the comments explaining what is happening, is saved in a simple text file with the extension .R, which allows RStudio to recognize it.
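For illustration, the contents of such a file might look like the sketch below. The file name analysis.R, the dataset and the model are hypothetical stand-ins, not the analysis shown in the webinar.

```r
# analysis.R (hypothetical file name)
# Everything after a '#' is a comment and is ignored by R.

# 1. Load the data (a built-in dataset stands in for your own file here)
data(mtcars)

# 2. Run the analysis: does weight predict fuel efficiency?
model <- lm(mpg ~ wt, data = mtcars)
print(summary(model))

# 3. Visualise the result
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(model)  # add the fitted regression line
```

Opening the file in RStudio lets you run it line by line, and source("analysis.R") runs the whole script in one go.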

How can dynamic plots be copied to a presentation package like PowerPoint?

If by dynamic you mean animations, these are usually in gif format and can be imported directly into PowerPoint. If instead you mean interactive charts, then I'm afraid they require a running instance of R to work. That instance can be on a server, as when you have a shiny app hosted on shinyapps and access it online via a browser, or on your own laptop, where you run the chart directly in RStudio.

What is the process for importing data?

This depends on the data source: whether you have the file locally or need to download it from the web, and what format it is in. For most proprietary formats you will need to load a specialized package (for Excel I recommend readxl), but .csv and other text files can be imported directly using the function read.csv.
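A minimal sketch of the .csv case follows. So that it runs anywhere, it first writes a tiny made-up file to a temporary location; with your own data you would simply pass the path to your file. The Excel lines are commented out because they assume readxl is installed and use a placeholder file name.

```r
# Create a small example .csv file in a temporary location (made-up data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,country",
             "1,34,Slovenia",
             "2,51,UK",
             "3,28,Slovenia"),
           csv_path)

# Import it with read.csv, which is part of base R
survey <- read.csv(csv_path)
str(survey)

# For Excel files, assuming the readxl package is installed
# ("my_data.xlsx" is a placeholder file name):
#   library(readxl)
#   survey_xl <- read_excel("my_data.xlsx")
```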

Is there a limit on the number of variables you can have in R? For example, some data sets have over 1,000 variables!

No, there is no such limit on the number of variables. However, there is a limit on the number of ‘cases’, i.e. rows, and that depends on the amount of RAM on your computer. Still, even that limitation can be overcome by specialized packages for memory-intensive analyses.

Are R analyses shareable with colleagues not working with R, e.g. through R Markdown?

That's a great question, and you've answered it already! Yes, rmarkdown allows you to ‘knit’ together your analysis (including the actual code if you wish) and all the results and charts, combining them with normal formatted text. The result can be compiled to pdf, html or even MS Word files to share easily with non-R-users.

How should categorical variables be coded?

In R, categorical variables are coded as ‘factors’. This tells R to treat them differently in certain types of analyses, and also allows you to add labels.
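A short sketch of turning numeric codes into a labelled factor is given below. The variable name, the codes and the labels are all made up for illustration.

```r
# Raw numeric codes as they might appear in a survey file (made-up values)
education_code <- c(1, 3, 2, 2, 1, 3)

# Convert to a factor, attaching a label to each code
education <- factor(education_code,
                    levels = c(1, 2, 3),
                    labels = c("Primary", "Secondary", "Tertiary"))

# The labels become the factor's levels
levels(education)

# Frequency table by category
table(education)
```

If the categories have a natural order, adding ordered = TRUE to the factor() call records that as well.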

How do you extract graphs to use in papers or other documents?

In RStudio you can export plots directly from the plotting pane, which will save them as .png files. But you are not limited to that: you can plot directly to other ‘devices’, as they are termed, including .pdf and .eps for vector graphics, as well as most other traditional image formats.
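Plotting to a device from code follows an open, draw, close pattern. The sketch below writes a histogram to a PDF in a temporary location; the plot itself and the dimensions are illustrative choices, and in practice you would give your own file name.

```r
# Open a PDF device; everything plotted from now on goes into this file
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path, width = 6, height = 4)  # size in inches

hist(mtcars$mpg, main = "Fuel efficiency", xlab = "Miles per gallon")

# Close the device to finish writing the file
dev.off()

# The same pattern works for other devices, e.g.
#   png("figure.png", width = 800, height = 600)  # size in pixels
```

Forgetting dev.off() is the most common reason for an empty or unreadable output file.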

