At SAGE Campus we’re always keen to hear how researchers are using computational methods. We spoke to Nicole Rae Baerg, one of our social science experts on Fundamentals of Quantitative Text Analysis for Social Scientists, and asked her a few questions about her work and which tools she recommends to others.


What inspired you to do what you do?

I started doing text analysis and machine learning while writing my dissertation, because I had too much textual information (in my case, newspaper articles) that I wanted to process. I started doing it by hand and thought, there has got to be another way to do this! The motivation was one part laziness and one part commitment to reproducible research. It then opened a Pandora's box of cool possibilities.


In your opinion which data science tools provide the most value to researchers?

The tool that I have used over the last few years that changed my own research, and my collaborations with co-authors and students, is Overleaf. I use it for collaborative writing, and I increasingly use it for publishing and working with academic journals as they come on board too. The thing that I should use more and do not is GitHub. It's the tool that I know I should focus on learning better and using more often in my research.

What data science skills should researchers be building on to boost their research?

I think there needs to be more attention paid to substantive questions and research design. Just because you have a lot of data does not mean that you have the correct data or even useful data for your question or task. I strongly favour theoretically informed analysis. This doesn't only have to be explanatory analysis. It can also be descriptive. But there needs to be theory. That being said, I also wish that there were more "hackathons" in the social sciences.  


What’s your favourite data science tool and why?

R Markdown documents. Or IPython notebooks. Same idea. I use them for teaching and for writing up my replication material. Students like them because they feel like they get materials that they can play with at home. Co-authors like them because they can see exactly what I have done without having to read my R files, which are more likely to have function titles like "Nicole's_est_try3pool_withse". I'm also really good at naming things with descriptive and useful names like "plot.pdf". Not particularly useful to an outside reader, so the Rmd files are really helpful!


What do you wish you could do differently?  

I wish I was a better computer programmer in a generic "language-free" sense. By that I mean, I wish that I could write computer code more abstractly, rather than only knowing how to write something in R and something else in Python. Computer languages come into and go out of fashion; knowing how to program in a more abstract way is a really great skill.

The next cohort of Fundamentals of Quantitative Text Analysis for Social Scientists starts on June 25th. Find out more and sign up here