
Hi, my name is Gabe Ignatow. My colleague Rada Mihalcea and I are proud to have co-authored the online course Introduction to Text Mining for Social Scientists. Both together and individually, Rada and I have conducted many text mining studies that we have published as research articles, books, and book chapters. Together we have about 40 years of experience using text mining tools, and we have both taught graduate-level text mining courses for many years. Text mining is an exciting field that encompasses new research methods and software tools that are being used across academia as well as by companies and government agencies.

What is text mining?

Let me start with a definition: text mining involves the collection and analysis of the textual data that groups, communities, and organizations generate, sometimes intentionally but often as byproducts of their everyday social interactions. Text mining is an accessible, affordable, and powerful form of data mining.

Many people are interested in learning more about text mining because of the significant advantages it has over traditional social science research methods such as surveys and interviews. The internet provides researchers with an incredible variety of sources of unprompted user data from social media platforms, news sites, streaming video sites, and digital archives. At the same time, a large number of software tools have been developed to assist researchers in the acquisition and analysis of digital text data.

What are text mining tools used for?

Researchers working at universities as well as in the private and public sectors are using text mining tools for all sorts of projects. Analysts have used text mining tools to study social media in order to predict the direction of stock markets and the occurrence of political protest. They use these tools to predict the gender, age, and emotional state of social media users and all sorts of other people who leave digital traces of their daily activities. Researchers analyze user comments on online newspaper articles and streaming video sites. They use text mining tools to analyze consumer product markets and consumer opinion, and to speed up information searches, for example on library websites. They analyze single works of literature as well as collections of literary works representative of historical periods or literary genres. They have analyzed transcripts of the everyday social interactions of nurses, teachers, counselors, students, and employees of large organizations. They analyze social mood and social attitudes by using text mining tools to probe the topics people talk about, the feeling words they use, the stories they tell, and the metaphors they tend to use.


Where do I start?

Computational research methods such as text mining can be intimidating for users who approach these methods from a variety of backgrounds and with diverse skill sets. To provide an accessible and practical introduction to text mining, we have designed an online course for students and research professionals who are interested in learning more about text mining but are not quite sure where to start. Designed by a sociologist and a computer scientist, our course is necessarily interdisciplinary. It also has an à la carte feel, as it was designed to provide students with a menu from which they can choose the lessons that are most important and valuable for them, and then come back to other lessons when the time is right. The course does not require programming experience or a background in statistics.

This course is based on our textbooks, Text Mining and Introduction to Text Mining, but we have streamlined the lessons from those books into five modules:

  1. Foundations

  2. Research Design and Basic Tools

  3. Text Mining Fundamentals

  4. Methods from the Humanities and Social Sciences

  5. Computer Science Methods

What is covered in the online course?

Foundations
In Foundations we cover the foundational concepts involved in text mining. Specifically, we define text mining and discuss how it relates to text analysis. We then discuss how to acquire textual data that you can use for your own research project. Finally, we help you to identify appropriate ethical guidelines for your research project and to consider several philosophical issues that will help you to define your project and what you can expect to learn from it.

Research Design and Basic Tools
The module Research Design and Basic Tools includes two main topics. The first, research design, will be vitally important to you if you are working on your own research project. The lessons on basic tools presented in this module are important for all text mining students.

Text Mining Fundamentals
In Text Mining Fundamentals we cover the fundamental principles and procedures for text mining as developed in computational linguistics. This is a technical module covering some of the main concepts of Natural Language Processing (NLP). If you are interested in learning NLP you will benefit a great deal from the lessons in this module. If you are focused on your own social science research project, you should learn the basics of NLP but you can be more selective about which lessons in this module will be most valuable.

Methods from the Humanities and Social Sciences
The fourth module, Methods from the Humanities and Social Sciences, provides a smorgasbord of tools and techniques for text mining and text analysis. These tools and techniques have been developed by scholars in the humanities and social sciences, and they have all proven their value in large numbers of published empirical research projects—books, journal articles, reports, etc.

Computer Science Methods
Finally, in Computer Science Methods we do a deep dive into the major text mining procedures developed by computer scientists and other researchers working in Natural Language Processing (NLP). These include automated procedures for classifying texts, measuring sentiments and opinions expressed in texts, extracting information from texts, and analyzing the topics discussed within texts. This module contains the most technical lessons of the course, so be sure to take your time to gain as much as possible from each lesson.
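
To give a flavor of what measuring sentiment in texts can look like, here is a minimal sketch in Python. It is not taken from the course materials. It assumes a simple word-counting approach, and the word lists, function names, and sample comments are all hypothetical and far smaller than the validated lexicons used in real research.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only;
# real studies rely on much larger, validated lexicons.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "sad", "angry", "hate"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (positive words - negative words) / total words.

    Empty texts get a score of 0.0.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    pos = sum(counts[word] for word in POSITIVE)
    neg = sum(counts[word] for word in NEGATIVE)
    return (pos - neg) / len(tokens)


# Made-up example comments, standing in for real user-generated text.
comments = [
    "I love this video, great work!",
    "This is bad and it makes me angry.",
]
for comment in comments:
    print(round(sentiment_score(comment), 2), comment)
```

A score above zero suggests more positive than negative words, and a score below zero suggests the opposite. Simple counts like this ignore negation, sarcasm, and context, which is one reason more sophisticated methods exist.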

If you are interested in working with text mining tools and want a broad overview of the field, this is a very useful course for you. We hope you consider signing up!

–Gabe Ignatow and Rada Mihalcea

Read the syllabus for Introduction to Text Mining for Social Scientists