Cleaning Messy Data

What you'll learn

This course introduces the fundamentals of cleaning messy data. It gives a clear understanding of what messy datasets are and why they need to be cleaned, along with plenty of practical examples of cleaning datasets.

This course will help learners to:

Recognize when data are messy and require cleaning
Apply cleaning methods to messy datasets
Understand how cleaning messy data contributes to good data management
Perform quality control of data

Language: English

Time to complete: 3 hours

Level: Beginner

Instructor: Dr. Alessandra Vigilante

How to access: Sage Campus is a digital library product. If you are a librarian, find out how to get Sage Campus for your university. If you are faculty, a researcher, or a student, recommend Sage Campus to your library.

Course modules

There are 3 modules in this course:

1. Help! My Data Are Messy

Even the most organized person can make mistakes when recording and saving data. At first, datasets can look clean and reproducible, but as soon as we try to add more data or use them for analysis or visualization, issues begin to arise and we find ourselves needing to clean the data! In this module, you will learn what messy data are and why it’s so important to recognize and clean them as soon as possible (and to avoid them in the future!).

2. Why Clean Messy Data?

3. How Can I Clean My Messy Data?

Download full syllabus

Try the course

Try it out

Students, researchers and faculty can try all Sage Campus courses today by signing up for a 7-day free trial below. 30-day institutional trials are set up via your institution’s library, so recommend us to your library to request a campus-wide trial.

7-day free trial

Who it’s for

This course is aimed at all learners, from undergraduates to early career researchers, who work with large datasets that need to be cleaned and reformatted before processing.

Other courses

Browse our other data science skills courses

Introduction to R

This practical course will help you gain the knowledge and skills to use R for social science research, step-by-step.

20 hrs to complete

Practical Data Management with R

Learn how to use R to manage data in a wide variety of formats, in a reproducible manner, at scale.

30 hrs to complete

Introduction to Python

Perfect for beginners, this course will teach you the fundamentals of Python programming through taught materials and practical examples.

24 hrs to complete

Intermediate Python Skills

Gain the skills you need to manipulate and visualize a variety of data types using Python.

25 hrs to complete

Interactive Visualization with R

Learn the techniques and tools for presenting data in visually attractive and interactive ways using the R programming language.

40 hrs to complete

Collecting Social Media Data

Learn the essentials of collecting social media data and gain the skills to plan, gather and analyze social media data for your research.

5 hrs to complete

Introduction to Text Mining

Gain a conceptual overview of the text mining landscape and a foundational understanding of the analysis of digital textual data sets.

10 hrs to complete

Fundamentals of Quantitative Text Analysis

Learn how to analyze large amounts of textual data, at scale, using the R programming language.

15 hrs to complete

Introduction to Artificial Intelligence

Gain a full understanding of what artificial intelligence is and how it is used in society and in research methods, including important ethical considerations and challenges.

2 hrs to complete

Cleaning Messy Data

Learn the fundamentals of cleaning messy data. This course gives a clear understanding of what messy datasets are and why they need to be cleaned, along with plenty of practical examples of cleaning datasets.

3 hrs to complete

Browse all courses

Dr. Alessandra Vigilante

Dr. Alessandra Vigilante is a Senior Lecturer in Bioinformatics at the Center for Stem Cells and Regenerative Medicine, with a focus on genotype-phenotype interactions and data integration. Alessandra obtained her PhD in Bioinformatics in Naples (2008-2011) before moving to the UK to join the Nicholas Luscombe group, first at the EMBL-European Bioinformatics Institute as a visiting student (2011-2012) and then at UCL as a postdoctoral fellow (2012-2017).

Alessandra Vigilante’s group has significant expertise and experience in the analysis and integration of large-scale genomic, epigenomic and transcriptomic data (e.g. single-cell RNA-seq, ATAC-seq and ChIP-seq datasets), and in the implementation of novel computational methods for various bespoke analyses to gain biological insights. She is actively involved in a broad network of collaborations to develop multidisciplinary approaches to research, working with faculty members within King’s and other research institutes.
