Cleaning Messy Data
This course introduces the fundamentals of cleaning messy data. It explains what messy datasets are and why they need to be cleaned, and it works through many practical examples of cleaning them.
This course will help learners to:
Recognize when data are messy and require cleaning
Apply cleaning methods to messy datasets
Understand how cleaning messy data contributes to good data management
Perform quality control of data
Language: English
Time to complete: 3 hours
Level: Beginner
Instructor: Dr. Alessandra Vigilante
How to access: Sage Campus is a digital library product. If you are a librarian, find out how to get Sage Campus for your university. If you are faculty, a researcher, or a student, recommend Sage Campus to your library.
Even the most organized person can make mistakes when recording and saving data. At first, datasets can look clean and reproducible, but as soon as we try to add more data or use them for analysis or visualization, issues begin to arise and we find ourselves needing to clean the data! In this module, you will learn what messy data are and why it is so important to recognize and clean them as soon as possible (and to avoid them in the future).
Students, researchers and faculty can try all Sage Campus courses today by signing up for a 7-day free trial below. 30-day institutional trials are set up via your institution’s library, so recommend us to your library to request a campus-wide trial.
This course is aimed at all learners, from undergraduates to early-career researchers, who work with large datasets that need to be cleaned and reformatted before processing.