
Data cleaning takes place after all data have been collected and entered, and before data analysis begins. It is the process of removing or modifying data that are incorrect, incomplete, duplicated, or irrelevant. Cleaning matters because inaccurate data can hinder the analysis or skew its results.
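As a rough illustration of the four kinds of problem data named above, the following Python sketch filters a small list of records. Everything in it is assumed for the example: the function name `clean_records`, the field names, the sample rows, and the rules for what counts as valid or relevant. Real projects will need their own rules, and many will modify a value rather than drop the record.

```python
def clean_records(records, required_fields, is_valid, is_relevant):
    """Return a new list with problem records removed (illustrative sketch)."""
    cleaned = []
    seen = set()
    for record in records:
        # Incomplete: a required field is missing or empty.
        if any(record.get(field) in (None, "") for field in required_fields):
            continue
        # Incorrect: the record fails the caller's validity rule.
        if not is_valid(record):
            continue
        # Not relevant: the record falls outside the scope of the analysis.
        if not is_relevant(record):
            continue
        # Duplicated: an identical record was already kept.
        # Assumes all values are hashable (numbers, strings, and so on).
        key = tuple(sorted(record.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical survey rows, invented for this example.
rows = [
    {"id": 1, "age": 34, "region": "north"},
    {"id": 1, "age": 34, "region": "north"},   # duplicated
    {"id": 2, "age": None, "region": "north"}, # incomplete
    {"id": 3, "age": 340, "region": "south"},  # incorrect (implausible age)
    {"id": 4, "age": 51, "region": "test"},    # not relevant (test entry)
    {"id": 5, "age": 27, "region": "south"},
]

cleaned = clean_records(
    rows,
    required_fields=["id", "age", "region"],
    is_valid=lambda r: 0 <= r["age"] <= 120,       # assumed plausibility rule
    is_relevant=lambda r: r["region"] != "test",   # assumed scope rule
)
print([r["id"] for r in cleaned])
```

The function returns a new list and leaves `rows` unchanged, so the original entries remain available if a cleaning decision needs to be reviewed later.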

Be aware that the data cleaning stage, like all stages of data collection and analysis, is not immune to the effects of human error and implicit bias. Read more about inequity and bias in data science at We All Count.