Data cleansing steps
Remove unnecessary columns
Pandas offers two functions for detecting missing data:
`isnull()` and `notnull()`.
Each returns a Boolean value (or a Boolean mask, when given a Series or DataFrame) showing whether the value passed is missing.
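As a minimal sketch of how these two functions behave, assuming a small made-up DataFrame (the column names and values below are illustrative and not taken from the notebook):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", None],
    "score": [90.0, np.nan, 75.0],
})

# isnull() returns a Boolean mask with the same shape as the input.
missing_mask = df.isnull()

# notnull() is the element-wise inverse of isnull().
present_mask = df.notnull()

# Summing the mask counts the missing values in each column.
missing_per_column = missing_mask.sum()
```

Counting the missing values per column this way is a common first check before deciding how to treat them.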
You can either replace missing values or drop them.
In pandas, missing values show up as NaN (not a number).
By default, dropna() drops every row that contains a NaN value from a DataFrame.
Instead of dropping those values, you can call fillna() to replace the NaN values with values you specify.
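The two options can be compared side by side. This is a sketch under assumed data: the `city` and `temp` columns and the replacement values are hypothetical, chosen only to show the calls.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in each column.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", None, "Pune"],
    "temp": [4.0, np.nan, 21.0, 30.0],
})

# dropna() returns a new DataFrame without the rows that contain NaN.
dropped = df.dropna()

# fillna() accepts a dict mapping column names to replacement values.
# Here the column mean is one possible (assumed) choice for "temp".
filled = df.fillna({"city": "unknown", "temp": df["temp"].mean()})
```

Both calls return new objects and leave `df` unchanged, so the original data is still available if the chosen treatment turns out to be wrong.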
Identify and remove duplicates
The duplicated() function flags duplicate values in a Series (or duplicate rows in a DataFrame).
drop_duplicates() removes the duplicates, keeping the first occurrence by default.
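A short sketch of both calls on a DataFrame, assuming hypothetical `id` and `value` columns in which one row is repeated:

```python
import pandas as pd

# Hypothetical sample data; the row at position 2 repeats the row at position 1.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "value": ["a", "b", "b", "c"],
})

# duplicated() marks every repeat of an earlier row as True.
dup_mask = df.duplicated()

# drop_duplicates() keeps the first occurrence of each row by default.
deduped = df.drop_duplicates()
```

Inspecting `df[dup_mask]` before dropping anything is a simple way to confirm that the flagged rows really are unwanted repeats.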
Fix missing data
If you need to change which column is used as the index, you can use set_index().
set_index() accepts a single column or a list of columns and returns a DataFrame indexed by them.
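For illustration, here is a sketch of set_index() with one column and with several. The `order_id`, `region`, and `total` columns are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "order_id": [101, 102, 103],
    "region": ["east", "west", "east"],
    "total": [25.0, 40.0, 15.5],
})

# Use a single column as the index; a new DataFrame is returned.
by_order = df.set_index("order_id")

# Pass a list of columns to build a multi-level index.
by_region_order = df.set_index(["region", "order_id"])
```

Once a column becomes the index it is no longer a regular column, so lookups use `.loc`, for example `by_order.loc[102]`.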
See the following notebook for an example of data cleaning.