Data cleansing steps

Remove unnecessary columns

Pandas offers two functions for detecting missing data: `isnull()` and `notnull()`. Each returns a Boolean value showing whether the passed value is missing; when passed a Series or DataFrame, it returns a Boolean mask of the same shape.
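As a minimal sketch of how these two functions behave, the example below builds a small DataFrame and inspects its missing values. The sample data and column names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", None],
    "age": [34.0, np.nan, 29.0],
})

missing_mask = df.isnull()    # True wherever a value is missing
present_mask = df.notnull()   # the inverse of isnull()

# Summing a Boolean mask counts the True entries in each column.
missing_per_column = missing_mask.sum()
print(missing_per_column)
```

Summing the mask is a quick way to see which columns need attention before deciding whether to drop or replace values.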

Once you have found the missing values, you can either replace them or drop them.

In pandas, missing values show up as NaN (not a number).

By default, dropna() drops every row of a DataFrame that contains a NaN value. Instead of dropping those rows, you can call fillna() to replace the NaN values with values you specify.
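The two options can be compared side by side. This is a sketch under assumed data: the DataFrame, its column names, and the chosen fill values (a placeholder string and the column mean) are illustrative, not a recommendation for any particular dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in each column.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", None],
    "temp": [4.0, np.nan, 21.0],
})

# Option 1: drop every row that contains at least one NaN.
dropped = df.dropna()

# Option 2: keep all rows and fill NaN values column by column.
# The fill values here are assumptions made for the example.
filled = df.fillna({"city": "unknown", "temp": df["temp"].mean()})

print(dropped)
print(filled)
```

Both calls return a new DataFrame and leave the original unchanged, so you can try each approach before committing to one.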

Identify and remove duplicates

The duplicated() function flags duplicate values in a Series (or duplicate rows in a DataFrame) with a Boolean mask. drop_duplicates() removes the duplicates, keeping the first occurrence by default.
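A short sketch of finding and then removing duplicates follows. The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the third row repeats the second.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "value": ["a", "b", "b", "c"],
})

# True for each row that repeats an earlier row.
dup_mask = df.duplicated()

# Remove the repeated rows, keeping the first occurrence of each.
deduped = df.drop_duplicates()

print(dup_mask)
print(deduped)
```

Checking the mask first lets you confirm how many rows will be removed before you actually drop them.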

Fix missing data

If you need to change which column is used as the index, you can use set_index(). It accepts a single column or a list of columns to use as the index.
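The sketch below shows set_index() with one column and with a list of columns. The DataFrame and its column names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "order_id": [101, 102, 103],
    "region": ["east", "west", "east"],
    "total": [20.0, 35.5, 12.0],
})

# Use a single column as the index.
by_order = df.set_index("order_id")

# Use several columns to build a hierarchical (multi-level) index.
by_region_order = df.set_index(["region", "order_id"])

# Rows can now be looked up by index label.
print(by_order.loc[102, "total"])
```

set_index() returns a new DataFrame by default, so the original keeps its default integer index unless you reassign the result.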

See the following notebook for an example of data cleaning.