Calmcode - smoking: cleaning

Cleaning

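It's always a good idea to clean a dataset before analysing it. In this case the cleaning is light and only serves to make the analysis easier: we add two 0/1 columns, alive and smokes, derived from the text columns outcome and smoker.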

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("~/Downloads/smoking.csv")

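def clean_dataframe(dataf):
    # np.int was removed in NumPy 1.24; the builtin int does the same job here.
    return (dataf
            # alive: 1 if the outcome is 'Alive', 0 otherwise
            .assign(alive=lambda d: (d['outcome'] == 'Alive').astype(int))
            # smokes: 1 if the person is a smoker, 0 otherwise
            .assign(smokes=lambda d: (d['smoker'] == 'Yes').astype(int)))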

clean_df = df.pipe(clean_dataframe)
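
A quick way to convince yourself that the cleaning step behaves as intended is to run it on a few hand-made rows. The sketch below is only an illustration: the sample rows are made up, and the values 'Dead' and 'No' are assumptions about what the other categories look like in the real file. It also uses the builtin int as the target dtype, which works across NumPy versions.

```python
import pandas as pd

# Hypothetical sample rows. Only the column names and the values
# 'Alive' and 'Yes' come from the tutorial; 'Dead' and 'No' are assumed.
sample = pd.DataFrame({
    "outcome": ["Alive", "Dead", "Alive"],
    "smoker": ["Yes", "No", "No"],
})


def clean_dataframe(dataf):
    # Same logic as the tutorial's function: compare against a label,
    # then cast the resulting booleans to 0/1 integers.
    return (dataf
            .assign(alive=lambda d: (d["outcome"] == "Alive").astype(int))
            .assign(smokes=lambda d: (d["smoker"] == "Yes").astype(int)))


clean_sample = sample.pipe(clean_dataframe)

# The new columns should hold 1 where the label matched and 0 elsewhere.
assert clean_sample["alive"].tolist() == [1, 0, 1]
assert clean_sample["smokes"].tolist() == [1, 0, 0]

# .assign returns a new dataframe, so the input is left untouched.
assert list(sample.columns) == ["outcome", "smoker"]

print(clean_sample)
```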

Now that the dataset is clean we can start with the analysis.