... smoking: cleaning


It's always a good idea to clean the dataset before analysing it. In this case the cleaning is light: we only recode the `outcome` and `smoker` columns as 0/1 integers to make our analysis easier.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("~/Downloads/smoking.csv")

def clean_dataframe(dataf):
    # Recode the text columns as 0/1 integer flags; the originals are kept.
    return (dataf
            .assign(alive=lambda d: (d['outcome'] == 'Alive').astype(int))
            .assign(smokes=lambda d: (d['smoker'] == 'Yes').astype(int)))

clean_df = df.pipe(clean_dataframe)
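
To see what the cleaning step does without needing the CSV file, here is a minimal sketch on a tiny made-up frame. The rows in `toy_df` are hypothetical and not taken from the real dataset, and the `'Dead'` and `'No'` labels are assumptions about what the other values look like. The function repeats the same logic as above, using the builtin `int` as the dtype.

```python
import pandas as pd

def clean_dataframe(dataf):
    # Same recoding as above: text labels become 0/1 integer flags.
    return (dataf
            .assign(alive=lambda d: (d['outcome'] == 'Alive').astype(int))
            .assign(smokes=lambda d: (d['smoker'] == 'Yes').astype(int)))

# Hypothetical rows for illustration only; 'Dead' and 'No' are assumed labels.
toy_df = pd.DataFrame({
    'outcome': ['Alive', 'Dead', 'Alive'],
    'smoker': ['Yes', 'No', 'No'],
})

toy_clean = toy_df.pipe(clean_dataframe)

# The new columns sit next to the original ones.
print(toy_clean)
```

Because `.assign` returns a new dataframe, `toy_df` itself is left untouched, which is what makes this style of function safe to chain with `.pipe`.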

Now that the dataset is clean we can start with the analysis.
