smoking:
smoking is bad
It is easier than you might think to fool yourself with data. It is quantified, so there is less bias, right? This series of videos walks through an analysis in pandas that shows why this might not be true.
Notes
When we also account for age, by comparing smokers and non-smokers within the same age bucket, a different pattern suddenly appears.
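(clean_df
  # round each age to the nearest decade so people fall into age buckets
  .assign(age=lambda d: np.round(d['age'] / 10) * 10)
  .groupby(['smokes', 'age'])
  # p is the share of people still alive in each (smokes, age) group
  .agg(p=('alive', 'mean'))
  .reset_index()
  # one row per age bucket, one column per smoking status
  .pivot(index='age', columns='smokes', values='p')
  .plot())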
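To see why grouping by age can flip the picture, here is a minimal self-contained sketch. The counts are made up purely for illustration and are not the data used in the videos; the `toy_df` name and the `"Yes"`/`"No"` labels are assumptions as well. Only the column names (`smokes`, `age`, `alive`) follow the snippet above.

```python
import pandas as pd

# Made-up counts, chosen only to illustrate the effect (NOT the course data).
# Each tuple: (age bucket, smokes, number alive, number dead)
counts = [
    (30, "Yes", 72, 8),
    (30, "No", 19, 1),
    (70, "Yes", 4, 16),
    (70, "No", 24, 56),
]

# Expand the counts into one row per person.
rows = []
for age, smokes, n_alive, n_dead in counts:
    rows += [{"age": age, "smokes": smokes, "alive": 1}] * n_alive
    rows += [{"age": age, "smokes": smokes, "alive": 0}] * n_dead
toy_df = pd.DataFrame(rows)

# Ignoring age: share alive per smoking status.
overall = toy_df.groupby("smokes")["alive"].mean()

# Accounting for age: share alive per (age bucket, smoking status).
by_age = (toy_df
    .groupby(["smokes", "age"])
    .agg(p=("alive", "mean"))
    .reset_index()
    .pivot(index="age", columns="smokes", values="p"))

print(overall)
print(by_age)
```

In this toy data the smokers are mostly young, so they look better off overall, yet within each age bucket the non-smokers have the higher share alive. Whether the real dataset behaves the same way is exactly what the plot above is meant to check.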