It is easier than you might think to fool yourself with data. It is quantified, so there is less bias, right? This series of videos walks through an analysis using pandas that demonstrates why this might not be true.
We can calculate the effect of smoking, while keeping the age in mind.
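# Assumes `np` is numpy and that `smokes` is coded 0 (non-smoker) / 1 (smoker).
(clean_df
  .assign(age=lambda d: np.round(d['age'] / 10) * 10)  # bucket age to the nearest decade
  .groupby(['smokes', 'age'])
  .agg(p=('alive', np.mean))                           # share alive per (smokes, age) group
  .reset_index()
  .pivot(index='age', columns='smokes', values='p')    # one column per smoking status
  .assign(diff=lambda d: d[0] - d[1])['diff']          # non-smokers minus smokers, per age group
  .mean())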
Averaged over the age groups, it seems that not smoking gives about a +3.45 percentage point bonus to the survival rate, but it should be said that this effect won't be noticeable unless you are of old age.
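To see why the average hides a per-age story, here is a minimal sketch of the same pipeline on a tiny made-up table. The `toy_df` rows are hypothetical and are not the data from the videos; the sketch also assumes `smokes` and `alive` are coded as 0/1, as in `clean_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, NOT the dataset used in the videos.
# Columns mirror clean_df: smokes (0/1), age in years, alive (0/1).
toy_df = pd.DataFrame({
    "smokes": [0, 0, 1, 1, 0, 0, 1, 1],
    "age":    [22, 24, 21, 23, 71, 73, 72, 74],
    "alive":  [1, 1, 1, 1, 1, 0, 0, 0],
})

per_age = (toy_df
  .assign(age=lambda d: np.round(d["age"] / 10) * 10)  # bucket age to the nearest decade
  .groupby(["smokes", "age"])
  .agg(p=("alive", "mean"))                            # share alive per group
  .reset_index()
  .pivot(index="age", columns="smokes", values="p")
  .assign(diff=lambda d: d[0] - d[1]))                 # non-smokers minus smokers

# Keep the per-age table around instead of collapsing it straight away.
print(per_age)

# The single summary number is the unweighted mean over the age buckets.
avg_diff = per_age["diff"].mean()
print(avg_diff)
```

In this toy table the difference is zero in the young bucket and large in the old one, so the mean alone would suggest a moderate effect that no individual age group actually shows. Inspecting `per_age` before calling `.mean()` is one way to check for that.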