It is easier than you might think to fool yourself with data. It is quantified, so there is less bias, right? This series of videos walks through an analysis using pandas that demonstrates why this might not be true.
You can find a copy of the final notebook here.
You can reach a substantially different conclusion from the same data if part of it is not taken into account. This is one of the things that makes working with data so hard to get right. Our saving grace here was critical thinking (always double-check).
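To make this concrete, here is a minimal pandas sketch of how such a flip can happen. The counts, the column names and the `survival_rate` helper are all made up for illustration; they are not the numbers or the code from the videos. The idea is only to show that an aggregate comparison can point one way while every subgroup points the other way.

```python
import pandas as pd

# Made-up counts for illustration only; NOT the data used in the videos.
counts = pd.DataFrame({
    "smoker":    ["yes", "no", "yes", "no"],
    "age_group": ["young", "young", "old", "old"],
    "n":         [100, 100, 20, 100],   # people in each group
    "alive":     [95, 98, 5, 40],       # of whom still alive at follow-up
})

def survival_rate(df, by):
    """Share of people still alive within each group (hypothetical helper)."""
    totals = df.groupby(by)[["alive", "n"]].sum()
    return totals["alive"] / totals["n"]

# Ignoring age: smokers appear to do better.
overall = survival_rate(counts, "smoker")

# Taking age into account: non-smokers do better in every age group.
by_age = survival_rate(counts, ["age_group", "smoker"]).unstack("smoker")

print(overall.round(2))
print(by_age.round(2))
```

In this toy data most smokers are young, and young people are more likely to be alive regardless of smoking. Leaving age out of the comparison is what makes smoking look harmless.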
There are many other aspects that we might also want to check:
Do people stop smoking after a while, and if so, what effect does that have?
Does the effect of smoking depend on gender?
Can we distinguish between heavy smokers and light ones?
We should be very careful. We don't want to be the ones lying with statistics.