birthday problem: dataset
There's a famous probability exercise called the birthday problem. It's an interesting problem, but the way it is typically posed calculates the wrong thing. In this series of videos we'll explore what that means.
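For reference, here is a minimal sketch of the textbook version of the calculation, which assumes every day of a 365-day year is equally likely as a birthday. The function name `p_shared_birthday` and its parameters are made up for this illustration, and this is not necessarily the exact calculation the series has in mind.

```python
def p_shared_birthday(group_size: int, n_days: int = 365) -> float:
    """Chance that at least two people in a group share a birthday.

    Assumption: birthdays are independent and uniform over `n_days` days.
    """
    if group_size > n_days:
        # Pigeonhole: more people than days guarantees a shared birthday.
        return 1.0
    p_all_distinct = 1.0
    for k in range(group_size):
        # The (k+1)-th person must avoid the k days already taken.
        p_all_distinct *= (n_days - k) / n_days
    return 1.0 - p_all_distinct


for n in (10, 23, 50):
    print(n, round(p_shared_birthday(n), 3))
```

Under these assumptions a group of 23 people already has a slightly better than even chance of containing a shared birthday.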
To download the data locally, either download it here or run the following command from the command line.
Here's the code used in this video.
import pandas as pd

df = pd.read_csv("birthdays.csv")

# Sum the births per day of the year, then normalise the counts
# so that the column `p` adds up to 1.
plot_df = (df
  .assign(date = lambda d: pd.to_datetime(d['date']))
  .assign(day_of_year = lambda d: d['date'].dt.dayofyear)
  .groupby('day_of_year')
  .agg(n_births=('births', 'sum'))
  .assign(p = lambda d: d['n_births']/d['n_births'].sum()))

# One probability per day of the year.
probabilities = plot_df['p']
probabilities.plot();
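One possible use of per-day probabilities like these is to estimate the chance of a shared birthday by simulation instead of assuming every day is equally likely. The sketch below is an assumption about where this could go, not the code from the series. The name `simulate_shared_birthday` and its parameters are hypothetical, and it uses only the standard library so that it runs without `birthdays.csv`. A uniform list stands in for the real probabilities.

```python
import random
from itertools import accumulate


def simulate_shared_birthday(probabilities, group_size, n_trials=10_000, seed=0):
    """Estimate the chance that a group contains at least one shared birthday.

    `probabilities` holds one weight per day; the weights are assumed
    to be non-negative and to sum to roughly 1.
    """
    rng = random.Random(seed)
    days = list(range(len(probabilities)))
    # Precompute cumulative weights once so each draw stays cheap.
    cum_weights = list(accumulate(probabilities))
    hits = 0
    for _ in range(n_trials):
        birthdays = rng.choices(days, cum_weights=cum_weights, k=group_size)
        # Fewer distinct days than people means two people share a day.
        if len(set(birthdays)) < group_size:
            hits += 1
    return hits / n_trials


# Stand-in data: uniform probabilities over 365 days.
uniform = [1 / 365] * 365
print(simulate_shared_birthday(uniform, group_size=23))
```

With the pandas series from the code above, the call would presumably look like `simulate_shared_birthday(probabilities.tolist(), group_size=23)`.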