birthday problem: conclusion
There's a famous probability exercise called the birthday problem. It's an interesting problem for sure, but the way it is usually posed calculates the wrong thing. In this series of videos we'll explore what this means.
Here's the code that generates the plot.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("birthdays.csv")

# Share of all births that falls on each day of the year.
plot_df = (df
  .assign(date = lambda d: pd.to_datetime(d['date']))
  .assign(day_of_year = lambda d: d['date'].dt.dayofyear)
  .groupby('day_of_year')
  .agg(n_births=('births', 'sum'))
  .assign(p = lambda d: d['n_births']/d['n_births'].sum()))

# p_fake is the uniform baseline: every day gets probability 1/(number of days).
plot_df.assign(p_fake = lambda d: 1/d.shape[0])[['p', 'p_fake']].plot()
plt.ylim(0);
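To see how per-day probabilities such as the `p` column feed back into the birthday problem itself, here is a minimal sketch. It is not part of the original notebook: the function names `p_shared_uniform` and `p_shared_simulated` are hypothetical, and so are the simulation size and the group size of 23. The first function is the textbook calculation, which assumes every day is equally likely. The second estimates the same probability by simulation for any vector of day probabilities.

```python
import numpy as np


def p_shared_uniform(n_people, n_days=365):
    """Exact chance that at least two of n_people share a birthday,
    assuming every day is equally likely (the textbook assumption)."""
    p_all_distinct = 1.0
    for k in range(n_people):
        p_all_distinct *= (n_days - k) / n_days
    return 1.0 - p_all_distinct


def p_shared_simulated(day_probs, n_people, n_sims=20_000, seed=0):
    """Monte Carlo estimate of the same chance for arbitrary day probabilities.
    n_sims and seed are illustrative defaults, not values from the source."""
    rng = np.random.default_rng(seed)
    day_probs = np.asarray(day_probs, dtype=float)
    day_probs = day_probs / day_probs.sum()  # make sure the weights sum to 1
    # One row per simulated group, one column per person.
    draws = rng.choice(len(day_probs), size=(n_sims, n_people), p=day_probs)
    draws.sort(axis=1)
    # After sorting, a shared birthday shows up as two equal neighbours.
    has_shared = (np.diff(draws, axis=1) == 0).any(axis=1)
    return has_shared.mean()


uniform = np.full(365, 1 / 365)
print(p_shared_uniform(23))             # about 0.507
print(p_shared_simulated(uniform, 23))  # close to the exact value, up to simulation noise
```

With the data loaded, passing `plot_df['p'].to_numpy()` instead of `uniform` would give the estimate for the observed distribution of births.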
We hope you enjoyed the little thought experiment.
If you want to download the entire notebook, feel free to grab it here.