There's a famous probability exercise called the birthday problem. It's an interesting problem for sure, but the way it is typically solved calculates the wrong thing. In this series of videos we'll explore what this means.
Here's the code that makes the plot appear.
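import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("birthdays.csv")

# Share of all births that falls on each day of the year.
plot_df = (df
  .assign(date=lambda d: pd.to_datetime(d['date']))
  .assign(day_of_year=lambda d: d['date'].dt.dayofyear)
  .groupby('day_of_year')
  .agg(n_births=('births', 'sum'))
  .assign(p=lambda d: d['n_births']/d['n_births'].sum()))

# p_fake is the uniform baseline: every day equally likely.
# d.shape is a (rows, columns) tuple, so divide by the row count.
plot_df.assign(p_fake=lambda d: 1/d.shape[0])[['p', 'p_fake']].plot()
plt.ylim(0);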
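The flat `p_fake` line in the plot is the "every day is equally likely" baseline, and `p` is what the data says. As a rough sketch of why that difference matters, the snippet below computes the textbook shared-birthday probability under the uniform assumption and estimates the same quantity by simulation for any list of per-day probabilities. The function names and parameters (`p_shared_uniform`, `p_shared_simulated`, `n_trials`, `seed`) are made up for this illustration and are not taken from the notebook.

```python
import random


def p_shared_uniform(n_people, n_days=365):
    """Chance that at least two of n_people share a birthday,
    assuming every day of the year is equally likely."""
    p_all_distinct = 1.0
    for i in range(n_people):
        # The (i+1)-th person must avoid the i days already taken.
        p_all_distinct *= (n_days - i) / n_days
    return 1 - p_all_distinct


def p_shared_simulated(probs, n_people, n_trials=20_000, seed=0):
    """Monte Carlo estimate of the same chance when day i has
    probability probs[i] (hypothetical helper, for illustration)."""
    rng = random.Random(seed)
    days = range(len(probs))
    hits = 0
    for _ in range(n_trials):
        sample = rng.choices(days, weights=probs, k=n_people)
        # A duplicate exists when the set is smaller than the sample.
        hits += len(set(sample)) < n_people
    return hits / n_trials
```

With 23 people, `p_shared_uniform(23)` comes out at roughly 0.507. Passing something like `plot_df['p'].tolist()` as `probs` to `p_shared_simulated` would give an estimate based on the observed birth frequencies instead, assuming `plot_df` was built as above.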
We hope you enjoyed the little thought experiment.