logo

... birthday problem.


There's a famous probability exercise called the birthday problem. It's an interesting problem for sure, but typically it is calculating the wrong thing. In this series of videos we'll explore what this means.


Notes

To download the data locally either download here or run the following command from the command line.

wget https://calmcode.io/datasets/birthdays.csv

Here's the code used in this video.

import pandas as pd

df = pd.read_csv("birthdays.csv")

plot_df = (df
  .assign(date = lambda d: pd.to_datetime(d['date']))
  .assign(day_of_year = lambda d: d['date'].dt.dayofyear)
  .groupby('day_of_year')
  .agg(n_births=('births', 'sum'))
  .assign(p = lambda d: d['n_births']/d['n_births'].sum()))

probabilities = plot_df['p']
probabilities.plot();

Feedback? See an issue? Something unclear? Feel free to mention it here.

If you want to be kept up to date, consider getting the newsletter.