birthday problem: dataset

Notes

To get the data locally, either download the file directly or run the following command in a terminal.

wget https://calmcode.io/datasets/birthdays.csv

Here's the code used in this video.

import pandas as pd

df = pd.read_csv("birthdays.csv")

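# Estimate the probability of being born on each day of the year.
plot_df = (df
  # parse the date column into real timestamps
  .assign(date = lambda d: pd.to_datetime(d['date']))
  # day number within the year: 1 through 365, or 366 in leap years
  .assign(day_of_year = lambda d: d['date'].dt.dayofyear)
  # total births per day of the year, then each day's share of all births
  .groupby('day_of_year')
  .agg(n_births=('births', 'sum'))
  .assign(p = lambda d: d['n_births']/d['n_births'].sum()))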

probabilities = plot_df['p']
probabilities.plot();
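
As a rough sketch of how per-day probabilities like these could feed into the birthday problem itself, the snippet below estimates the chance that at least two people in a room share a birthday by simulation. The function name `shared_birthday_probability` and the parameters `n_sim` and `seed` are made up for illustration and are not taken from the video. It runs on a uniform baseline so that it works without the CSV file.

```python
import numpy as np

def shared_birthday_probability(p, n_people, n_sim=20_000, seed=0):
    """Estimate the chance that at least two of n_people share a birthday.

    p is a sequence of per-day weights; n_sim and seed are illustrative
    parameters, not something defined in the original notes.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()  # renormalise so the weights sum to exactly 1
    rng = np.random.default_rng(seed)
    # One row per simulated room, one column per person.
    days = rng.choice(len(p), size=(n_sim, n_people), p=p)
    # After sorting each row, a shared birthday appears as two equal neighbours.
    days.sort(axis=1)
    has_match = (np.diff(days, axis=1) == 0).any(axis=1)
    return has_match.mean()

# Uniform baseline: every day equally likely.
uniform = np.full(365, 1 / 365)
estimate = shared_birthday_probability(uniform, n_people=23)
print(f"{estimate:.3f}")
```

With the data loaded as above, passing `probabilities.to_numpy()` in place of `uniform` should give the estimate under the observed birth distribution.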