birthday problem: dataset



Notes

To get the data locally, either download birthdays.csv directly from https://calmcode.io/datasets/birthdays.csv or run the following command from the command line:

wget https://calmcode.io/datasets/birthdays.csv

Here's the code used in this video.

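import pandas as pd

# Load the daily birth counts (assumes birthdays.csv is in the working directory).
df = pd.read_csv("birthdays.csv")

plot_df = (df
  # Parse the date strings so the .dt accessor becomes available.
  .assign(date = lambda d: pd.to_datetime(d['date']))
  # Day of the year, 1-366. In leap years, dates after Feb 28 get a number one higher.
  .assign(day_of_year = lambda d: d['date'].dt.dayofyear)
  # Total births per day of the year, summed over every row in the dataset.
  .groupby('day_of_year')
  .agg(n_births=('births', 'sum'))
  # Normalise the counts into a probability distribution over days.
  .assign(p = lambda d: d['n_births']/d['n_births'].sum()))

probabilities = plot_df['p']
# The trailing semicolon suppresses the text output in a Jupyter notebook.
probabilities.plot();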
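The snippet stops at plotting the distribution. One way these probabilities might feed into the birthday problem is sketched below, as an assumption rather than the method from the video: an exact formula for the textbook case where every day is equally likely, next to a Monte Carlo estimate that accepts any distribution over days. The function names `p_shared_uniform` and `p_shared_simulated` are hypothetical, and the sampling relies on NumPy's `Generator.choice`.

```python
import numpy as np


def p_shared_uniform(n_people: int, n_days: int = 365) -> float:
    """Exact chance that at least two of n_people share a birthday
    when all n_days are equally likely (hypothetical helper)."""
    p_all_distinct = 1.0
    for k in range(n_people):
        # The (k+1)-th person must avoid the k days already taken.
        p_all_distinct *= (n_days - k) / n_days
    return 1.0 - p_all_distinct


def p_shared_simulated(n_people: int, probabilities, n_sims: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same chance for an arbitrary,
    possibly non-uniform, distribution over days (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(probabilities, dtype=float)
    p = p / p.sum()  # renormalise so the weights sum to 1 despite rounding
    # Each row is one simulated group; each entry is a day index 0..len(p)-1.
    days = rng.choice(len(p), size=(n_sims, n_people), p=p)
    # After sorting a row, any shared birthday shows up as two equal neighbours.
    has_shared = (np.diff(np.sort(days, axis=1), axis=1) == 0).any(axis=1)
    return float(has_shared.mean())


if __name__ == "__main__":
    print(p_shared_uniform(23))                  # about 0.507
    print(p_shared_simulated(23, np.ones(365)))  # should land close to the line above
```

With the dataset loaded as above, `p_shared_simulated(23, probabilities)` would estimate the chance of a match under the observed birth distribution, which can then be compared with `p_shared_uniform(23)`. Simulation is used for the non-uniform case because the simple product formula only holds when every day has the same probability.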
