birthday problem: dataset
There's a famous probability exercise called the birthday problem. It's certainly an interesting problem, but the usual calculation answers the wrong question. In this series of videos we'll explore what this means.
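For reference, the textbook calculation assumes that all 365 days are equally likely and ignores leap days. Under that assumption, the chance that n people all have different birthdays is a product of shrinking fractions. Here is a minimal sketch in Python. The function name `classic_birthday` is our own, chosen for illustration.

```python
def classic_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming every one of `days` days is equally likely (textbook setup)."""
    p_all_distinct = 1.0
    for k in range(n):
        # person k+1 must avoid the k birthdays that are already taken
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

print(round(classic_birthday(23), 4))  # 0.5073
```

This produces the well-known "23 people give better than even odds" result. It depends entirely on the equal-likelihood assumption.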
Notes
To download the data locally, either download it here or run the following command from the command line.
wget https://calmcode.io/datasets/birthdays.csv
Here's the code used in this video.
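import pandas as pd

# Load the daily birth counts (the 'date' and 'births' columns are used below).
df = pd.read_csv("birthdays.csv")

# Sum births per day of the year, then normalise the totals into probabilities.
# Note: in leap years dayofyear counts Feb 29 as day 60, so later dates shift by one.
plot_df = (df
    .assign(date=lambda d: pd.to_datetime(d['date']))
    .assign(day_of_year=lambda d: d['date'].dt.dayofyear)
    .groupby('day_of_year')
    .agg(n_births=('births', 'sum'))
    .assign(p=lambda d: d['n_births'] / d['n_births'].sum()))

probabilities = plot_df['p']
probabilities.plot();  # the trailing ";" suppresses the Axes repr in Jupyter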
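To move past the equal-likelihood assumption, the per-day probabilities computed above can replace the uniform 1/365 factors. The sketch below shows one exact way to do this. It is our own approach, not necessarily the method used in the videos. The chance that n people all land on different days equals n! times the n-th elementary symmetric polynomial of the day probabilities, and a small dynamic program over the days computes that quantity. The name `prob_shared_birthday` is hypothetical.

```python
def prob_shared_birthday(probs, n):
    """P(at least two of n people share a birthday) when each birthday is
    drawn independently from the distribution `probs` (one entry per day)."""
    # f[k]: probability that k people all land on distinct days,
    # counting only the days processed so far (f[0] = 1 trivially).
    f = [1.0] + [0.0] * n
    for p in probs:
        # go downward so f[k - 1] still holds the value before this day
        for k in range(n, 0, -1):
            f[k] += k * p * f[k - 1]
    return 1.0 - f[n]

print(round(prob_shared_birthday([1 / 365] * 365, 23), 4))  # 0.5073
```

If the data is loaded as above, `prob_shared_birthday(probabilities.tolist(), 23)` applies the calculation to the empirical day-of-year distribution. Passing `[1 / 365] * 365` reproduces the textbook value.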