birthday problem:
true simulation
There's a famous probability exercise called the birthday problem. It's an interesting problem for sure, but typically it is calculating the wrong thing. In this series of videos we'll explore what this means.
Notes
The code below also contains code from the previous video, but it should give you the correct chart. Be mindful that the simulation might take a bit.
import pandas as pd
df = pd.read_csv("birthdays.csv")
plot_df = (df
.assign(date = lambda d: pd.to_datetime(d['date']))
.assign(day_of_year = lambda d: d['date'].dt.dayofyear)
.groupby('day_of_year')
.agg(n_births=('births', 'sum'))
.assign(p = lambda d: d['n_births']/d['n_births'].sum()))
def sim_real_once(room = 20):
r = np.random.choice(probabilities.index, p=probabilities, size=room)
return np.unique(r).shape[0] != room
def simulate_real(room = 20, n_sim = 1000):
return np.mean([sim_real_once(room) for _ in range(n_sim)])
plt.plot([calculate(r) for r in range(1, 35)], label="calculated")
plt.plot([simulate_real(room = i, n_sim=10_000) for i in range(1, 35)], label="simulated_real")
plt.legend();
Feedback? See an issue? Something unclear? Feel free to mention it here.
If you want to be kept up to date, consider signing up for the newsletter.