... ray: performance


Parallelism can help, but only reach for it after you've made the code itself fast.

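import numpy as np

def birthday_experiment(class_size, n_sim=1000):
    """Simulates the birthday paradox. Vectorized = Fast!"""
    # One row per simulated class, one column per student.
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    # Count distinct birthdays per row: each change between sorted neighbours is a new value.
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    # A class has a shared birthday when it has fewer distinct values than students.
    return {"est_prob": np.mean(n_uniq != class_size)}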
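The vectorized function counts distinct birthdays per row by sorting and comparing neighbours. As a small illustration of that step, here is a sketch on two made-up classes of four students each; the birthday values are hypothetical and chosen only so the result is easy to check by hand.

```python
import numpy as np

# Two hypothetical classes of four birthdays each (day-of-year values).
sims = np.array([[10, 200, 10, 45],
                 [3, 77, 150, 364]])

# After sorting, equal birthdays sit next to each other in a row.
sort_sims = np.sort(sims, axis=1)

# Each position where a value differs from its left neighbour starts a new
# distinct value; the +1 accounts for the first value in the row.
n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1

# A class has a shared birthday when it has fewer distinct values than students.
has_shared_birthday = n_uniq != sims.shape[1]

print(n_uniq.tolist())               # [3, 4]
print(has_shared_birthday.tolist())  # [True, False]
```

The first class contains day 10 twice, so it has three distinct values and counts as a match; the second class has four distinct values and does not.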

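def birthday_no_numpy(class_size, n_sim=1000):
    """Same simulation, but with a Python loop over classes, for comparison."""
    results = []
    for _ in range(n_sim):
        sims = np.random.randint(1, 365 + 1, class_size)
        # A set drops duplicates, so a shorter set means a shared birthday.
        results.append(len(set(sims)) != class_size)
    return {"est_prob": np.mean(results)}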

You can confirm the speed of the vectorized variant.


Notice how much slower the loop-based function is.
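The original timing cells are not reproduced here. One way to run the comparison yourself is sketched below, assuming the standard library's `timeit` module; the helper `time_call`, the class size of 25, and the repeat count are illustrative choices, not values from the original. The two functions are repeated so the snippet runs on its own.

```python
import timeit

import numpy as np


def birthday_experiment(class_size, n_sim=1000):
    """Vectorized variant, as defined above."""
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    return {"est_prob": np.mean(n_uniq != class_size)}


def birthday_no_numpy(class_size, n_sim=1000):
    """Loop-based variant, as defined above."""
    results = []
    for _ in range(n_sim):
        sims = np.random.randint(1, 365 + 1, class_size)
        results.append(len(set(sims)) != class_size)
    return {"est_prob": np.mean(results)}


def time_call(func, class_size=25, repeats=5):
    """Hypothetical helper: best wall-clock time in seconds over `repeats` single calls."""
    return min(timeit.repeat(lambda: func(class_size), number=1, repeat=repeats))


t_vectorized = time_call(birthday_experiment)
t_loop = time_call(birthday_no_numpy)

print(f"vectorized: {t_vectorized:.6f} s")
print(f"loop:       {t_loop:.6f} s")
```

Exact numbers depend on your machine, so treat them as a relative comparison. Taking the minimum over several repeats reduces noise from other processes.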

