ray: performance

Parallelism can help, but reach for it only after you've made the single-process code fast, for example by vectorizing it with NumPy.

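import numpy as np


def birthday_experiment(class_size, n_sim=1000):
    """Simulates the birthday paradox. Vectorized = Fast!"""
    # One row per simulated class, one random birthday (1..365) per student.
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    # Unique birthdays per row: number of sorted neighbours that differ, plus one.
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    # A class has a shared birthday when it has fewer unique days than students.
    return {"est_prob": np.mean(n_uniq != class_size)}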
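
The sort-and-compare step is the least obvious part of the vectorized function, so here is a small sketch of it in isolation. The toy array is made up for illustration and is not part of the original material; it only shows how counting changes between sorted neighbours yields the number of unique values per row.

```python
import numpy as np

# Hypothetical toy data: 3 simulated classes of 4 students each.
sims = np.array([
    [5, 9, 5, 2],   # one shared birthday (5 appears twice)
    [1, 2, 3, 4],   # all birthdays distinct
    [7, 7, 7, 7],   # everyone shares the same birthday
])

sort_sims = np.sort(sims, axis=1)
# True wherever a sorted value differs from the one before it.
changes = sort_sims[:, 1:] != sort_sims[:, :-1]
# Number of unique values per row = number of changes + 1.
n_uniq = changes.sum(axis=1) + 1
# A row contains a shared birthday if it has fewer unique values than columns.
has_match = n_uniq != sims.shape[1]

print(n_uniq.tolist())     # [3, 4, 1]
print(has_match.tolist())  # [True, False, True]
```

Taking the mean of `has_match` over many simulated rows gives the estimated probability that `birthday_experiment` returns.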


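def birthday_no_numpy(class_size, n_sim=1000):
    """Same simulation in a Python loop; NumPy only draws and averages."""
    results = []
    for _ in range(n_sim):
        sims = np.random.randint(1, 365 + 1, class_size)
        # A shared birthday exists if the set of birthdays is smaller than the class.
        results.append(len(set(sims)) != class_size)
    return {"est_prob": np.mean(results)}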

You can confirm the speed of the vectorized variant.

%%timeit
birthday_experiment(10)

Notice how much slower the loop-based function is.

%%timeit
birthday_no_numpy(10)
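
`%%timeit` is an IPython cell magic, so it only works in a notebook or IPython session. As a rough sketch of the same comparison in a plain Python script, the standard-library `timeit` module can be used instead. The repetition count of 20 is an arbitrary choice for illustration, and no timings are shown because they depend on the machine.

```python
import timeit

import numpy as np


def birthday_experiment(class_size, n_sim=1000):
    """Vectorized simulation: all classes are drawn and checked at once."""
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    return {"est_prob": np.mean(n_uniq != class_size)}


def birthday_no_numpy(class_size, n_sim=1000):
    """Loop-based simulation: one class per Python iteration."""
    results = []
    for _ in range(n_sim):
        sims = np.random.randint(1, 365 + 1, class_size)
        results.append(len(set(sims)) != class_size)
    return {"est_prob": np.mean(results)}


n_calls = 20  # arbitrary repetition count, chosen only to keep the run short

# timeit.timeit returns the total time in seconds for `number` calls.
t_vec = timeit.timeit(lambda: birthday_experiment(10), number=n_calls)
t_loop = timeit.timeit(lambda: birthday_no_numpy(10), number=n_calls)

print(f"vectorized: {t_vec / n_calls * 1e3:.3f} ms per call")
print(f"loop:       {t_loop / n_calls * 1e3:.3f} ms per call")
```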