ray:
performance
Python typically runs all your code on a single core, even when the program has components that could easily run in parallel. To speed up programs in those scenarios, you might want to explore ray. It's not the only tool for this use case, but it's one we've come to like.
Notes
Parallelism can help, but consider it after you've made your code fast.
import numpy as np


def birthday_experiment(class_size, n_sim=1000):
    """Simulates the birthday paradox. Vectorized = Fast!"""
    # One row per simulated class, one column per student.
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    # In a sorted row, each change between neighbours marks a new unique birthday.
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    return {"est_prob": np.mean(n_uniq != class_size)}


def birthday_no_numpy(class_size, n_sim=1000):
    """Same simulation, but with a Python loop over the simulations."""
    results = []
    for _ in range(n_sim):
        sims = np.random.randint(1, 365 + 1, class_size)
        results.append(len(set(sims)) != class_size)
    return {"est_prob": np.mean(results)}
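As a sanity check on the simulation, the exact probability of at least one shared birthday can be computed directly. The sketch below is not part of the original notes; the helper name `exact_birthday_prob` is made up for illustration, and it assumes 365 equally likely birthdays, just like the simulation does. The `est_prob` values returned by the functions above should land close to this number, with the gap shrinking as `n_sim` grows.

```python
import math


def exact_birthday_prob(class_size, n_days=365):
    """Exact probability that at least two people share a birthday.

    Hypothetical helper for illustration; assumes uniform birthdays.
    """
    # Probability that everyone has a distinct birthday:
    # (365/365) * (364/365) * ... * ((365 - class_size + 1)/365)
    p_all_distinct = math.prod(
        (n_days - i) / n_days for i in range(class_size)
    )
    return 1 - p_all_distinct


print(round(exact_birthday_prob(10), 3))
print(round(exact_birthday_prob(23), 3))
```

With 23 people the probability just passes one half, which is the classic statement of the birthday paradox.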
You can confirm the speed of the vectorized variant.
%%timeit
birthday_experiment(10)
Notice how much slower the other function is.
%%timeit
birthday_no_numpy(10)
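The `%%timeit` cells only work inside a notebook. If you'd rather run the comparison as a plain script, a minimal sketch with the standard library `timeit` module could look like the one below. The helper `time_per_call` and the choice of 20 calls are assumptions made for this example, not something from the original notes, and the actual timings will depend on your machine.

```python
import timeit

import numpy as np


def birthday_experiment(class_size, n_sim=1000):
    """Vectorized simulation of the birthday paradox."""
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis=1) + 1
    return {"est_prob": np.mean(n_uniq != class_size)}


def birthday_no_numpy(class_size, n_sim=1000):
    """Same simulation with a Python loop over the simulations."""
    results = []
    for _ in range(n_sim):
        sims = np.random.randint(1, 365 + 1, class_size)
        results.append(len(set(sims)) != class_size)
    return {"est_prob": np.mean(results)}


def time_per_call(func, n_calls=20):
    """Average seconds per call of func(10). Hypothetical helper."""
    total = timeit.timeit(lambda: func(10), number=n_calls)
    return total / n_calls


fast = time_per_call(birthday_experiment)
slow = time_per_call(birthday_no_numpy)
print(f"vectorized: {fast * 1e3:.3f} ms per call")
print(f"loop:       {slow * 1e3:.3f} ms per call")
```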