
annoy: benchmark



Notes

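Here's the benchmark code for the video. It times a 10-nearest-neighbor query using three exact scikit-learn algorithms (ball tree, KD-tree and brute force) and using approximate Annoy indexes built with 10 trees and with 1 tree.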

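import numpy as np
from annoy import AnnoyIndex
from sklearn.neighbors import NearestNeighbors

# vecs (a 2D array of shape (n_rows, 2)) is assumed to be defined earlier; it is not created here
columns = vecs.shape[1]

# this is the original query
query = np.array([-2., -2.])
# scikit-learn expects a 2D array of shape (n_queries, n_features), so we wrap the query
q = np.array([query])
# we will retrieve 10 neighbors in each case
n = 10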
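The wrapping step only changes the array's shape: scikit-learn's kneighbors takes a batch of queries, so a single vector needs a leading "number of queries" axis. A small sketch that shows the shapes, with query.reshape(1, -1) as an equivalent alternative (this alternative is not used in the video):

```python
import numpy as np

query = np.array([-2., -2.])
print(query.shape)  # (2,): one 1D vector

# Wrapping the vector in a list adds a leading "number of queries" axis.
q = np.array([query])
print(q.shape)  # (1, 2)

# reshape(1, -1) builds the same 2D array without a temporary list.
q_alt = query.reshape(1, -1)
print(np.array_equal(q, q_alt))  # True
```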

Scikit-Learn Ball Tree

This code builds the nearest-neighbor model with scikit-learn's ball tree algorithm.

nn = NearestNeighbors(n_neighbors=n, algorithm='ball_tree').fit(vecs)

Here we time the retrieval.

%%timeit 
distances, indices = nn.kneighbors(q)
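%%timeit is an IPython/Jupyter cell magic, so it won't work in a plain Python script. A rough script equivalent, sketched with the standard-library timeit module, is shown below. It assumes a hypothetical stand-in dataset of 10,000 random 2D points, since vecs is not defined on this page. IPython's %%timeit reports a mean and standard deviation per loop; this sketch prints the mean and the minimum per query. It also shows what kneighbors returns.

```python
import timeit

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-in for vecs: 10,000 random 2D points (not the video's dataset).
rng = np.random.default_rng(42)
vecs = rng.normal(size=(10_000, 2))
q = np.array([[-2., -2.]])
n = 10

nn = NearestNeighbors(n_neighbors=n, algorithm='ball_tree').fit(vecs)

# Run the query `number` times per trial and repeat the trial `repeat` times.
number, repeat = 200, 5
timings = timeit.repeat(lambda: nn.kneighbors(q), number=number, repeat=repeat)
best_per_call = min(timings) / number
mean_per_call = sum(timings) / len(timings) / number
print(f"mean: {mean_per_call * 1e6:.1f} microseconds, best: {best_per_call * 1e6:.1f} microseconds")

# kneighbors returns two arrays of shape (n_queries, n_neighbors), sorted by distance.
distances, indices = nn.kneighbors(q)
print(distances.shape, indices.shape)  # (1, 10) (1, 10)
```

Taking the minimum over trials is a common choice because it is the timing least disturbed by other work on the machine.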

Scikit-Learn KD-tree

This code builds the same nearest-neighbor model, this time with a KD-tree.

nn = NearestNeighbors(n_neighbors=n, algorithm='kd_tree').fit(vecs)

Here we time the retrieval.

%%timeit 
distances, indices = nn.kneighbors(q)
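The ball tree, KD-tree and brute-force options are all exact, so they should return the same 10 neighbors and differ only in speed. A quick sanity check, sketched on hypothetical random stand-in data, compares each algorithm against brute force. Neighbor sets are compared instead of orderings so that floating-point ties cannot cause a spurious mismatch.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-in data (not the video's dataset).
rng = np.random.default_rng(0)
vecs = rng.normal(size=(10_000, 2))
q = np.array([[-2., -2.]])
n = 10

results = {}
for algorithm in ['ball_tree', 'kd_tree', 'brute']:
    nn = NearestNeighbors(n_neighbors=n, algorithm=algorithm).fit(vecs)
    results[algorithm] = nn.kneighbors(q)

ref_dist, ref_idx = results['brute']
agreement = {}
for algorithm, (dist, idx) in results.items():
    # Same neighbor set, and distances equal up to floating-point noise.
    agreement[algorithm] = (set(idx[0].tolist()) == set(ref_idx[0].tolist())
                            and bool(np.allclose(dist, ref_dist)))

# Each entry is expected to be True.
print(agreement)
```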

Scikit-Learn Brute Force

This code builds the model with brute force, which computes the distance from the query to every row of vecs.

nn = NearestNeighbors(n_neighbors=n, algorithm='brute').fit(vecs)

Here we time the retrieval.

%%timeit 
distances, indices = nn.kneighbors(q)
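Brute force is simple enough to write by hand, which makes it clear why its cost grows linearly with the number of rows. A minimal NumPy sketch for a single query follows. The function name brute_force_knn is hypothetical, and the data is a random stand-in. The result is checked against scikit-learn's brute-force model.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def brute_force_knn(vecs, query, n):
    """Exact k-NN for one query by scanning every row: O(N * d) work per query."""
    # Euclidean distance from the query to every stored vector.
    dists = np.linalg.norm(vecs - query, axis=1)
    # argpartition moves the n smallest distances to the front in linear time;
    # only those n are then fully sorted.
    idx = np.argpartition(dists, n - 1)[:n]
    idx = idx[np.argsort(dists[idx])]
    return dists[idx], idx

rng = np.random.default_rng(1)
vecs = rng.normal(size=(10_000, 2))  # hypothetical stand-in data
query = np.array([-2., -2.])

my_dist, my_idx = brute_force_knn(vecs, query, 10)
sk = NearestNeighbors(n_neighbors=10, algorithm='brute').fit(vecs)
sk_dist, sk_idx = sk.kneighbors(np.array([query]))
print(set(my_idx.tolist()) == set(sk_idx[0].tolist()), np.allclose(my_dist, sk_dist[0]))
```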

Annoy with 10 trees

This code builds an Annoy index with 10 trees, adding each row of vecs as an item before calling build.

annoy = AnnoyIndex(columns, 'euclidean')
for i in range(vecs.shape[0]):
    annoy.add_item(i, vecs[i, :])
annoy.build(n_trees=10)

Here we time the retrieval.

%%timeit 
annoy.get_nns_by_vector(query, n)
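Timing alone hides the trade-off. Annoy is approximate, so its 10 results can differ from the exact ones. A minimal sketch of a hypothetical helper, recall_at_k, measures the fraction of the exact neighbors that an approximate query also returned.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact k nearest neighbors that the approximate search also found."""
    exact = {int(i) for i in exact_ids}
    if not exact:
        raise ValueError("exact_ids must not be empty")
    found = exact.intersection(int(i) for i in approx_ids)
    return len(found) / len(exact)

print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4]))  # 0.75
```

Assuming nn still holds the brute-force model from the previous section, recall_at_k(annoy.get_nns_by_vector(query, n), nn.kneighbors(q, return_distance=False)[0]) would give the recall of the current Annoy index for this one query. Averaging over many queries gives a steadier estimate.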

Annoy with 1 tree

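This code builds a fresh index with a single tree. A new AnnoyIndex is needed because Annoy does not allow items to be added to an index after build has been called.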

annoy = AnnoyIndex(columns, 'euclidean')
for i in range(vecs.shape[0]):
    annoy.add_item(i, vecs[i, :])
annoy.build(n_trees=1)

Here we time the retrieval.

%%timeit 
annoy.get_nns_by_vector(query, n)
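To see why the number of trees matters, here is a toy, pure-NumPy forest loosely modeled on the idea described in the Annoy README: each node splits its points by the hyperplane equidistant from two sampled points. This is a simplified illustration, not Annoy's implementation. Real Annoy also explores more than one leaf per tree, controlled by the search_k argument of get_nns_by_vector. Treat the output as showing the trend, not Annoy's actual recall. The helper names and stand-in data are hypothetical.

```python
import numpy as np

def build_tree(vecs, ids, rng, leaf_size=32):
    """Recursively split `ids` with random hyperplanes (a simplified, Annoy-like tree)."""
    if len(ids) <= leaf_size:
        return ids
    # Sample two points; split by the hyperplane equidistant from them.
    a, b = vecs[rng.choice(ids, size=2, replace=False)]
    normal = a - b
    offset = normal @ (a + b) / 2
    closer_to_a = vecs[ids] @ normal > offset
    left, right = ids[~closer_to_a], ids[closer_to_a]
    if len(left) == 0 or len(right) == 0:  # degenerate split (e.g. duplicate points)
        return ids
    return (normal, offset,
            build_tree(vecs, left, rng, leaf_size),
            build_tree(vecs, right, rng, leaf_size))

def leaf_for(tree, query):
    """Walk one tree from the root to the leaf whose region contains the query."""
    while isinstance(tree, tuple):
        normal, offset, left, right = tree
        tree = right if query @ normal > offset else left
    return tree

def forest_knn(vecs, trees, query, n):
    """Union the query's leaf from every tree, then rank those candidates exactly."""
    candidates = np.unique(np.concatenate([leaf_for(t, query) for t in trees]))
    dists = np.linalg.norm(vecs[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:n]]

def recall(found, exact):
    return len(set(found.tolist()) & set(exact.tolist())) / len(exact)

rng = np.random.default_rng(7)
vecs = rng.normal(size=(10_000, 2))  # hypothetical stand-in data
query = np.array([-2., -2.])
exact = np.argsort(np.linalg.norm(vecs - query, axis=1))[:10]

trees = [build_tree(vecs, np.arange(len(vecs)), rng) for _ in range(10)]
r1 = recall(forest_knn(vecs, trees[:1], query, 10), exact)
r10 = recall(forest_knn(vecs, trees, query, 10), exact)
print("1 tree:", r1, "10 trees:", r10)
```

Because the 10-tree forest contains the first tree, its candidate set is a superset of the one-tree candidates. Its recall can therefore never be lower, at the price of checking more candidates per query.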
