Calmcode - annoy: benchmark

Benchmark Annoy Performance against Scikit-Learn


Here's the benchmark code for the video.

import numpy as np
from sklearn.neighbors import NearestNeighbors

# assumes `vecs`, a 2D array with one vector per row, already exists
# this is the original query
query = np.array([-2., -2.])
# scikit-learn expects a 2D array, so we wrap the query
q = np.array([query])
# we will retrieve 10 neighbors in each case
n = 10

Scikit-Learn Balltree

This code builds the object from scikit-learn.

nn = NearestNeighbors(n_neighbors=n, algorithm='ball_tree').fit(vecs)

Here we time the retrieval.

%%timeit
distances, indices = nn.kneighbors(q)
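
The `%%timeit` cell magic only works inside IPython or Jupyter. As a rough sketch of how the same kind of measurement could be taken in a plain Python script, the standard-library `timeit` module can be used. The helper name `time_call` and the stand-in workload are illustrative assumptions, not part of the original benchmark.

```python
import timeit


def time_call(fn, number=1000, repeat=5):
    """Return the best average time per call, in seconds.

    `fn` is called `number` times per run, for `repeat` runs;
    the fastest run is reported to reduce noise from other processes.
    """
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number


# Stand-in workload (hypothetical); swap in the call you want to measure.
per_call = time_call(lambda: sorted(range(100)), number=200, repeat=3)
print(f"{per_call * 1e6:.1f} microseconds per call")
```

In the benchmark itself, the lambda would be something like `lambda: nn.kneighbors(q)`. The numbers will not match `%%timeit` exactly, since the magic picks its own loop counts.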

Scikit-Learn KD-tree

This code builds the object from scikit-learn.

nn = NearestNeighbors(n_neighbors=n, algorithm='kd_tree').fit(vecs)

Here we time the retrieval.

%%timeit
distances, indices = nn.kneighbors(q)

Scikit-Learn Brute Force

This code builds the object from scikit-learn.

nn = NearestNeighbors(n_neighbors=n, algorithm='brute').fit(vecs)

Here we time the retrieval.

%%timeit
distances, indices = nn.kneighbors(q)
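
To make concrete what a brute-force search has to do, here is a minimal NumPy sketch that computes the Euclidean distance from the query to every vector and keeps the `n` closest. This is not scikit-learn's implementation, only an illustration of the idea. The function name `brute_force_neighbors` and the random `demo_vecs` data are assumptions made for this example.

```python
import numpy as np


def brute_force_neighbors(vecs, query, n):
    """Return (distances, indices) of the n rows of `vecs` closest to `query`."""
    # Euclidean distance from the query to every row
    dists = np.linalg.norm(vecs - query, axis=1)
    # pick the n smallest distances (unordered), then sort just those
    idx = np.argpartition(dists, n - 1)[:n]
    idx = idx[np.argsort(dists[idx])]
    return dists[idx], idx


# Hypothetical data standing in for the vectors used in the video
rng = np.random.default_rng(0)
demo_vecs = rng.normal(size=(1000, 2))
demo_query = np.array([-2., -2.])

distances, indices = brute_force_neighbors(demo_vecs, demo_query, 10)
print(indices)
```

Every query touches all rows, so the cost grows linearly with the number of vectors. The tree-based approaches try to avoid exactly that.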

Annoy with 10 trees

This code builds the index.

from annoy import AnnoyIndex

# `columns` is the number of dimensions of each vector in `vecs`
annoy = AnnoyIndex(columns, 'euclidean')
for i in range(vecs.shape[0]):
    annoy.add_item(i, vecs[i, :])
annoy.build(n_trees=10)

Here we time the retrieval.

%%timeit
annoy.get_nns_by_vector(query, n)

Annoy with 1 tree

This code builds the index.

annoy = AnnoyIndex(columns, 'euclidean')
for i in range(vecs.shape[0]):
    annoy.add_item(i, vecs[i, :])
annoy.build(n_trees=1)

Here we time the retrieval.

%%timeit
annoy.get_nns_by_vector(query, n)
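
Annoy returns approximate neighbors, so timing alone does not show what is traded away when fewer trees are used. One possible check is to measure how many of the returned ids also appear in an exact result. The helper below is a small sketch of that idea; the name `recall_at_n` and the example id lists are made up for illustration.

```python
def recall_at_n(approx_ids, exact_ids):
    """Fraction of the exact neighbor ids that the approximate search also found."""
    exact = {int(i) for i in exact_ids}
    if not exact:
        return 0.0
    approx = {int(i) for i in approx_ids}
    return len(approx & exact) / len(exact)


# Made-up ids: three of the four exact neighbors were recovered
print(recall_at_n([3, 7, 9, 1], [3, 7, 2, 1]))  # 0.75
```

Assuming `nn` was fitted with `algorithm='brute'`, a comparison could look like `recall_at_n(annoy.get_nns_by_vector(query, n), nn.kneighbors(q)[1][0])`.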