Here's the benchmark code for the video.
import numpy as np
from annoy import AnnoyIndex
from sklearn.neighbors import NearestNeighbors
# this is the original query
query = np.array([-2., -2.])
# scikit-learn expects a 2D array of queries, so we wrap it
q = np.array([query])
# we will retrieve 10 neighbors in each case
n = 10
Scikit-Learn Balltree
This code builds the ball tree index with scikit-learn.
nn = NearestNeighbors(n_neighbors=n, algorithm='ball_tree').fit(vecs)
Here we time the retrieval.
%%timeit
distances, indices = nn.kneighbors(q)
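The %%timeit lines are IPython cell magics, so the timing cells only run inside a notebook. Outside one, the standard-library timeit module gives a comparable measurement. This is a minimal sketch, assuming a stand-in function lookup (a hypothetical name, not part of the benchmark) in place of the real call such as nn.kneighbors(q):

```python
import timeit

def lookup():
    # Hypothetical stand-in for the call being timed, e.g. nn.kneighbors(q).
    return sorted(range(1000), key=lambda x: (x - 500) ** 2)[:10]

# Run the call 100 times per measurement and repeat the measurement 5 times.
runs = timeit.repeat(lookup, number=100, repeat=5)

# Report the fastest measurement as time per call, in microseconds.
best_us = min(runs) / 100 * 1e6
print(f"best of 5: {best_us:.1f} us per call")
```

Taking the minimum of the repeats is a common choice because slower runs mostly reflect noise from other processes.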
Scikit-Learn KD-tree
This code builds the KD-tree index with scikit-learn.
nn = NearestNeighbors(n_neighbors=n, algorithm='kd_tree').fit(vecs)
Here we time the retrieval.
%%timeit
distances, indices = nn.kneighbors(q)
Scikit-Learn Brute Force
This code builds the brute-force searcher with scikit-learn.
nn = NearestNeighbors(n_neighbors=n, algorithm='brute').fit(vecs)
Here we time the retrieval.
%%timeit
distances, indices = nn.kneighbors(q)
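Conceptually, a brute-force search compares the query against every stored vector and keeps the n closest. The sketch below shows that idea in plain Python. It is an illustration only, not scikit-learn's implementation (which is vectorized), and the function name brute_force_knn and the toy points are made up for this example:

```python
import heapq
import math

def brute_force_knn(points, query, k):
    """Return (distances, indices) of the k points closest to query."""
    # Compute the Euclidean distance from the query to every point.
    scored = [(math.dist(p, query), i) for i, p in enumerate(points)]
    # Keep the k smallest distances, sorted in ascending order.
    nearest = heapq.nsmallest(k, scored)
    distances = [d for d, _ in nearest]
    indices = [i for _, i in nearest]
    return distances, indices

# Toy data (hypothetical); the benchmark itself uses vecs.
points = [(0.0, 0.0), (-2.0, -1.0), (3.0, 4.0), (-2.5, -2.0)]
distances, indices = brute_force_knn(points, (-2.0, -2.0), k=2)
print(indices)
```

Every query touches every point, so the cost grows linearly with the number of vectors. The tree-based indexes try to avoid exactly that work.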
Annoy with 10 trees
This code builds the Annoy index with 10 trees.
# the first argument is the number of dimensions of each vector
annoy = AnnoyIndex(columns, 'euclidean')
for i in range(vecs.shape[0]):
    annoy.add_item(i, vecs[i, :])
annoy.build(n_trees=10)
Here we time the retrieval.
%%timeit
annoy.get_nns_by_vector(query, n)
Annoy with 1 tree
This code builds the Annoy index with a single tree.
annoy = AnnoyIndex(columns, 'euclidean')
for i in range(vecs.shape[0]):
    annoy.add_item(i, vecs[i, :])
annoy.build(n_trees=1)
Here we time the retrieval.
%%timeit
annoy.get_nns_by_vector(query, n)
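Annoy is an approximate method, so the neighbors it returns can differ from the exact ones, and fewer trees generally trade accuracy for speed. One way to put a number on that is to check how many of the exact neighbors the approximate search recovered. This is a minimal sketch with a hypothetical helper recall_at_k and made-up index lists. In practice the exact list could come from the brute-force cell and the approximate list from annoy.get_nns_by_vector:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate search also found."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

# Hypothetical index lists, for illustration only.
exact_ids = [3, 1, 7, 9, 4]
approx_ids = [3, 7, 1, 8, 2]

# 3 of the 5 exact neighbors (3, 1 and 7) also appear in the approximate list.
print(recall_at_k(approx_ids, exact_ids))
```

Order is ignored here on purpose: the measure only asks whether the right items were found, not whether they were ranked identically.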