# annoy.

Finding nearest neighbors is a very common problem in data science. It shows up in recommender systems and, more generally, whenever you work with neural embeddings. An exact search is expensive to compute, so it is common to settle for approximate nearest neighbors instead. In Python, a very likeable tool for this is annoy.
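To make the cost of an exact search concrete, here is a minimal pure-Python sketch. The helper name `exact_neighbors` and the toy `points` list are made up for illustration and are not part of the video's benchmark.

```python
import heapq
import math

def exact_neighbors(points, query, k):
    """Return the indices of the k points closest to query, nearest first."""
    # Every point is compared with the query, so the work grows with len(points).
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: math.dist(points[i], query)
    )

# hypothetical toy data
points = [(0.0, 0.0), (1.0, 1.0), (-2.0, -1.5), (3.0, 3.0)]
print(exact_neighbors(points, (-2.0, -2.0), k=2))  # [2, 0]
```

Approximate methods try to avoid looking at every point, trading a bit of accuracy for speed.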

Notes

Here's the benchmark code for the video.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# this is the original query
query = np.array([-2., -2.])
# scikit-learn expects a 2D array, so we wrap the query
q = np.array([query])
# we will retrieve 10 neighbors in each case
n = 10
```
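The snippets on this page use `vecs` and `columns` without defining them here. If you want to run the benchmark yourself, a stand-in along these lines might do. The random data, its size, and the seed are assumptions and not the dataset from the video.

```python
import numpy as np

# Hypothetical stand-in data: 10,000 random 2D points.
# Two dimensions are chosen to match the 2D query above.
rng = np.random.default_rng(42)
vecs = rng.normal(size=(10_000, 2))

# Number of dimensions per vector, needed when creating the annoy index.
columns = vecs.shape[1]
```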

## Scikit-Learn Balltree

This code builds the object from scikit-learn.

```python
nn = NearestNeighbors(n_neighbors=n, algorithm='ball_tree').fit(vecs)
```

Here we time the retrieval.

```python
%%timeit
distances, indices = nn.kneighbors(q)
```

## Scikit-Learn KD-tree

This code builds the object from scikit-learn.

```python
nn = NearestNeighbors(n_neighbors=n, algorithm='kd_tree').fit(vecs)
```

Here we time the retrieval.

```python
%%timeit
distances, indices = nn.kneighbors(q)
```

## Scikit-Learn Brute Force

This code builds the object from scikit-learn.

```python
nn = NearestNeighbors(n_neighbors=n, algorithm='brute').fit(vecs)
```

Here we time the retrieval.

```python
%%timeit
distances, indices = nn.kneighbors(q)
```
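`%%timeit` is an IPython cell magic, so it only works in a notebook or IPython session. In a plain script, a rough equivalent can be sketched with the standard library's `timeit` module. The helper name `best_time` and the placeholder workload are assumptions for illustration.

```python
import timeit

def best_time(fn, number=100, repeat=5):
    """Return the best average time per call, in seconds."""
    # Each entry in totals is the time for `number` consecutive calls of fn.
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    return min(totals) / number

# Placeholder workload; swap in the query you want to measure.
elapsed = best_time(lambda: sorted(range(1000)))
print(f"{elapsed * 1e6:.1f} µs per call")
```

To time one of the lookups on this page, pass something like `lambda: nn.kneighbors(q)` instead of the placeholder.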

## Annoy with 10 trees

This code builds the index.

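```python
from annoy import AnnoyIndex
# `columns` is the vector dimension (assumed to equal vecs.shape[1])
annoy = AnnoyIndex(columns, 'euclidean')
for i in range(vecs.shape[0]):
    annoy.add_item(i, vecs[i, :])  # row i is stored under item id i
annoy.build(n_trees=10)
```

Here we time the retrieval.

```python
%%timeit
annoy.get_nns_by_vector(query, n)
```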
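Speed is only half of the comparison. Annoy returns approximate neighbors, and an index with fewer trees is generally smaller but less accurate. One way to check this is to measure how many of the exact neighbors the approximate search recovers. The helper `recall_at_k` below is a hypothetical sketch, not part of annoy or scikit-learn.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate search also found."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)

# made-up ids: 2 of the 4 exact neighbors were found
print(recall_at_k([4, 8, 15, 16], [4, 8, 23, 42]))  # 0.5
```

With the objects from this page, you could pass `annoy.get_nns_by_vector(query, n)` as `approx_ids` and `indices[0]` from the brute-force scikit-learn run as `exact_ids`.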

## Annoy with 1 tree

This code builds the index.

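```python
annoy = AnnoyIndex(columns, 'euclidean')
for i in range(vecs.shape[0]):
    annoy.add_item(i, vecs[i, :])
annoy.build(n_trees=1)
```

Here we time the retrieval.

```python
%%timeit
annoy.get_nns_by_vector(query, n)
```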