# annoy.

The nearest neighbor problem is very common in data science. It comes up in recommender systems, but also whenever you work with neural embeddings. Exact nearest neighbor search is expensive to compute, so it is common to calculate approximate neighbors as a proxy. In Python a very likeable tool for this is annoy.

**Episode Notes**

Here's the benchmark code for the video.

```
import numpy as np
from sklearn.neighbors import NearestNeighbors

# this is the original query
query = np.array([-2., -2.])
# scikit-learn needs it wrapped in a 2d array
q = np.array([query])
# we will retrieve 10 neighbors in each case
n = 10
```
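The benchmark also assumes a dataset `vecs` of two-dimensional points, which isn't defined in these notes. A hypothetical stand-in, just so the snippets below can run:

```python
import numpy as np

# made-up data standing in for `vecs`: 100_000 random 2d points,
# matching the two-dimensional query above
rng = np.random.default_rng(42)
vecs = rng.normal(size=(100_000, 2))
print(vecs.shape)  # (100000, 2)
```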

## Scikit-Learn Ball Tree

This code builds the object from scikit-learn.

```
nn = NearestNeighbors(n_neighbors=n, algorithm='ball_tree').fit(vecs)
```

Here we time the retrieval.

```
%%timeit
distances, indices = nn.kneighbors(q)
```
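If you want to inspect what `kneighbors` returns rather than time it, here is a self-contained sketch, with made-up random data standing in for `vecs`:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# made-up data standing in for `vecs`
rng = np.random.default_rng(0)
vecs = rng.normal(size=(1000, 2))

nn = NearestNeighbors(n_neighbors=10, algorithm='ball_tree').fit(vecs)
distances, indices = nn.kneighbors(np.array([[-2., -2.]]))

# one row per query point, one column per neighbor, sorted nearest-first
print(distances.shape, indices.shape)  # (1, 10) (1, 10)
```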

## Scikit-Learn KD-tree

This code builds the object from scikit-learn.

```
nn = NearestNeighbors(n_neighbors=n, algorithm='kd_tree').fit(vecs)
```

Here we time the retrieval.

```
%%timeit
distances, indices = nn.kneighbors(q)
```

## Scikit-Learn Brute Force

This code builds the object from scikit-learn.

```
nn = NearestNeighbors(n_neighbors=n, algorithm='brute').fit(vecs)
```

Here we time the retrieval.

```
%%timeit
distances, indices = nn.kneighbors(q)
```
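All three scikit-learn variants are exact algorithms, so they should agree on the neighbors and only differ in speed. A quick sanity check on made-up data:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# made-up data standing in for `vecs`
rng = np.random.default_rng(0)
vecs = rng.normal(size=(1000, 2))
q = np.array([[-2., -2.]])

ball = NearestNeighbors(n_neighbors=10, algorithm='ball_tree').fit(vecs)
brute = NearestNeighbors(n_neighbors=10, algorithm='brute').fit(vecs)

# exact algorithms return the same neighbor indices
assert (ball.kneighbors(q)[1] == brute.kneighbors(q)[1]).all()
```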

## Annoy with 10 trees

This code builds the index.

```
from annoy import AnnoyIndex

# `columns` is the dimensionality of the vectors
columns = vecs.shape[1]
annoy = AnnoyIndex(columns, 'euclidean')
for i in range(vecs.shape[0]):
    annoy.add_item(i, vecs[i, :])
annoy.build(n_trees=10)
```

Here we time the retrieval.

```
%%timeit
annoy.get_nns_by_vector(query, n)
```

## Annoy with 1 tree

This code builds the index.

```
from annoy import AnnoyIndex

# `columns` is the dimensionality of the vectors
columns = vecs.shape[1]
annoy = AnnoyIndex(columns, 'euclidean')
for i in range(vecs.shape[0]):
    annoy.add_item(i, vecs[i, :])
annoy.build(n_trees=1)
```

Here we time the retrieval.

```
%%timeit
annoy.get_nns_by_vector(query, n)
```

Feedback? See an issue? Feel free to mention it here.

If you want to be kept up to date, consider getting the newsletter.