annoy: plotting
The nearest neighbor problem is very common in data science. It shows up in recommender systems and, more generally, whenever you work with neural embeddings. Exact search is expensive on large datasets, so it is common to settle for approximate nearest neighbors instead. In Python, a very likeable tool for this is annoy.
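To make the cost concrete, here is a minimal sketch of an exact search using only the standard library. It is not how annoy works internally; it is the brute-force baseline that annoy approximates. The helper name `exact_nearest` and the sample data are hypothetical and chosen only for illustration.

```python
import heapq
import math
import random


def exact_nearest(points, query, k):
    """Hypothetical helper: indices of the k points closest to query, nearest first."""
    # Every point is visited once, so a single query costs O(n) distance computations.
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: math.dist(points[i], query)
    )


# Illustrative data: 1000 random 2D points.
random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]

nearest = exact_nearest(points, (-1.0, -1.0), k=5)
print(nearest)
```

Because the work grows with the number of points for every query, this approach becomes slow on large collections of embeddings, which is where an approximate index becomes attractive.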
Notes
The code below generates the plot:
import numpy as np
import matplotlib.pyplot as plt
from annoy import AnnoyIndex

columns = 2
# Two overlapping Gaussian blobs: one centered at (-1, -1), one at (0, 0).
vecs = np.concatenate([
    np.random.normal(-1, 1, (5000, columns)),
    np.random.normal(0, 0.5, (5000, columns)),
])

# Add every vector to the index, then build it with a single tree.
annoy = AnnoyIndex(columns, 'euclidean')
for i in range(vecs.shape[0]):
    annoy.add_item(i, vecs[i, :])
annoy.build(n_trees=1)

# Plot all points, then overlay the 2000 approximate neighbors of (-1, -1).
plt.figure(figsize=(5, 5))
plt.scatter(vecs[:, 0], vecs[:, 1], s=1);
indices = annoy.get_nns_by_vector(np.array([-1., -1.]), 2000)
subset = vecs[indices, :]
plt.scatter(subset[:, 0], subset[:, 1], s=1);