The nearest neighbor problem is very common in data science. It comes up in recommender systems and, more generally, whenever you work with neural embeddings. Finding exact nearest neighbors is expensive on large datasets, so it is common to settle for approximate neighbors instead. In Python, a very likeable tool for this is annoy.
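To get a feel for why the exact calculation is expensive, here is a minimal brute-force sketch that uses only the standard library. The function name `exact_neighbors` and the random data are made up for illustration; the point is that every query has to measure the distance to every single point.

```python
import heapq
import math
import random


def exact_neighbors(query, points, k):
    """Return the indices of the k points closest to `query`, nearest first.

    Every call computes len(points) Euclidean distances, which is the cost
    that approximate methods try to avoid.
    """
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: math.dist(query, points[i])
    )


# Hypothetical data: 1000 random points in two dimensions.
random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]

nearest = exact_neighbors((0.0, 0.0), points, k=5)
```

This is fine for a thousand points, but the work grows linearly with the dataset for every query you make.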
You'll first need to install annoy via:
pip install annoy
Once installed you can run the code from the video below.
This code generates the random data.
import numpy as np
import matplotlib.pylab as plt
from annoy import AnnoyIndex

columns = 2

# Two overlapping clouds of 5000 points each
vecs = np.concatenate([
    np.random.normal(-1, 1, (5000, columns)),
    np.random.normal(0, 0.5, (5000, columns)),
])

plt.scatter(vecs[:, 0], vecs[:, 1], s=1);
This code generates the annoy index.
annoy = AnnoyIndex(columns, 'euclidean')

# vecs.shape is a tuple, so loop over the number of rows
for i in range(vecs.shape[0]):
    annoy.add_item(i, vecs[i, :])

annoy.build(n_trees=1)
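An annoy index is a set of trees, and each tree repeatedly splits the data with a hyperplane placed halfway between two randomly sampled points. The sketch below is a toy, pure-Python illustration of that idea for a single tree. It is not annoy's actual implementation, and the names `build_tree` and `query_tree` are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def build_tree(ids, points, leaf_size, rng):
    """Recursively split `ids` with hyperplanes between two sampled points."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    a, b = (points[i] for i in rng.sample(ids, 2))
    # The hyperplane is equidistant from a and b: its normal is a - b and
    # it passes through their midpoint.
    normal = [x - y for x, y in zip(a, b)]
    midpoint = [(x + y) / 2 for x, y in zip(a, b)]
    offset = dot(normal, midpoint)
    left = [i for i in ids if dot(normal, points[i]) <= offset]
    right = [i for i in ids if dot(normal, points[i]) > offset]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return {"leaf": ids}
    return {
        "normal": normal,
        "offset": offset,
        "left": build_tree(left, points, leaf_size, rng),
        "right": build_tree(right, points, leaf_size, rng),
    }


def query_tree(tree, points, query, k):
    """Walk down to one leaf and rank only the points stored there."""
    node = tree
    while "leaf" not in node:
        side = "left" if dot(node["normal"], query) <= node["offset"] else "right"
        node = node[side]
    return sorted(node["leaf"], key=lambda i: math.dist(query, points[i]))[:k]


# Hypothetical data: 2000 random points in two dimensions.
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
tree = build_tree(list(range(len(points))), points, leaf_size=50, rng=rng)
candidates = query_tree(tree, points, (0.0, 0.0), k=20)
```

A query only looks at the points in one leaf, which is why it is fast and also why it can miss true neighbors that ended up on the other side of a split. Building more trees gives more chances to find them, at the cost of a bigger index.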
This code fetches the indices of the 20 nearest neighbors of the origin:
indices = annoy.get_nns_by_vector(np.array([0., 0.]), 20)
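Because the result is approximate, it can be worth checking how many of the true neighbors were actually found. Below is a small sketch of such a check. The helper name `recall` and the example ids are made up for illustration.

```python
def recall(approx_ids, exact_ids):
    """Fraction of the exact neighbors that also appear in the approximate result."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact.intersection(approx_ids)) / len(exact)


# Hypothetical ids: three of the four true neighbors were found.
score = recall([3, 7, 9, 12], [3, 7, 8, 9])  # 0.75
```

With the data from this page you could pass `indices` as `approx_ids` and compute `exact_ids` by brute force, for example with `np.argsort(np.linalg.norm(vecs - query, axis=1))[:20]`, where `query` is the vector you searched for.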