Calmcode - dirty cat: similarity

Similarity


You can play around with the dirty_cat settings below.

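import dirty_cat

# `data` is assumed to be a pandas DataFrame, loaded earlier, that has an
# 'employee_position_title' column.
# Use the most frequent categories as prototypes, at most 200 of them.
mod = dirty_cat.SimilarityEncoder(categories='most_frequent', n_prototypes=200)
# One row per record, one similarity column per prototype.
mod.fit_transform(data[['employee_position_title']]).shape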
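
To get a feel for what the encoder produces, here is a rough, pure-Python sketch of the idea behind similarity encoding: each value is compared with a handful of prototype strings and gets one similarity score per prototype. It uses a plain Jaccard overlap of character 3-grams, which is a simplification and not dirty_cat's exact formula. The helper names (`ngrams`, `ngram_similarity`, `similarity_encode`) and the job titles are made up for illustration.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a string, padded with spaces."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the n-gram sets of two strings (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def similarity_encode(values, prototypes, n=3):
    """Encode each value as a row of similarities, one column per prototype."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


# Hypothetical job titles, not taken from the real dataset.
prototypes = ["Police Officer", "Firefighter"]
titles = ["Police Officer II", "Fire Fighter", "Librarian"]

encoded = similarity_encode(titles, prototypes)
for title, row in zip(titles, encoded):
    print(title, [round(score, 2) for score in row])
```

Each row has as many columns as there are prototypes, so near-duplicate spellings such as "Fire Fighter" and "Firefighter" end up with similar rows instead of unrelated one-hot columns.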