dirty cat: similarity
When you're working with scikit-learn you'll often need to deal with categorical data, and the way you encode this type of data really matters. In this series of videos we'll explore the dirty-cat library as we try to deal with categorical data.
You can play around with the dirty_cat settings below.
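import dirty_cat

# `data` is the DataFrame used in this series.
# Keep the 200 most frequent categories as prototypes.
mod = dirty_cat.SimilarityEncoder(categories='most_frequent', n_prototypes=200)
mod.fit_transform(data[['employee_position_title']]).shape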
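To build some intuition for what a similarity encoding looks like, here is a minimal sketch in plain Python. It is an assumption-laden illustration, not dirty_cat's actual implementation: it uses a simple Jaccard overlap of character 3-grams, and the helper names (`char_ngrams`, `ngram_similarity`, `encode`) as well as the example job titles are hypothetical. The idea it shows is that each value gets one column per prototype, holding a score for how similar the value is to that prototype.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap of character n-grams (a simplified stand-in similarity)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def encode(values, prototypes, n=3):
    """Encode each value as a row of similarities, one column per prototype."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


# Hypothetical prototypes and values, made up for illustration.
prototypes = ["police officer", "firefighter"]
values = ["police officer", "police officer ii", "fire fighter"]

encoded = encode(values, prototypes)
for value, row in zip(values, encoded):
    print(value, [round(score, 2) for score in row])
```

An exact match scores 1.0 against its own prototype, while near-duplicates such as a title with a suffix or a spelling variant still land close to the right prototype instead of becoming an unrelated one-hot column. That is the property that makes this kind of encoding useful for messy categorical data.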