dirty cat: ngram
When you're working with scikit-learn you'll often need to deal with categorical data, and the way you encode this type of data really matters. In this series of videos we'll explore the dirty-cat library while we try to deal with categorical data.
Let's change the
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(analyzer='char', ngram_range=(2, 4))
cv.fit(ml_df['employee_position_title'])
cv.transform(ml_df['employee_position_title']).shape
You can also inspect the vocabulary.
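One way to do that, as a minimal sketch: a fitted `CountVectorizer` stores its learned n-grams in the `vocabulary_` attribute, a dictionary that maps each n-gram to its column index. The `titles` list below is a made-up stand-in for `ml_df['employee_position_title']`, so the n-grams you see on the real data will differ.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for ml_df['employee_position_title'].
titles = ["Police Officer", "Office Clerk", "Police Sergeant"]

cv = CountVectorizer(analyzer="char", ngram_range=(2, 4))
cv.fit(titles)

# vocabulary_ maps each character n-gram to its column index.
vocab = cv.vocabulary_

# Sort by column index to list the n-grams in output-column order.
ngrams_in_column_order = sorted(vocab, key=vocab.get)

print(len(vocab))                   # number of distinct n-grams, i.e. columns
print(ngrams_in_column_order[:10])  # a first peek at what was learned
```

Note that the text is lowercased by default, so "Police" and "police" contribute the same n-grams. The size of the vocabulary matches the second dimension of the transformed matrix.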