dirty cat: ngram

Notes

Let's change the analyzer and ngram_range parameters so that CountVectorizer counts character n-grams of 2 to 4 characters instead of whole words.

from sklearn.feature_extraction.text import CountVectorizer

# Count character n-grams of length 2, 3 and 4
cv = CountVectorizer(analyzer='char', ngram_range=(2, 4))
cv.fit(ml_df['employee_position_title'])
# (number of rows, number of distinct n-grams)
cv.transform(ml_df['employee_position_title']).shape

You can also inspect the learned vocabulary, a dict that maps each n-gram to its column index.

cv.vocabulary_
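
To see why character n-grams help with dirty categories, here is a minimal sketch on a toy list. The titles below are made-up stand-ins for ml_df['employee_position_title'], not values from the real dataset, and the variable names are only illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical titles (assumption): one correct spelling, one typo, one unrelated.
titles = ["Police Officer", "Police Oficer", "Firefighter"]

cv = CountVectorizer(analyzer="char", ngram_range=(2, 4))
X = cv.fit_transform(titles)

# One row per title, one column per distinct character n-gram.
print(X.shape)

# vocabulary_ maps each n-gram to its column index in X.
print(sorted(cv.vocabulary_)[:5])

# build_analyzer() returns the function that splits a string into n-grams.
analyze = cv.build_analyzer()
correct = set(analyze(titles[0]))
typo = set(analyze(titles[1]))
other = set(analyze(titles[2]))

# The misspelled title shares far more n-grams with the correct one
# than the unrelated title does.
print(len(correct & typo), len(correct & other))
```

Because the typo only disturbs the n-grams around the missing letter, the two spellings end up with similar count vectors, which a word-level analyzer would not give you.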