The first trick revolves around using the
.predict_proba() method as
a proxy for model confidence. These proba values aren't a proper
measure of confidence, but they may be good enough to
generate short-lists of items to double-check. For more information on
model confidence, we recommend reading this blogpost.
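Before calling .predict_proba(), you need a fitted pipeline. Here's a minimal sketch of the kind of pipeline assumed below; the CountVectorizer/LogisticRegression combination and the toy data are illustrative, not a reference to the post's actual model.

```python
# Hypothetical training setup; any scikit-learn classifier that
# implements predict_proba() would work the same way.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up dataset, just to make the example runnable.
texts = ["what a great day", "this is terrible", "so exciting", "meh"]
labels = [1, 0, 1, 0]

pipe = make_pipeline(CountVectorizer(), LogisticRegression())
pipe.fit(texts, labels)

# Each row of predict_proba sums to one across the classes.
print(pipe.predict_proba(texts).sum(axis=1))
```

Any estimator exposing predict_proba() slots in here, which is what makes the trick broadly applicable.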
To generate proba values, you can take a pretrained pipeline and run:
pipe.predict_proba(X)
# array([[0.81905624, 0.18094376],
#        [0.87339587, 0.12660413],
#        [0.99887526, 0.00112474],
#        ...,
#        [0.95765091, 0.04234909],
#        [0.89402035, 0.10597965],
#        [0.97989268, 0.02010732]])
This gives us a two-dimensional array with two columns (one for each class). Since each row sums to one, we can take a single column to check how certain the model is in its prediction.
# make predictions
probas = pipe.predict_proba(X)[:, 0]

# use predictions in hindsight; note that
# probas.shape[0] == df.shape[0]
(df
  .loc[(probas > 0.45) & (probas < 0.55)]
  [['text', 'excitement']]
  .head(7))
By running this, you'll find the example "OMG THOSE TINY SHOES! desire to boop snoot intensifies", which is wrongly labelled as not exciting.
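To see the whole trick end to end, here is a self-contained sketch on synthetic data. The 'text' and 'excitement' column names mirror the post, but the rows, threshold band, and model choice are all assumptions for illustration.

```python
# End-to-end sketch: fit a model, then shortlist rows where the
# predicted probability sits near 50/50 for manual review.
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up labelled data standing in for the real dataset.
df = pd.DataFrame({
    "text": ["love this", "hate this", "so cool",
             "boring", "wow amazing", "meh okay"],
    "excitement": [1, 0, 1, 0, 1, 0],
})

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipe.fit(df["text"], df["excitement"])

# Probability of the first class, aligned row-for-row with df.
probas = pipe.predict_proba(df["text"])[:, 0]

# Rows where the model is least certain; these are the candidates
# whose labels deserve a second look.
shortlist = df.loc[(probas > 0.45) & (probas < 0.55),
                   ["text", "excitement"]]
print(shortlist)
```

Widening or narrowing the (0.45, 0.55) band trades shortlist size against how uncertain the flagged examples really are.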