Calmcode - bad labels: disagreement

Finding Bad Labels using Model Disagreement

Instead of using the proba values directly, you can also calculate the disagreement between a model's prediction and the assigned label and use that as a proxy for label quality.
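
The snippets on this page assume the labelled dataframe df and the fitted scikit-learn pipeline pipe from the earlier videos in this series. As a minimal sketch of that setup (the vectorizer and classifier below are assumptions, not necessarily the exact model from the video):

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# df is the labelled dataframe loaded in the earlier videos;
# it has a 'text' column and a 0/1 'excitement' label column.
X, y = df['text'], df['excitement']

# A simple bag-of-words classifier, just to make the snippets runnable.
pipe = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)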

You can confirm how often the data and the model disagree via:

df.loc[lambda d: d['excitement'] != pipe.predict(X)].shape
# (5315, 37)

In our case, 5315 examples are a bit much to go through by hand, so we used a function to sort these cases. The idea is to sort by the model's confidence in the correct (assigned) label. If this confidence is low, the model disagrees with the training data, which can be an indication of a bad label.

Here's the implementation we used for the confidence measure.

def correct_class_confidence(X, y, mod):
    """
    Gives the predicted confidence (or proba) associated
    with the correct label `y` from a given model.
    """
    probas = mod.predict_proba(X)
    values = []
    for i, proba in enumerate(probas):
        proba_dict = {mod.classes_[j]: v for j, v in enumerate(proba)}
        values.append(proba_dict[y[i]])
    return values
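
The same lookup can also be done in a vectorized way with numpy. This variant is a sketch, equivalent in spirit to the loop above, assuming y is array-like:

import numpy as np

def correct_class_confidence_vec(X, y, mod):
    """Vectorized variant: proba of each row's own label."""
    probas = mod.predict_proba(X)  # shape (n_samples, n_classes)
    # Map each label to its column in mod.classes_.
    class_to_col = {c: j for j, c in enumerate(mod.classes_)}
    cols = np.array([class_to_col[label] for label in y])
    # Pick, per row, the probability assigned to that row's own label.
    return probas[np.arange(len(cols)), cols]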

And here's where we use that function to sort the disagreements. Sorting by confidence in ascending order puts the rows where the model most strongly disagrees with the assigned label at the top; here we also zoom in on the rows labelled excitement = 0:

(df
  .assign(confidence=correct_class_confidence(X, y, pipe))
  .loc[lambda d: pipe.predict(d['text']) != d['excitement']]
  [['text', 'excitement', 'confidence']]
  .sort_values("confidence")
  .loc[lambda d: d['excitement'] == 0]
  .head(20))
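
Note that this chain calls pipe.predict on the text column again (it was already called once to count the disagreements). A small variation, under the same assumptions, precomputes the predictions once and reuses them:

preds = pipe.predict(df['text'])

(df
  .assign(confidence=correct_class_confidence(X, y, pipe), pred=preds)
  .loc[lambda d: d['pred'] != d['excitement']]
  [['text', 'excitement', 'confidence']]
  .sort_values("confidence")
  .loc[lambda d: d['excitement'] == 0]
  .head(20))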

Here are some of the examples we found that were poorly labelled:

  • I am inexplicably excited by [NAME]. I get so excited by how he curls passes
  • Omg this is so amazing ! Keep up the awesome work and have a fantastic New Year !
  • Hey congrats!! That's amazing, you've done such amazing progress! Hope you have a great day :)

These were all in the top 10. They clearly express excitement even though they are labelled excitement = 0, so there are clearly some labels missing.