Even famous datasets contain bad labels. You can see plenty of examples at labelerrors.com, which hosts mislabeled examples from very popular benchmark datasets, including MNIST, Amazon Reviews, IMDB Reviews, QuickDraw and CIFAR.
The website accompanies a research paper titled "Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks", which you can read on arXiv. One of its main points is that many of these datasets contain so many bad labels that algorithms claiming state-of-the-art performance on them need to be re-evaluated.
It's a big problem.
Because it's such a big problem, we want to spend a few videos on this topic. It'd be a shame if our machine learning models only seemed optimal because they overfit on bad labels. That's why we're going to explore heuristics for finding bad labels in our training data, so that we can try to improve its quality.
This will also give us the opportunity to explore cleanlab, a library made by the creators of the label errors website to help spot bad labels.
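To give a flavor of the kind of heuristic we'll be exploring, here's a minimal sketch using only scikit-learn, not cleanlab itself: deliberately flip a few labels in a toy dataset, get out-of-sample predicted probabilities via cross-validation, and flag examples where the model assigns low probability to the label the dataset claims. The 0.3 threshold is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Build a toy dataset and deliberately flip some labels to simulate label errors.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y_noisy = y.copy()
flipped = np.arange(0, 500, 25)  # flip every 25th label
y_noisy[flipped] = 1 - y_noisy[flipped]

# Out-of-sample predicted probabilities via cross-validation,
# so the model never scores an example it was trained on.
pred_probs = cross_val_predict(
    LogisticRegression(), X, y_noisy, cv=5, method="predict_proba"
)

# Heuristic: flag examples where the model assigns low probability
# to the given (possibly noisy) label.
self_confidence = pred_probs[np.arange(len(y_noisy)), y_noisy]
suspects = np.where(self_confidence < 0.3)[0]
print(f"{len(suspects)} suspect labels out of {len(y_noisy)}")
```

Cleanlab packages a more principled version of this idea (it estimates per-class noise rates instead of using a fixed threshold), which is what we'll look at next.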