... scikit metrics.

If you're going to optimise a model in scikit-learn, then it had better optimise towards the right thing. This means that you have to understand metrics in scikit-learn. This series of videos gives an overview of how they work, how you can create your own, and how the grid search interacts with them.


Here's the code for the custom metric.

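from sklearn.metrics import precision_score, recall_score

def min_recall_precision(y_true, y_pred):
    # Score a model by whichever of the two metrics is worse.
    recall = recall_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred)
    return min(recall, precision)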
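
As a quick sanity check, here is a minimal sketch of the metric on a handful of made-up labels. The labels are invented purely for illustration, and the function is repeated so that the snippet runs on its own.

```python
from sklearn.metrics import precision_score, recall_score

def min_recall_precision(y_true, y_pred):
    # Same custom metric as above, repeated so this snippet is self-contained.
    recall = recall_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred)
    return min(recall, precision)

# Hypothetical toy labels: three positives, two negatives.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0]

# One true positive, one false positive, two false negatives.
precision = precision_score(y_true, y_pred)  # 1 / (1 + 1)
recall = recall_score(y_true, y_pred)        # 1 / (1 + 2)
score = min_recall_precision(y_true, y_pred)

print(precision, recall, score)
```

Here the recall is the weaker of the two, so the custom metric reports the recall.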

Here's the code for the train set performance chart.

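from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(
    # Assumption: the estimator is not shown in these notes; any classifier
    # that accepts class_weight works here.
    estimator=LogisticRegression(max_iter=1000),
    scoring={'precision': make_scorer(precision_score),
             'recall': make_scorer(recall_score),
             'min_both': make_scorer(min_recall_precision)},
    param_grid={'class_weight': [{0: 1, 1: v} for v in range(1, 4)]},
    refit='precision',        # assumption: multi-metric scoring needs an explicit refit
    return_train_score=True,  # needed for the mean_train_* columns used in the plot
)
grid.fit(X, y);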

To plot it all you need this code.

import matplotlib.pyplot as plt
import pandas as pd

plt.figure(figsize=(12, 4))
df_results = pd.DataFrame(grid.cv_results_)
for score in ['mean_train_recall', 'mean_train_precision', 'mean_train_min_both']:
    plt.scatter(x=[_[1] for _ in df_results['param_class_weight']],
                y=df_results[score],
                label=score)
plt.legend();

You might think the chart is unintuitive. After all, imagine a single model where the precision is consistently higher than the recall: the minimum of the two should then be exactly equal to the recall. For every cross-validation fold we might then see something like this:

But maybe this won't happen as consistently. Imagine now that in one of the folds the precision was lower than the recall.

Now we see that the average of the minimum is lower than the average of either metric. The odds of this happening increase with the number of cross-validation folds.
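
The two scenarios can be sketched with a few numbers. The per-fold scores below are hypothetical, invented only to illustrate the effect; the helper `mean_of_min` is likewise just an illustrative name. It averages the per-fold minimum, which is how a cross-validated mean score for `min_both` comes about.

```python
from statistics import mean

# Hypothetical per-fold scores, invented purely for illustration.
recall = [0.70, 0.70, 0.70, 0.70]
precision_always_higher = [0.80, 0.80, 0.80, 0.80]
precision_dips_once = [0.80, 0.80, 0.60, 0.80]

def mean_of_min(precision, recall):
    """Take the minimum inside each fold, then average over the folds."""
    return mean(min(p, r) for p, r in zip(precision, recall))

consistent = mean_of_min(precision_always_higher, recall)
dipped = mean_of_min(precision_dips_once, recall)

# Precision wins in every fold, so the mean minimum equals the mean recall.
print(mean(recall), consistent)

# Precision loses in one fold, so the mean minimum drops below both means.
print(mean(precision_dips_once), mean(recall), dipped)
```

A single fold where the ordering flips is enough to pull the averaged minimum below both averaged metrics, and more folds give more chances for such a flip.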
