scikit metrics: custom

Notes

Here's the code for the custom metric.

from sklearn.metrics import precision_score, recall_score

def min_recall_precision(y_true, y_pred):
    # Score a set of predictions by the worst of its recall and precision.
    recall = recall_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred)
    return min(recall, precision)
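
For instance, on a small set of made-up labels (purely for illustration, not part of the lesson's dataset) the metric returns whichever of the two scores is worst.

import numpy as np

# Hypothetical toy labels, only to show what the metric returns.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])

print(recall_score(y_true, y_pred))          # 0.5
print(precision_score(y_true, y_pred))       # 0.666...
print(min_recall_precision(y_true, y_pred))  # 0.5, the worst of the two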

Here's the code for the train set performance chart.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    scoring={'precision': make_scorer(precision_score),
             'recall': make_scorer(recall_score),
             'min_both': make_scorer(min_recall_precision)},
    param_grid={'class_weight': [{0: 1, 1: v} for v in range(1, 4)]},
    refit='precision',
    return_train_score=True,
    cv=10,
    n_jobs=-1
)
# X and y are the features and labels defined earlier in this series.
grid.fit(X, y);
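
Once the grid search has finished you can, for example, look up which class weight scores best on the minimum of the two metrics. The snippet below is a small sketch of that; it assumes the fitted grid object from above.

import pandas as pd

# The 'min_both' key in the scoring dict shows up as 'mean_test_min_both'.
df_results = pd.DataFrame(grid.cv_results_)
best_row = df_results.sort_values('mean_test_min_both', ascending=False).iloc[0]
print(best_row['param_class_weight'], best_row['mean_test_min_both'])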

To plot it all, you need this code.

import matplotlib.pyplot as plt
import pandas as pd

plt.figure(figsize=(12, 4))
df_results = pd.DataFrame(grid.cv_results_)
for score in ['mean_train_recall', 'mean_train_precision', 'mean_test_min_both']:
    # The x-axis shows the weight given to the positive class; the .replace
    # call ensures the train-set variant of each column is what gets plotted.
    plt.scatter(x=[_[1] for _ in df_results['param_class_weight']],
                y=df_results[score.replace('test', 'train')],
                label=score)
plt.legend();
Solution

You might think this result is unintuitive. After all, imagine a single model where the precision is consistently higher than the recall; then the minimum of the two should be exactly equal to the recall. In that case, for every cross validation fold we might see something like this:

But maybe this won't happen so consistently. Imagine now that in some folds the precision dips below the recall.

Now the mean of the fold-wise minimum is lower than both the mean recall and the mean precision. The odds of this happening increase with the number of cross validation folds; the sketch below demonstrates it with made-up numbers.
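
As a small numeric sketch (the per-fold scores below are made up, not taken from the grid search above): when precision beats recall in every fold, the mean of the per-fold minimum equals the mean recall, but as soon as the two cross in a single fold, the mean of the minimum drops below both means.

import numpy as np

# Made-up per-fold scores, purely to illustrate the argument.
# Case A: precision is higher than recall in every fold.
precision_a = np.array([0.90, 0.85, 0.88, 0.92, 0.87])
recall_a    = np.array([0.70, 0.72, 0.68, 0.75, 0.71])

print(np.minimum(precision_a, recall_a).mean())  # 0.712, equal to...
print(recall_a.mean())                           # ...0.712, the mean recall

# Case B: precision drops below recall in one fold.
precision_b = np.array([0.90, 0.85, 0.60, 0.92, 0.87])
recall_b    = np.array([0.70, 0.72, 0.68, 0.75, 0.71])

# The mean of the fold-wise minimum is now lower than both means.
print(np.minimum(precision_b, recall_b).mean())  # 0.696
print(precision_b.mean(), recall_b.mean())       # 0.828, 0.712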