Calmcode - human learn: classifier

Classifier

1 2 3 4 5 6 7 8

We can write a function to make a classification but that doesn't mean that you can use it in a scikit-learn pipeline. If we were to use a FunctionClassifier though, we could!

Let's expand our work from the previous video by making a classifier.

The fare_based function defined below can be used to make predictions.

from hulearn.datasets import load_titanic

df = load_titanic(as_frame=True)
X, y = df.drop(columns=['survived']), df['survived']

def fare_based(dataf, threshold=10):
    return np.array(dataf['fare'] > threshold).astype(int)

from hulearn.classification import FunctionClassifier
# This next line of code changes the function into a proper scikit-learn compatible model.
mod = FunctionClassifier(fare_based, threshold=10)

We can use this mod in a GridSearchCV too.

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import precision_score, recall_score, accuracy_score, make_scorer

grid = GridSearchCV(mod,
                    cv=2,
                    param_grid={'threshold': np.linspace(0, 100, 30)},
                    scoring={'accuracy': make_scorer(accuracy_score),
                            'precision': make_scorer(precision_score),
                            'recall': make_scorer(recall_score)},
                    refit='accuracy'
                )
grid.fit(X, y)

You can use this trained grid object to make a chart that shows the effect of threshold.

score_df = (pd.DataFrame(grid.cv_results_)
  .set_index('param_threshold')
  [['mean_test_accuracy', 'mean_test_precision', 'mean_test_recall']])

score_df.plot(figsize=(12, 5), title="scores vs. fare-threshold");