model-mining: coordinates

Notes

Let's use a parallel coordinates chart to inspire us for a better model.

from hulearn.experimental import parallel_coordinates
parallel_coordinates(df, label="survived")

The color of each line indicates the class that we'd like to predict. That makes it easy to eyeball which sub-selections might become a rule for a model.

Here's the model that we found while playing around.

from hulearn.classification import FunctionClassifier

def make_prediction(dataf, age=15):
    women_rule = (dataf['pclass'] < 3.0) & (dataf['sex'] == "female")
    children_rule = (dataf['pclass'] < 3.0) & (dataf['age'] <= age)
    return women_rule | children_rule

mod = FunctionClassifier(make_prediction)

We've kept "age" as a hyperparameter, which could be optimised.
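Before tuning anything, the rule can be sanity-checked on a small hand-made DataFrame (the three passengers below are made up purely for illustration):

```python
import pandas as pd

# Hypothetical passengers, just to eyeball the rule's behaviour.
passengers = pd.DataFrame({
    "pclass": [1, 3, 2],
    "sex": ["female", "male", "male"],
    "age": [30, 8, 8],
})

def make_prediction(dataf, age=15):
    women_rule = (dataf["pclass"] < 3.0) & (dataf["sex"] == "female")
    children_rule = (dataf["pclass"] < 3.0) & (dataf["age"] <= age)
    return women_rule | children_rule

preds = make_prediction(passengers)
# First-class woman -> True, third-class child -> False, second-class child -> True
```

Note that because `age` has a default value, the function can be called without it, which is exactly how the hyperparameter mechanism works later on.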

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score

grid_rule = GridSearchCV(mod,
                         cv=10,
                         param_grid={'age': range(5, 50)},
                         scoring={'accuracy': make_scorer(accuracy_score),
                                  'precision': make_scorer(precision_score),
                                  'recall': make_scorer(recall_score)},
                         refit='accuracy')

from hulearn.datasets import load_titanic

df = load_titanic(as_frame=True)
X, y = df.drop(columns=['survived']), df['survived']
grid_rule.fit(X, y)
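This works because FunctionClassifier exposes the wrapped function's keyword arguments as scikit-learn hyperparameters, so GridSearchCV can clone the model and set `age` on each candidate. As a rough sketch of that mechanism, here's a minimal stand-in (the `RuleClassifier` class and its toy passenger data are hypothetical, not hulearn's actual implementation):

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV

def make_prediction(dataf, age=15):
    women_rule = (dataf["pclass"] < 3.0) & (dataf["sex"] == "female")
    children_rule = (dataf["pclass"] < 3.0) & (dataf["age"] <= age)
    return women_rule | children_rule

class RuleClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical, simplified stand-in for hulearn's FunctionClassifier:
    it wraps a prediction function and exposes `age` as a tunable
    hyperparameter via scikit-learn's get_params/set_params machinery."""

    def __init__(self, func, age=15):
        self.func = func
        self.age = age

    def fit(self, X, y):
        # Nothing is learned from data; the "model" is the rule itself.
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        return self.func(X, age=self.age).astype(int)

# Made-up mini dataset with the same columns as the Titanic frame.
df_toy = pd.DataFrame({
    "pclass": [1, 1, 3, 3, 2, 2, 1, 3],
    "sex": ["female", "male", "male", "female", "male", "female", "male", "male"],
    "age": [30, 40, 8, 25, 10, 35, 5, 50],
    "survived": [1, 0, 0, 0, 1, 1, 1, 0],
})
X_toy, y_toy = df_toy.drop(columns=["survived"]), df_toy["survived"]

grid = GridSearchCV(RuleClassifier(make_prediction),
                    cv=2,
                    param_grid={"age": [5, 10, 15]})
grid.fit(X_toy, y_toy)
```

Because `age` lives in `__init__`, it shows up in `get_params()`, which is all GridSearchCV needs to sweep over it.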

When you then look at the results from the grid search, you can confirm an 80% accuracy and a 95% precision. That's better than before!

import pandas as pd

score_df = (pd.DataFrame(grid_rule.cv_results_)
  .set_index('param_age')
  [['mean_test_accuracy', 'mean_test_precision', 'mean_test_recall']])

score_df.head(15)
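Since we passed `refit='accuracy'`, `grid_rule.best_params_` gives the winning age directly; you can also read it off the score table with `idxmax`. A sketch with made-up numbers (the scores below are hypothetical, purely to show the pattern):

```python
import pandas as pd

# Hypothetical cross-validation scores, shaped like score_df above.
score_df = pd.DataFrame({
    "mean_test_accuracy": [0.78, 0.80, 0.79],
    "mean_test_precision": [0.96, 0.95, 0.93],
    "mean_test_recall": [0.45, 0.50, 0.52],
}, index=pd.Index([10, 15, 20], name="param_age"))

# The index value with the highest mean accuracy is the tuned age.
best_age = score_df["mean_test_accuracy"].idxmax()
```

With real grid-search output, this pattern also makes it easy to see the trade-off: raising `age` tends to improve recall while precision slowly drops.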