Let's now repeat the technique on a harder dataset. We'll also use the hiplot parallel coordinates implementation this time around. If you're unfamiliar with hiplot, check out our hiplot course first.
You can retrieve the credit card data via the fetch_openml API.
from sklearn.datasets import fetch_openml

# Fetch the credit card dataset from OpenML as a pandas DataFrame.
df_credit = fetch_openml(
    data_id=1597,
    as_frame=True
)

# Rename the target column and turn it into a boolean fraud indicator.
df_credit = df_credit['frame'].rename(columns={"Class": "group"})
df_credit['group'] = df_credit['group'] == '1'
df_credit.head()
You can confirm that this dataset suffers from a class imbalance.
df_credit.group.value_counts()
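If you prefer a single number, you can also express the imbalance as a fraction. This is just a quick check; the exact counts depend on the version of the dataset that OpenML serves you.

# The boolean `group` column averages to the fraud rate.
print(f"fraction of fraud cases: {df_credit['group'].mean():.4%}")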
So let's now try building a model with a train/test split.
from sklearn.model_selection import train_test_split
credit_train, credit_test = train_test_split(df_credit, test_size=0.5, shuffle=True)
Next, we pass the train data to hiplot. We take a sample of the non-fraud examples to ensure that hiplot remains responsive.
import json
import pandas as pd
import hiplot as hip

# Keep every fraud case plus a random sample of 5000 training rows
# so that hiplot stays responsive.
samples = [credit_train.loc[lambda d: d['group'] == True], credit_train.sample(5000)]
json_data = pd.concat(samples).to_json(orient='records')
hip.Experiment.from_iterable(json.loads(json_data)).display()
Again, we can play around to find some rules. Here's the translated model that we found in the video.
from hulearn.experimental import CaseWhenRuler
from hulearn.classification import FunctionClassifier

def make_prediction(dataf):
    ruler = CaseWhenRuler(default=0)

    # Rules mined from the parallel coordinates chart: predict fraud (1)
    # whenever one of these thresholds is crossed.
    (ruler
     .add_rule(lambda d: (d['V11'] > 4), 1)
     .add_rule(lambda d: (d['V17'] < -3), 1)
     .add_rule(lambda d: (d['V14'] < -8), 1))

    return ruler.predict(dataf)

clf = FunctionClassifier(make_prediction)
from sklearn.metrics import classification_report

print(classification_report(
    credit_test['group'],
    clf.fit(credit_test, credit_test['group']).predict(credit_test)
))
This is the report that we got in the end.
              precision    recall  f1-score   support

       False       1.00      1.00      1.00    142164
        True       0.69      0.72      0.70       240

    accuracy                           1.00    142404
   macro avg       0.85      0.86      0.85    142404
weighted avg       1.00      1.00      1.00    142404
If you'd like, you can compare these results with the ones from the Keras blog post.
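One small note on the evaluation code above: because the rules in make_prediction are fixed, the FunctionClassifier shouldn't learn anything from the data it is fitted on. As a sanity check, here's a minimal variant that fits on the train split instead and evaluates on the test split; it assumes that fit only records the class labels, so the predictions come entirely from make_prediction.

# Sanity check: fit on the train split, evaluate on the test split.
# Since the rules are fixed, the predictions on credit_test should be identical.
clf = FunctionClassifier(make_prediction)
clf.fit(credit_train, credit_train['group'])
print(classification_report(credit_test['group'], clf.predict(credit_test)))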
Notes on Benchmarking
The main point we hope to demonstrate is that this technique has merit, but the result should be taken with a grain of salt. The author of the Keras blog post likely wasn't trying to build a state-of-the-art model and was more concerned with clearly explaining a technique (which the blog post does quite well). Our technique also has an element of "luck", since this dataset lends itself particularly well to the visualisation technique.
It has also been correctly pointed out that this course calculates a slightly different number, because the Keras blog post uses a different validation set than we do: we shuffle and take 50% of the data for validation, while the Keras blog doesn't shuffle and takes 20%. A community member took the effort to explore this and noticed that the numbers change slightly when you account for this.
              precision    recall  f1-score   support

       False       1.00      1.00      1.00     56886
        True       0.75      0.63      0.68        75

    accuracy                           1.00     56961
   macro avg       0.87      0.81      0.84     56961
weighted avg       1.00      1.00      1.00     56961
The numbers still suggest there's plenty of merit to mining a model here, but this is a fairer comparison. For a discussion on the matter, see this GitHub issue.
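If you want to approximate that setup yourself, a rough sketch could look like the one below. It assumes the Keras blog post simply holds out the final 20% of rows without shuffling; the exact preprocessing in the blog post may differ, and the names credit_train_tail and credit_val_tail are just for this sketch.

# Hold out the last 20% of rows, without shuffling, as the validation set.
n_val = int(len(df_credit) * 0.2)
credit_train_tail = df_credit.iloc[:-n_val]
credit_val_tail = df_credit.iloc[-n_val:]

clf = FunctionClassifier(make_prediction)
clf.fit(credit_train_tail, credit_train_tail['group'])
print(classification_report(credit_val_tail['group'], clf.predict(credit_val_tail)))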