scikit metrics: imbalance



Notes

If you want to follow along then you can fetch the dataset here.

This is the code used to read in the data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Keep only the first 80,000 rows of the dataset.
df = pd.read_csv("~/Downloads/creditcard.csv")[:80_000]
df.head(3)

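Here we extract the features and the label for scikit-learn. The `Class` column is the label, where 1 marks a fraud case.

# Everything except 'Time', 'Amount' and the label is used as a feature.
X = df.drop(columns=['Time', 'Amount', 'Class']).values
y = df['Class'].values
f"Shapes of X={X.shape} y={y.shape}, #Fraud Cases={y.sum()}"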
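The raw count of fraud cases is easier to judge as a fraction of all rows. The sketch below shows one way to compute it. The helper `class_balance` and the labels in `y_demo` are hypothetical stand-ins, not part of the original notes, so that the snippet runs without the dataset.

```python
import numpy as np

def class_balance(y):
    """Return (n_negative, n_positive, positive_rate) for a binary 0/1 label array."""
    y = np.asarray(y)
    n_pos = int(y.sum())
    n_neg = int(y.size - n_pos)
    return n_neg, n_pos, n_pos / y.size

# Hypothetical stand-in labels: 995 legitimate rows and 5 fraud rows.
y_demo = np.array([0] * 995 + [1] * 5)
n_neg, n_pos, rate = class_balance(y_demo)
print(f"negatives={n_neg} positives={n_pos} positive rate={rate:.2%}")
```

Calling `class_balance(y)` on the real labels would give the same summary for the credit card data.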

And here we train our first model.

from sklearn.linear_model import LogisticRegression

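# Errors on the fraud class (1) weigh twice as much as errors on class 0.
mod = LogisticRegression(class_weight={0: 1, 1: 2}, max_iter=1000)
# Number of rows that the fitted model predicts as fraud.
mod.fit(X, y).predict(X).sum()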
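To get a feel for what `class_weight` does, it can help to refit the model with a few different weights on the positive class and compare how many rows get flagged. The sketch below is a rough illustration under stated assumptions: it uses synthetic data from `make_classification` instead of the credit card file, and the helper `predicted_positives` and the chosen weights are made up for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the credit card data (an assumption for illustration):
# roughly 1% of the rows belong to the positive class.
X_demo, y_demo = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.99], random_state=0,
)

def predicted_positives(weight):
    """Fit with the given weight on class 1 and count the predicted positives."""
    model = LogisticRegression(class_weight={0: 1, 1: weight}, max_iter=1000)
    return int(model.fit(X_demo, y_demo).predict(X_demo).sum())

# Count predicted positives for a few hypothetical weights on class 1.
counts = {w: predicted_positives(w) for w in [1, 2, 10]}
print(f"actual positives={int(y_demo.sum())}, predicted by weight={counts}")
```

A larger weight on class 1 typically makes the model flag more rows as positive, although the exact counts depend on the data.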
