Calmcode - scikit metrics: imbalance

Imbalance


If you want to follow along, you can fetch the dataset from Kaggle.

This is the code used to read in the data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("~/Downloads/creditcard.csv")[:80_000]
df.head(3)

Here we extract the feature matrix and the label vector for scikit-learn. The Time and Amount columns are dropped, and Class is the label.

X = df.drop(columns=['Time', 'Amount', 'Class']).values
y = df['Class'].values
f"Shapes of X={X.shape} y={y.shape}, #Fraud Cases={y.sum()}"
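
The fraud count matters because it shows how imbalanced the labels are. As a rough sketch of why that is a problem, the snippet below uses made-up toy labels (`y_toy`, 1,000 rows with 5 positives, not the real dataset) and shows that a "model" that never flags fraud still scores a very high accuracy.

```python
import numpy as np

# Toy labels, NOT the real dataset: 1,000 rows with 5 positive cases.
y_toy = np.zeros(1000, dtype=int)
y_toy[:5] = 1

# Share of positive labels.
fraud_rate = y_toy.mean()

# A "model" that always predicts the majority class (no fraud).
always_zero = np.zeros_like(y_toy)
baseline_accuracy = (always_zero == y_toy).mean()

print(f"fraud rate={fraud_rate:.3%}, baseline accuracy={baseline_accuracy:.3%}")
```

With labels this skewed, plain accuracy says very little, which is why counting the predicted fraud cases is a more telling first check.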

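And here we train our first model. The class_weight setting makes fraud cases count twice as much as non-fraud cases during training, and the last line counts how many rows the model flags as fraud.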

from sklearn.linear_model import LogisticRegression

mod = LogisticRegression(class_weight={0: 1, 1: 2}, max_iter=1000)
mod.fit(X, y).predict(X).sum()
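
One way to build intuition for `class_weight` is to compare it with per-row weights. The sketch below is an illustration on synthetic data, not the credit card set: `X_toy`, `y_toy`, and `row_weights` are hypothetical names, and the data comes from `make_classification`. It fits one model with `class_weight={0: 1, 1: 2}` and another with `sample_weight` set to 2 for every positive row. The two fits should agree up to solver tolerance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the credit card data (hypothetical, roughly 2% positives).
X_toy, y_toy = make_classification(
    n_samples=2000, n_features=10, weights=[0.98], random_state=0
)

# Variant A: weight the classes, as in the snippet above.
mod_cw = LogisticRegression(class_weight={0: 1, 1: 2}, max_iter=1000)
pred_cw = mod_cw.fit(X_toy, y_toy).predict(X_toy)

# Variant B: no class weights, but every positive row gets weight 2.
row_weights = np.where(y_toy == 1, 2.0, 1.0)
mod_sw = LogisticRegression(max_iter=1000)
pred_sw = mod_sw.fit(X_toy, y_toy, sample_weight=row_weights).predict(X_toy)

# Compare the two fits: same predictions, and how many rows get flagged.
same_predictions = bool((pred_cw == pred_sw).all())
print(same_predictions, pred_cw.sum())
```

Seen this way, raising the weight of class 1 tells the model that mistakes on fraud rows cost more, which usually pushes it to flag more cases.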