scikit metrics: imbalance
If you're going to optimise a model in scikit-learn, then it had better optimise towards the right thing. This means that you have to understand metrics in scikit-learn. This series of videos gives an overview of how they work, how you can create your own and how the grid search interacts with them.
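To make "how you can create your own" and "how the grid search interacts with it" concrete, here is a minimal sketch. It assumes synthetic imbalanced data from `make_classification` instead of the credit card dataset, and `min_precision_recall` is a hypothetical metric made up for illustration. `make_scorer` turns a plain `(y_true, y_pred)` function into a scorer that `GridSearchCV` can use, and `refit` decides which of the scores picks the final model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in for the fraud data (about 2% positives).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.98, 0.02], random_state=42
)

# Hypothetical custom metric: the worse of precision and recall.
def min_precision_recall(y_true, y_pred):
    precision = precision_score(y_true, y_pred, zero_division=0)
    recall = recall_score(y_true, y_pred, zero_division=0)
    return min(precision, recall)

grid = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    # Try a few weights for the positive (fraud) class.
    param_grid={"class_weight": [{0: 1, 1: w} for w in [1, 2, 5, 10]]},
    # Several scorers are computed for every candidate ...
    scoring={
        "precision": make_scorer(precision_score, zero_division=0),
        "recall": make_scorer(recall_score),
        "min_both": make_scorer(min_precision_recall),
    },
    # ... but only the custom one decides which candidate is refit.
    refit="min_both",
    cv=4,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

After fitting, `grid.cv_results_` holds `mean_test_precision`, `mean_test_recall` and `mean_test_min_both` for every class weight, so you can inspect all three even though only the custom metric chose the winner.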
Notes
If you want to follow along then you can fetch the dataset here.
This is the code used to read in the data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Read the credit card data and keep only the first 80,000 rows.
df = pd.read_csv("~/Downloads/creditcard.csv")[:80_000]
df.head(3)
Here we extract features for scikit-learn.
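# Use every column except Time, Amount and the Class label as a feature.
X = df.drop(columns=['Time', 'Amount', 'Class']).values
y = df['Class'].values
# Class is 1 for fraud and 0 otherwise, so the sum counts the fraud cases.
f"Shapes of X={X.shape} y={y.shape}, #Fraud Cases={y.sum()}"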
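In a fraud dataset the number of fraud cases is typically tiny compared to the number of rows, and that is exactly why plain accuracy is a poor thing to optimise for. A minimal sketch, using hypothetical labels (10,000 rows, 20 of them fraud) rather than the real data: a `DummyClassifier` that always predicts "no fraud" scores near-perfect accuracy while catching none of the fraud.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 10,000 transactions, of which the first 20 are fraud.
y = np.zeros(10_000, dtype=int)
y[:20] = 1
# Features are irrelevant to a constant model, so a dummy column suffices.
X = np.zeros((len(y), 1))

# A "model" that always predicts class 0 (no fraud).
baseline = DummyClassifier(strategy="constant", constant=0).fit(X, y)
pred = baseline.predict(X)

print(f"fraud rate: {y.mean():.4f}")                 # → fraud rate: 0.0020
print(f"accuracy:   {accuracy_score(y, pred):.4f}")  # → accuracy:   0.9980
print(f"recall:     {recall_score(y, pred):.4f}")    # → recall:     0.0000
```

The accuracy of this baseline is simply one minus the fraud rate, so on imbalanced data a metric like recall or precision says far more about whether fraud is actually being caught.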
And here we train our first model.
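from sklearn.linear_model import LogisticRegression
# Give fraud cases (class 1) twice the weight of normal cases (class 0).
mod = LogisticRegression(class_weight={0: 1, 1: 2}, max_iter=1000)
# Fit, predict on the training data and count how many rows are flagged as fraud.
mod.fit(X, y).predict(X).sum()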
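With `class_weight={0: 1, 1: 2}` each fraud case counts twice as much in the training loss as a normal case, and the last line counts how many rows the model flags as fraud. A minimal sketch of how that count moves with the weight, assuming synthetic data from `make_classification` (about 1% positives) in place of `creditcard.csv`, with weights 1, 2 and 10 picked only for illustration; the numbers will differ on the real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Synthetic stand-in for the fraud data: roughly 1% positive class.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.99, 0.01], random_state=0
)

results = {}
for w in [1, 2, 10]:
    # Same model as above, only the weight on class 1 changes.
    mod = LogisticRegression(class_weight={0: 1, 1: w}, max_iter=1000)
    pred = mod.fit(X, y).predict(X)
    results[w] = int(pred.sum())  # number of rows flagged as fraud
    precision = precision_score(y, pred, zero_division=0)
    recall = recall_score(y, pred)
    print(f"weight={w:>2}  flagged={results[w]:>4}  "
          f"precision={precision:.2f}  recall={recall:.2f}")
```

Higher weights typically flag more rows as fraud, trading precision for recall; which trade-off is right depends on the metric you decide to optimise.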