
... scikit prep.


How you preprocess data can have a large effect on the results of your scikit-learn pipelines. This series of videos highlights common techniques for preprocessing data for modelling.


Episode Notes

You can download the dataset here.

To plot the raw data, you can run this code:

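import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset; 'x' and 'y' are the features, 'z' holds the label.
df = pd.read_csv("drawndata1.csv")
X = df[['x', 'y']].values
# Boolean target: True where the label is "a".
y = df['z'] == "a"
# Scatter the raw features, coloured by the target.
plt.scatter(X[:, 0], X[:, 1], c=y);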

To see the effect of the standard scaler, run this:

from sklearn.preprocessing import StandardScaler

# Rescale each column to zero mean and unit variance, then plot again.
X_new = StandardScaler().fit_transform(X)
plt.scatter(X_new[:, 0], X_new[:, 1], c=y);
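
To make the scaler's effect concrete, here is a minimal sketch of what StandardScaler computes with its default settings: it subtracts each column's mean and divides by that column's standard deviation. The array `X_demo` is synthetic stand-in data, an assumption made so the snippet runs without the CSV; it is not the dataset from the episode.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two feature columns (assumption: not the real data).
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.normal(loc=300.0, scale=100.0, size=200),  # column with a wide range
    rng.normal(loc=5.0, scale=0.5, size=200),      # column with a narrow range
])

X_scaled = StandardScaler().fit_transform(X_demo)

# Manual equivalent: subtract the column mean, divide by the
# population standard deviation (ddof=0, which is NumPy's default).
X_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(X_scaled, X_manual))
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

After scaling, both columns have a mean of roughly zero and a standard deviation of roughly one, so neither axis dominates purely because of its units.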
