The way data is preprocessed can have a huge effect on your scikit-learn pipelines. This series of videos highlights common techniques for preprocessing data for modelling.
You can download the dataset here.
To plot what the raw data looks like, you can run this code:
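import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset; use columns x and y as features and z == "a" as a boolean target
df = pd.read_csv("drawndata1.csv")
X = df[['x', 'y']].values
y = df['z'] == "a"

# Scatter plot of the raw features, coloured by class
plt.scatter(X[:, 0], X[:, 1], c=y);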
To see the effect of the standard scaler, run this:
from sklearn.preprocessing import StandardScaler

# Rescale each feature column, then plot the result with the same colouring
X_new = StandardScaler().fit_transform(X)
plt.scatter(X_new[:, 0], X_new[:, 1], c=y);
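To make it clearer what the standard scaler does to each column, here is a minimal sketch that compares its output with a manual calculation: subtract the column mean and divide by the column standard deviation. The array `X_demo` is synthetic stand-in data, not the dataset from this page, and the names `X_demo`, `X_scaled` and `X_manual` are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two feature columns on very different scales
# (an assumption for illustration; this is not the dataset used above).
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.normal(loc=100.0, scale=50.0, size=200),
    rng.normal(loc=0.5, scale=0.1, size=200),
])

# What scikit-learn computes
X_scaled = StandardScaler().fit_transform(X_demo)

# The same thing by hand: per-column mean and (population) standard deviation
X_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

# Check that both approaches agree, then inspect the new column statistics
print(np.allclose(X_scaled, X_manual))
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

After scaling, each column should have a mean close to zero and a standard deviation close to one, so both features end up on a comparable scale in the plot.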