logo

... scikit prep.


The way data is preprocessed can have a huge effect in your scikit-learn pipelines. This series of videos will highlight common techniques for preprocessing data for modelling.


Episode Notes

You can download the dataset here.

Here's what the original dataset looks like.

df = pd.read_csv("drawndata2.csv")
X = df[['x', 'y']].values
y = df['z'] == 'a'
plt.scatter(X[:, 0], X[:, 1], c=y);

Here's the PolynomialFeatures at work.

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("scale", PolynomialFeatures()),
    ("model", LogisticRegression())
])

pred = pipe.fit(X, y).predict(X)
plt.scatter(X[:, 0], X[:, 1], c=pred);

Feedback? See an issue? Feel free to mention it here.

If you want to be kept up to date, consider getting the newsletter.