How you preprocess data can have a huge effect on your scikit-learn pipelines. This series of videos highlights common techniques for preprocessing data for modelling.
You can download the dataset here.
Here's what the original dataset looks like.
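import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset; use columns x and y as features and "z == 'a'" as a boolean label
df = pd.read_csv("drawndata2.csv")
X = df[['x', 'y']].values
y = df['z'] == 'a'
plt.scatter(X[:, 0], X[:, 1], c=y);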
Here's PolynomialFeatures at work.
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

# PolynomialFeatures defaults to degree=2, adding squared and interaction terms
pipe = Pipeline([
    ("scale", PolynomialFeatures()),
    ("model", LogisticRegression())
])
pred = pipe.fit(X, y).predict(X)
plt.scatter(X[:, 0], X[:, 1], c=pred);