There are many ways to get data from pandas into scikit-learn, but when you're hacking in a notebook you may prefer something expressive, like a domain-specific grammar. The tool patsy offers exactly this by mimicking the formula syntax of the R language.
To use patsy formulas inside a scikit-learn pipeline you can use scikit-lego, which you'll need to install first:
pip install scikit-lego
You can now use its PatsyTransformer in a pipeline.
import numpy as np
import matplotlib.pylab as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklego.preprocessing import PatsyTransformer

# df_clean and date_to_num are not defined in this snippet;
# they must already exist in the notebook.
X = (df_clean
     .head(2000)
     .loc[lambda d: d['n_born'] > 2000]
     .assign(num_date=lambda d: date_to_num(d['date'])))
y = X['n_born']

pipe = Pipeline([
    # The formula selects and combines columns of X into model features
    ("patsy", PatsyTransformer("(cc(yday, df=12) + wday + num_date)**2")),
    ("scale", StandardScaler()),
    ("model", LinearRegression())
])

# Mean absolute error on the training data
np.mean(np.abs(pipe.fit(X, y).predict(X) - y))
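To get a feel for what the `**2` in the formula does, here is a minimal sketch that calls patsy directly on toy data. The DataFrame `toy` and its columns `a` and `b` are made up for illustration and are not part of the dataset used above.

```python
import pandas as pd
from patsy import dmatrix

# Toy data; the column names are hypothetical and only serve this illustration.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [0.5, 0.1, 0.9, 0.3],
})

# "a + b" asks for the two main effects; patsy adds an intercept by default.
main_effects = dmatrix("a + b", toy, return_type="dataframe")

# "(a + b)**2" additionally asks for the pairwise interaction a:b,
# which for numeric columns is their elementwise product.
with_interactions = dmatrix("(a + b)**2", toy, return_type="dataframe")

print(list(main_effects.columns))
print(list(with_interactions.columns))
```

The second design matrix should carry one extra column, `a:b`, next to `Intercept`, `a` and `b`. In the pipeline above the same expansion is applied to the spline, weekday and date terms, so the linear model also gets their pairwise interactions as features.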
The scikit-lego documentation for this can be found here.