
... patsy.


There are many ways to get data from pandas into scikit-learn, but when you're hacking in a notebook you may prefer something more expressive, like a domain-specific grammar for describing your features. The patsy library offers exactly this by mimicking the formula syntax of the R language.
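To show what the formula grammar looks like in practice, here is a minimal sketch on a small, hypothetical DataFrame (the column names `temp`, `day` and `sales` are made up for illustration). The left side of `~` becomes the target and the right side becomes the feature matrix. `C(day)` one-hot encodes a categorical column, and patsy adds an intercept column unless the formula subtracts it with `- 1`.

```python
import pandas as pd
import patsy as ps

# Hypothetical toy data: one numeric feature, one categorical feature, one target.
df = pd.DataFrame({
    "temp": [10.0, 12.5, 15.0, 17.5, 20.0, 22.5],
    "day": ["mon", "tue", "mon", "tue", "mon", "tue"],
    "sales": [3.0, 4.1, 5.2, 6.8, 7.1, 8.9],
})

# "sales ~ temp + C(day)": target on the left, features on the right.
# C(day) is dummy-encoded with "mon" as the reference level because the
# formula keeps the default intercept.
y, X = ps.dmatrices("sales ~ temp + C(day)", df)

# The column names record how each feature was built.
print(X.design_info.column_names)
print(X.shape, y.shape)
```

Both `y` and `X` behave like NumPy arrays, so they can be handed straight to a scikit-learn estimator such as `LinearRegression().fit(X, y)`.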


Notes

To generate the repeating (cyclic) spline features, run:

x = np.linspace(0.0, 1.0, 100)
x_mat = ps.dmatrix("cc(x, df=5) - 1", pd.DataFrame({"x": x}))
plt.plot(x, x_mat);
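A quick way to check what `cc(x, df=5) - 1` produces is to inspect the matrix directly. This sketch assumes the same `x` grid as above. `cc` builds a cyclic cubic regression spline basis with `df` columns, and `- 1` drops the intercept column patsy would otherwise add. Because the basis is cyclic, the row for the lower boundary (`x=0.0`) matches the row for the upper boundary (`x=1.0`), which is what makes the features "repeat".

```python
import numpy as np
import pandas as pd
import patsy as ps

x = np.linspace(0.0, 1.0, 100)

# One column per spline basis function; "- 1" removes the intercept column.
x_mat = np.asarray(ps.dmatrix("cc(x, df=5) - 1", pd.DataFrame({"x": x})))
print(x_mat.shape)

# Cyclic basis: the first and last rows coincide, so a fitted curve
# wraps around smoothly from the end of the range back to the start.
print(np.allclose(x_mat[0], x_mat[-1]))
```

This wrap-around property is why a cyclic spline suits a day-of-year feature: 31 December and 1 January end up with nearly identical features.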

The final example, which fits a cyclic spline over the day of the year (`yday`), is shown below:

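import matplotlib.pyplot as plt
import patsy as ps
from sklearn.linear_model import LinearRegression

# df_clean is assumed to be defined earlier, with 'date', 'yday' and 'n_born' columns
df_ml = df_clean.head(1200).loc[lambda d: d['n_born'] > 2000]
# patsy returns the target y and the design matrix X (intercept + 12 cyclic spline columns)
y, X = ps.dmatrices("n_born ~ cc(yday, df=12)", df_ml)
mod = LinearRegression().fit(X, y)

plt.figure(figsize=(12, 3))
plt.scatter(df_ml['date'], y)
plt.plot(df_ml['date'], mod.predict(X), color='orange');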
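One detail worth knowing when you move beyond the training data: the spline knots are learned from the data passed to `dmatrices`, so new rows should be encoded with the same `design_info` rather than by re-running the formula. The sketch below assumes synthetic, hypothetical data (a sine-shaped yearly pattern plus noise) and uses NumPy's least squares in place of `LinearRegression` only to keep it dependency-light. `patsy.build_design_matrices` reuses the stored knots.

```python
import numpy as np
import pandas as pd
import patsy as ps

# Hypothetical training data: a yearly pattern plus noise, indexed by day of year.
rng = np.random.default_rng(0)
train = pd.DataFrame({"yday": np.arange(1, 366)})
train["n_born"] = (
    10_000
    + 500 * np.sin(2 * np.pi * train["yday"] / 365)
    + rng.normal(0, 50, len(train))
)

y, X = ps.dmatrices("n_born ~ cc(yday, df=12)", train)

# Least squares stands in for LinearRegression; lstsq copes with the
# intercept/spline overlap by returning the minimum-norm solution.
coef, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)

# Encode new rows with the knots learned on the training data.
new = pd.DataFrame({"yday": [1, 91, 182, 274]})
(X_new,) = ps.build_design_matrices([X.design_info], new)
pred = np.asarray(X_new) @ coef
print(pred.ravel())
```

Re-running `ps.dmatrix("cc(yday, df=12)", new)` on just four rows would instead place the knots based on those four values, giving features that no longer match the fitted coefficients.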
