
... patsy.


There are many ways to get data from pandas into scikit-learn, but when you're hacking in a notebook you may prefer something more expressive, like a domain-specific grammar. The patsy library offers exactly this by mimicking the formula syntax of the R language.


Episode Notes

Here's the example of a custom function being used inside a patsy formula.

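import patsy as ps
import numpy as np

def date_to_num(date_col):
    # Number of days since the earliest date in the column.
    return (date_col - date_col.min()).dt.days

# df_clean is a pandas DataFrame with `n_born`, `date` (datetime) and `yday` columns.
# Functions that are in scope, such as date_to_num and np.log, can be called in the formula.
y, X = ps.dmatrices("n_born ~ date_to_num(date) + np.log(yday)", df_clean)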
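To try the snippet above without the original dataset, here is a minimal sketch that builds a tiny stand-in for `df_clean`. The rows are made up purely for illustration; only the column names `n_born`, `date` and `yday` come from the formula.

```python
import numpy as np
import pandas as pd
import patsy as ps

def date_to_num(date_col):
    # Number of days since the earliest date in the column.
    return (date_col - date_col.min()).dt.days

# Hypothetical toy data standing in for df_clean (values are invented).
df_clean = pd.DataFrame({
    "date": pd.to_datetime(["1969-01-01", "1969-01-02", "1969-01-03", "1969-01-04"]),
    "yday": [1, 2, 3, 4],
    "n_born": [10, 12, 9, 11],
})

# Left of `~` becomes y, right of `~` becomes X (patsy adds an intercept column).
y, X = ps.dmatrices("n_born ~ date_to_num(date) + np.log(yday)", df_clean)

print(X.design_info.column_names)
print(np.asarray(X))
```

The resulting `X` has one column per term plus the intercept, so it can be handed to scikit-learn as a plain array via `np.asarray(X)`.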

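Beware the dangers of stateful transformations, though. The function date_to_num computes date_col.min() from whatever data it is given, so applying the same formula to a new dataset silently shifts the starting date. See the patsy documentation for the long story.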
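The problem can be seen with pandas alone. The sketch below uses made-up dates and a hypothetical helper, `date_to_num_fixed`, that takes the reference date as an explicit argument. It is one possible workaround for illustration, not the approach the patsy documentation prescribes.

```python
import pandas as pd

def date_to_num(date_col):
    # The minimum is recomputed from whatever column is passed in.
    return (date_col - date_col.min()).dt.days

# Invented example dates: a "training" batch and a later batch.
train = pd.Series(pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]))
later = pd.Series(pd.to_datetime(["2020-02-01", "2020-02-02"]))

train_days = date_to_num(train).tolist()  # [0, 1, 2]
later_days = date_to_num(later).tolist()  # [0, 1], the count restarts at zero

def date_to_num_fixed(date_col, origin):
    # Hypothetical variant: the reference date is fixed by the caller.
    return (date_col - origin).dt.days

origin = train.min()  # remember the state from the training data
later_fixed = date_to_num_fixed(later, origin).tolist()  # [31, 32]

print(train_days, later_days, later_fixed)
```

A model trained on `train_days` would treat the later batch as if it started on the same day as the training data, which is the kind of quiet error that stateful transformations are meant to prevent.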

