There are many ways to get data from pandas into scikit-learn, but when you're hacking in a notebook you may prefer something more expressive, like a domain-specific grammar. The tool patsy offers exactly this by mimicking the formula syntax of the R language.


Here's an example of a custom function being used inside a formula.

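import patsy as ps
import numpy as np

def date_to_num(date_col):
    # Days elapsed since the earliest date in the column.
    return (date_col - date_col.min()).dt.days

# Functions in scope, such as date_to_num and np.log, can be called inside the formula.
y, X = ps.dmatrices("n_born ~ date_to_num(date) + np.log(yday)", df_clean)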
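
Since df_clean is not shown here, the following is a minimal, self-contained sketch of the same call on a made-up DataFrame. The name df_toy and all of its values are invented for illustration; only the column names (n_born, date, yday) come from the formula above.

```python
import numpy as np
import pandas as pd
import patsy as ps

def date_to_num(date_col):
    # Days elapsed since the earliest date in the column.
    return (date_col - date_col.min()).dt.days

# Hypothetical stand-in for df_clean; the numbers are made up.
dates = pd.date_range("2020-01-01", periods=4, freq="D")
df_toy = pd.DataFrame({
    "date": dates,
    "yday": dates.dayofyear,
    "n_born": [8000, 8200, 7900, 8100],
})

y, X = ps.dmatrices("n_born ~ date_to_num(date) + np.log(yday)", df_toy)

# Inspect which columns patsy built from the formula.
print(X.design_info.column_names)
print(np.asarray(X).shape)
```

Patsy adds an intercept column by default, so X has three columns here. Both y and X are NumPy array subclasses, so they can be handed to code that expects plain arrays.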

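Beware the dangers of stateful transformations though. Here, date_to_num computes its minimum from whatever data it is given, so applying the same formula to new data silently shifts the baseline. See the patsy documentation on stateful transforms for the long story.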
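
To make the danger concrete, here is a small sketch that compares the plain function with a version registered through patsy.stateful_transform. The class name DaysSinceFirst, the wrapper days_since_first, and both tiny DataFrames are hypothetical and exist only for this illustration; this is one possible way to handle it, not necessarily the recommended one.

```python
import numpy as np
import pandas as pd
import patsy

def date_to_num(date_col):
    # Naive version: the baseline is recomputed from whatever data is passed in.
    return (date_col - date_col.min()).dt.days

class DaysSinceFirst:
    """Hypothetical stateful version that remembers the earliest training date."""

    def __init__(self):
        self.first = None

    def memorize_chunk(self, date_col):
        # Called on the original data, possibly once per chunk.
        chunk_min = date_col.min()
        if self.first is None or chunk_min < self.first:
            self.first = chunk_min

    def memorize_finish(self):
        # Nothing left to compute once all chunks have been seen.
        pass

    def transform(self, date_col):
        # Always measure from the memorized baseline.
        return (date_col - self.first).dt.days

days_since_first = patsy.stateful_transform(DaysSinceFirst)

# Made-up training data and later data to apply the same formula to.
train = pd.DataFrame({"date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"])})
new = pd.DataFrame({"date": pd.to_datetime(["2020-01-10", "2020-01-11"])})

X_naive = patsy.dmatrix("date_to_num(date)", train)
X_stateful = patsy.dmatrix("days_since_first(date)", train)

# Reuse each design on the new data.
naive_new = patsy.build_design_matrices([X_naive.design_info], new)[0]
stateful_new = patsy.build_design_matrices([X_stateful.design_info], new)[0]

print(np.asarray(naive_new)[:, 1])     # baseline reset to 2020-01-10
print(np.asarray(stateful_new)[:, 1])  # baseline stays at 2020-01-01
```

In this sketch the naive column restarts at zero on the new data, while the stateful column keeps counting from the first training date, which is usually what a model fitted on the training data expects.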

