<p>There are many ways to get data from pandas into scikit-learn, but when you're hacking in a notebook you may prefer something more expressive, like a domain-specific grammar. The tool <a href="https://patsy.readthedocs.io/en/latest/overview.html">patsy</a> offers exactly this by mimicking features from the R language.</p>

1 - Introduction
2 - Patsy Code
3 - Model
4 - Categories
5 - Functions
6 - Operations
7 - Splines
8 - Iteration
9 - Lego

For this video you'll need to install the following dependencies:

python -m pip install jupyterlab pandas scikit-learn patsy matplotlib

You'll also need the dataset. It can be downloaded via:

wget https://calmcode.io/datasets/birthdays.csv

The Python code at the beginning of this notebook is:

import patsy as ps
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression

df = pd.read_csv("birthdays.csv")

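def clean_data(dataf):
    # Sum births per day, then add the day of the year as a feature.
    return (dataf
            .drop(columns=['Unnamed: 0'])
            .assign(date = lambda d: pd.to_datetime(d['date']))
            .groupby(['date', 'wday', 'month'])
            .agg(n_born=('births', 'sum'))
            # groupby moves the keys into the index; bring 'date' back as a column
            .reset_index()
            .assign(yday = lambda d: d['date'].dt.dayofyear))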

df_clean = df.pipe(clean_data)
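
To give a feel for the grammar mentioned at the top, here is a minimal sketch of how a patsy formula could feed a scikit-learn model. It uses a tiny made-up frame (called `toy` here, with invented values) that borrows the column names `wday`, `yday` and `n_born` so that it runs without the CSV. The formula is only an illustration and is not necessarily the one used in the videos.

```python
import numpy as np
import pandas as pd
import patsy as ps
from sklearn.linear_model import LinearRegression

# Made-up stand-in for the cleaned data: same column names, invented values.
toy = pd.DataFrame({
    "wday": ["Mon", "Tues", "Wed", "Mon", "Tues", "Wed"],
    "yday": [1, 2, 3, 8, 9, 10],
    "n_born": [100, 120, 125, 105, 118, 130],
})

# Left of the ~ is the target, right of the ~ are the features.
# C(wday) asks patsy to treat wday as a categorical column.
y, X = ps.dmatrices("n_born ~ C(wday) + yday", data=toy)

# patsy adds an Intercept column by default, so the model skips its own.
mod = LinearRegression(fit_intercept=False).fit(np.asarray(X), np.asarray(y))
preds = mod.predict(np.asarray(X))

# The generated column names are stored on the design matrix.
print(X.design_info.column_names)
```

The formula string is the whole feature-engineering step here: changing the features means editing one line rather than rewriting a pandas pipeline.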