... pandas pipe.


Pandas code can get quite nasty inside of your Jupyter notebook. It's not just the syntax, it's the endless scrolling too. In this series of videos we're going to explore a way to clean this up. The series is inspired by the Modern Pandas blog posts originally written by Tom Augspurger.


Episode Notes

The code below demonstrates the importance of starting a pipeline with a copy of the original dataframe.

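import pandas as pd

df = pd.read_csv('https://calmcode.io/datasets/bigmac.csv')

def start_pipeline(dataf):
    # Hand the rest of the pipeline a copy so `df` itself is never modified.
    return dataf.copy()

def set_types(dataf):
    # Assigns into the dataframe it receives, which is why the copy matters.
    dataf['date'] = pd.to_datetime(dataf['date'])
    return dataf

# Compare the dtypes of the pipeline result with those of the original `df`.
df.pipe(start_pipeline).pipe(set_types).dtypes, df.dtypes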

The most important thing to get right here is that the pipeline does not change the original dataframe as a side effect.
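To make the side effect concrete, here is a minimal sketch that contrasts the two situations. It assumes a tiny made-up dataframe (the helper `make_raw` and its values are hypothetical stand-ins for the bigmac data, so nothing has to be downloaded). In the second half `set_types` is called directly instead of through `.pipe`, so the result does not depend on how a given pandas version hands the dataframe to the piped function.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical stand-in for the bigmac data so the sketch runs offline.
def make_raw():
    return pd.DataFrame({
        "date": ["2000-04-01", "2001-04-01"],
        "local_price": [2.5, 2.55],
    })

def start_pipeline(dataf):
    return dataf.copy()

def set_types(dataf):
    dataf["date"] = pd.to_datetime(dataf["date"])
    return dataf

# With the copy step, only the pipeline's result gets datetime dates.
raw = make_raw()
clean = raw.pipe(start_pipeline).pipe(set_types)
copy_result_is_datetime = is_datetime64_any_dtype(clean["date"])
original_untouched = not is_datetime64_any_dtype(raw["date"])

# Without the copy step, set_types assigns into the very object it was given.
raw_again = make_raw()
set_types(raw_again)
original_mutated = is_datetime64_any_dtype(raw_again["date"])

print(copy_result_is_datetime, original_untouched, original_mutated)
```

In a notebook this matters because cells get re-run in arbitrary order: if `raw` silently changes, a later cell may see different dtypes than the one that loaded the data.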

