
... pandas pipe.


Pandas code can get quite nasty inside a Jupyter notebook. It's not just the syntax, it's the endless scrolling too. In this series of videos we're going to explore a way to clean this up. The series is inspired by the Modern Pandas blog posts originally written by Tom Augspurger.


Episode Notes

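You can add this step to the pipeline to get inflation numbers. It takes the change in price from one row to the next within each `name` group and divides it by the current price, so the result depends on the rows being in date order.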

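@log_step
def add_inflation_features(dataf):
    # Row-to-row price change within each `name` group, divided by the
    # current price. The first row of every group has no previous price,
    # so its inflation value is NaN.
    return (dataf
            .assign(local_inflation=lambda d: d.groupby('name')['local_price'].diff()/d['local_price'])
            .assign(dollar_inflation=lambda d: d.groupby('name')['dollar_price'].diff()/d['dollar_price']))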

clean_df = (df
  .pipe(start_pipeline)
  .pipe(set_dtypes)
  .pipe(remove_outliers, min_row_country=20)
  .pipe(add_inflation_features))
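
To try the inflation step on its own, here is a minimal sketch that runs without the rest of the series. The `log_step` and `start_pipeline` definitions are hypothetical stand-ins for the helpers built in earlier episodes (the real ones may log different things), and `toy_df` is made-up data, assumed to be sorted by date within each `name`.

```python
import functools
import time

import pandas as pd


def log_step(func):
    """Hypothetical stand-in for the series' decorator: print shape and runtime."""
    @functools.wraps(func)
    def wrapper(dataf, *args, **kwargs):
        tic = time.perf_counter()
        result = func(dataf, *args, **kwargs)
        elapsed = time.perf_counter() - tic
        print(f"{func.__name__}: shape={result.shape}, took {elapsed:.4f}s")
        return result
    return wrapper


@log_step
def start_pipeline(dataf):
    # Assumed behaviour: copy, so later steps never touch the original dataframe.
    return dataf.copy()


@log_step
def add_inflation_features(dataf):
    return (dataf
            .assign(local_inflation=lambda d: d.groupby('name')['local_price'].diff() / d['local_price'])
            .assign(dollar_inflation=lambda d: d.groupby('name')['dollar_price'].diff() / d['dollar_price']))


# Made-up toy data: two groups with two rows each.
toy_df = pd.DataFrame({
    'name': ['A', 'A', 'B', 'B'],
    'local_price': [10.0, 12.5, 100.0, 80.0],
    'dollar_price': [2.0, 2.5, 4.0, 5.0],
})

toy_clean = (toy_df
  .pipe(start_pipeline)
  .pipe(add_inflation_features))

print(toy_clean)
```

The first row of each group has nothing to diff against, so its inflation columns are `NaN`. For group `A` the second row gives (12.5 - 10.0) / 12.5 = 0.2 in local currency.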

Remember that it is relatively easy to write a new function for the pipeline, as long as you first save the current dataframe into a temporary variable that you can experiment against.
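
One way that temporary-variable workflow might look is sketched below. Everything in it is illustrative: `raw_df` is made-up data, `set_dtypes` is a simplified stand-in for the step of the same name in the series, and `add_year` is a hypothetical new step.

```python
import pandas as pd


def start_pipeline(dataf):
    # Assumed behaviour: work on a copy of the input.
    return dataf.copy()


def set_dtypes(dataf):
    # Simplified stand-in: only parses the date column.
    return dataf.assign(date=lambda d: pd.to_datetime(d['date']))


# Made-up data for illustration.
raw_df = pd.DataFrame({
    'name': ['A', 'A'],
    'date': ['2019-01-01', '2020-01-01'],
    'dollar_price': [2.0, 2.5],
})

# 1. Run the pipeline up to where the new step should go and keep the result.
tmp_df = (raw_df
  .pipe(start_pipeline)
  .pipe(set_dtypes))

# 2. Experiment against the temporary variable until the output looks right.
experiment = tmp_df.assign(year=lambda d: d['date'].dt.year)

# 3. Move the working expression into a function that takes and returns a dataframe.
def add_year(dataf):
    return dataf.assign(year=lambda d: d['date'].dt.year)

# 4. Append the new function to the pipeline.
clean_df = (raw_df
  .pipe(start_pipeline)
  .pipe(set_dtypes)
  .pipe(add_year))
```

Because the function body is the same expression that was tested against `tmp_df`, the pipeline result should match the experiment.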

