pandas pipe: logs
Pandas code can get quite nasty inside of your Jupyter notebook. It's not just the syntax, it's the endless scrolling too. In this series of videos we're going to explore a way to clean this up. The series is inspired by the Modern Pandas blog posts originally written by Tom Augspurger.
Notes
The code is advanced (especially if you're new to decorators), but in Python you can write a function that decorates another function. If you'd appreciate a refresher on how decorators work, feel free to check out our course on the topic.
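As a quick refresher, here is a minimal sketch of a decorator on its own. The names `shout` and `greet` are made up for illustration and are not part of the course code. It also shows why `functools.wraps` is handy: without it, the decorated function would report its name as `wrapper`.

```python
from functools import wraps


def shout(func):
    """Hypothetical decorator: upper-cases whatever string `func` returns."""
    @wraps(func)  # copies func.__name__ and its docstring onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


@shout
def greet(name):
    return f"hello {name}"


print(greet("pandas"))  # HELLO PANDAS
print(greet.__name__)   # greet
```

The `@shout` line is shorthand for `greet = shout(greet)`, so every call to `greet` goes through `wrapper` first.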
Here's an example meant for pandas pipelines.
from functools import wraps
import datetime as dt

def log_step(func):
    # Keep the wrapped function's name so the log line stays readable.
    @wraps(func)
    def wrapper(*args, **kwargs):
        tic = dt.datetime.now()
        result = func(*args, **kwargs)
        # A timedelta prints as H:MM:SS.ffffff, so no unit suffix is needed.
        time_taken = str(dt.datetime.now() - tic)
        print(f"just ran step {func.__name__} shape={result.shape} took {time_taken}")
        return result
    return wrapper
You can use this code to decorate your pipeline steps.
import pandas as pd

df = pd.read_csv('https://calmcode.io/datasets/bigmac.csv')

@log_step
def start_pipeline(dataf):
    return dataf.copy()

@log_step
def set_dtypes(dataf):
    return (dataf
            .assign(date=lambda d: pd.to_datetime(d['date']))
            .sort_values(['currency_code', 'date']))

@log_step
def remove_outliers(dataf, min_row_country=32):
    countries = (dataf
                 .groupby('currency_code')
                 .agg(n=('name', 'count'))
                 .loc[lambda d: d['n'] >= min_row_country]
                 .index)
    return (dataf
            .loc[lambda d: d['currency_code'].isin(countries)])
When you now run this code, you'll see output printed as a side-effect.
(df
 .pipe(start_pipeline)
 .pipe(set_dtypes)
 .pipe(remove_outliers, min_row_country=20))
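If you'd like to try the logging without downloading anything, here is a minimal, self-contained sketch. The six-row `toy` frame and the threshold of 2 are assumptions made up for illustration; they merely stand in for the bigmac dataset.

```python
import datetime as dt
from functools import wraps

import pandas as pd


def log_step(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        tic = dt.datetime.now()
        result = func(*args, **kwargs)
        time_taken = str(dt.datetime.now() - tic)
        print(f"just ran step {func.__name__} shape={result.shape} took {time_taken}")
        return result
    return wrapper


# Made-up toy data: three EUR rows, two USD rows, one JPY row.
toy = pd.DataFrame({
    "currency_code": ["EUR", "EUR", "EUR", "USD", "USD", "JPY"],
    "name": ["a", "b", "c", "d", "e", "f"],
})


@log_step
def start_pipeline(dataf):
    return dataf.copy()


@log_step
def remove_outliers(dataf, min_row_country=2):
    # Keep only currencies that appear at least `min_row_country` times.
    countries = (dataf
                 .groupby("currency_code")
                 .agg(n=("name", "count"))
                 .loc[lambda d: d["n"] >= min_row_country]
                 .index)
    return dataf.loc[lambda d: d["currency_code"].isin(countries)]


clean = (toy
         .pipe(start_pipeline)
         .pipe(remove_outliers, min_row_country=2))
```

Each step prints one line. The logged shape goes from `(6, 2)` after `start_pipeline` to `(5, 2)` after `remove_outliers`, since the single JPY row is dropped; the reported timings will differ from run to run.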