This is the code before.
import pandas as pd
df = pd.read_csv('https://calmcode.io/datasets/bigmac.csv')
df2 = (df
  .assign(date=lambda d: pd.to_datetime(d['date']))
  .sort_values(['currency_code', 'date'])
  .groupby('currency_code')
  .agg(n=('date', 'count')))
df.loc[lambda d: d['currency_code'].isin(df2[df2['n'] >= 32].index)]
This is the code after.
import pandas as pd
df = pd.read_csv('https://calmcode.io/datasets/bigmac.csv')
def set_dtypes(dataf):
    return (dataf
      .assign(date=lambda d: pd.to_datetime(d['date']))
      .sort_values(['currency_code', 'date']))

def remove_outliers(dataf):
    min_row_country = 32
    countries = (dataf
      .groupby('currency_code')
      .agg(n=('name', 'count'))
      .loc[lambda d: d['n'] >= min_row_country]
      .index)
    return (dataf
      .loc[lambda d: d['currency_code'].isin(countries)])

df.pipe(set_dtypes).pipe(remove_outliers)
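Notice how we now have a pipeline made of named steps. Each function is like a Lego brick: it takes a dataframe in and returns a dataframe, so steps can be added, removed, or reordered without touching the others. That gives our code a lot more structure!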
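One way to see why the bricks are handy is to run the same two steps on a tiny dataframe. The sketch below is only an illustration: the `toy` data is made up, and `remove_outliers` takes `min_row_country` as a keyword argument instead of hard-coding it, which the code above does not do. It relies on `.pipe` forwarding extra keyword arguments to the function.

```python
import pandas as pd

# Hypothetical toy data standing in for the Big Mac dataset.
toy = pd.DataFrame({
    "currency_code": ["USD", "USD", "USD", "EUR", "EUR", "JPY"],
    "name": ["United States"] * 3 + ["Euro area"] * 2 + ["Japan"],
    "date": ["2001-04-01", "2000-04-01", "2002-04-01",
             "2001-04-01", "2000-04-01", "2000-04-01"],
})


def set_dtypes(dataf):
    # Parse the date column and sort; returns a new dataframe.
    return (dataf
            .assign(date=lambda d: pd.to_datetime(d["date"]))
            .sort_values(["currency_code", "date"]))


def remove_outliers(dataf, min_row_country=32):
    # Assumption: the threshold is exposed as an argument here,
    # whereas the lesson's version fixes it inside the function.
    countries = (dataf
                 .groupby("currency_code")
                 .agg(n=("name", "count"))
                 .loc[lambda d: d["n"] >= min_row_country]
                 .index)
    return dataf.loc[lambda d: d["currency_code"].isin(countries)]


# .pipe passes min_row_country=2 through to remove_outliers.
result = toy.pipe(set_dtypes).pipe(remove_outliers, min_row_country=2)
print(result)
```

With a threshold of 2, the single JPY row is dropped while the EUR and USD rows remain. Because each step returns a new dataframe, `toy` itself is left unchanged, so any single step can be tested on its own.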