pandas pipe: start


The code below demonstrates the importance of starting a pipeline with a copy of the original dataframe.

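import pandas as pd

df = pd.read_csv('https://calmcode.io/datasets/bigmac.csv')

def start_pipeline(dataf):
    # Return a copy so later steps cannot modify the original dataframe.
    return dataf.copy()

def set_types(dataf):
    # Assigns in place; without the copy step this would change `df` itself.
    dataf['date'] = pd.to_datetime(dataf['date'])
    return dataf

# Compare dtypes: the pipeline result has a datetime 'date' column, `df` does not.
df.pipe(start_pipeline).pipe(set_types).dtypes, df.dtypes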

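The key point is that the pipeline must not change the original dataframe as a side effect. Because `start_pipeline` returns a copy, `set_types` converts the `date` column on that copy only, and `df` keeps its original dtypes.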
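To see the side effect directly, here is a minimal sketch that runs the same two functions with and without the copy step. It assumes a small made-up dataframe `raw` in place of the bigmac dataset so that it runs offline; the names `raw`, `safe_input`, and `unsafe_input` are hypothetical and only serve the illustration.

```python
import pandas as pd

def start_pipeline(dataf):
    # Return a copy so later steps never touch the caller's dataframe.
    return dataf.copy()

def set_types(dataf):
    # Overwrites the 'date' column on whatever dataframe it receives.
    dataf['date'] = pd.to_datetime(dataf['date'])
    return dataf

# Hypothetical stand-in for the bigmac dataset (values are made up).
raw = pd.DataFrame({'date': ['2020-01-01', '2020-07-01'], 'price': [1.0, 2.0]})

# With the copy step, the input dataframe keeps its string dates.
safe_input = raw.copy()
safe_result = safe_input.pipe(start_pipeline).pipe(set_types)
safe_input_is_datetime = pd.api.types.is_datetime64_any_dtype(safe_input['date'])

# Without the copy step, set_types changes the input as a side effect.
unsafe_input = raw.copy()
unsafe_result = unsafe_input.pipe(set_types)
unsafe_input_is_datetime = pd.api.types.is_datetime64_any_dtype(unsafe_input['date'])

print(safe_input_is_datetime, unsafe_input_is_datetime)
```

In the first run only the returned dataframe has a datetime `date` column. In the second run the input itself has been converted, which is the kind of hidden change that makes re-running notebook cells unreliable.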
