Calmcode - pandas pipe: start

Starting a Pandas Pipeline Safely

The code below demonstrates the importance of starting a pipeline with a copy of the original dataframe.

import pandas as pd

df = pd.read_csv('https://calmcode.io/datasets/bigmac.csv')

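def start_pipeline(dataf):
    # Return a copy so later steps never touch the original dataframe.
    return dataf.copy()

def set_types(dataf):
    # This assignment modifies `dataf` in place.
    dataf['date'] = pd.to_datetime(dataf['date'])
    return dataf

# The pipeline output has a datetime `date` column; `df` keeps its original dtypes.
df.pipe(start_pipeline).pipe(set_types).dtypes, df.dtypes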

The most important thing to get right here is that the pipeline does not change the original dataframe as a side effect.
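To see the side effect that the copy prevents, it can help to run the same step with and without `start_pipeline`. The sketch below is only an illustration: it swaps the bigmac CSV for a tiny hand-made dataframe (the values are made up) so that it runs offline, and the variable names `safe`, `unsafe`, `dtype_after_safe` and `dtype_after_unsafe` are assumptions introduced for this example.

```python
import pandas as pd

# Hypothetical stand-in for the bigmac data so the example runs offline.
df = pd.DataFrame({"date": ["2020-01-01", "2020-07-01"], "price": [5.67, 5.71]})

def start_pipeline(dataf):
    return dataf.copy()

def set_types(dataf):
    dataf["date"] = pd.to_datetime(dataf["date"])
    return dataf

# With the copy: the result has a datetime column, the original does not.
safe = df.pipe(start_pipeline).pipe(set_types)
dtype_after_safe = df["date"].dtype

# Without the copy: set_types receives df itself and overwrites its column.
unsafe = df.pipe(set_types)
dtype_after_unsafe = df["date"].dtype

print(dtype_after_safe, dtype_after_unsafe)
```

In the second run `unsafe` is the very same object as `df`, so the `date` column of the original has been converted even though the result was assigned to a new name.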