The code below demonstrates the importance of starting a pipeline with a copy of the original dataframe.
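import pandas as pd

df = pd.read_csv('https://calmcode.io/datasets/bigmac.csv')

def start_pipeline(dataf):
    # Return a copy so later steps never mutate the original dataframe.
    return dataf.copy()

def set_types(dataf):
    # Assigning a column modifies `dataf` in place.
    dataf['date'] = pd.to_datetime(dataf['date'])
    return dataf

# Compare the dtypes of the pipeline result with those of the original.
df.pipe(start_pipeline).pipe(set_types).dtypes, df.dtypes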
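The most important thing to get right here is that the pipeline does not change the original dataframe as a side effect. Because start_pipeline returns a copy, set_types converts the date column on that copy only, so df.dtypes still reports the original, unconverted dtype for date.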
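To see the side effect that the copy prevents, here is a minimal sketch that runs set_types both with and without start_pipeline. It assumes a small hand-made dataframe (the name raw and its values are hypothetical stand-ins for the Big Mac data, so no download is needed) and calls set_types directly on the frame for the unsafe case.

```python
import pandas as pd

# Hypothetical stand-in data; not the actual Big Mac dataset.
raw = pd.DataFrame({"date": ["2020-01-01", "2020-07-01"], "price": [5.67, 5.71]})

def start_pipeline(dataf):
    # Return a copy so later steps never touch the original.
    return dataf.copy()

def set_types(dataf):
    # Column assignment modifies the dataframe that was passed in.
    dataf["date"] = pd.to_datetime(dataf["date"])
    return dataf

original_dtype = raw["date"].dtype

# Safe: the pipeline starts with a copy, so `raw` keeps its original dtype.
safe = raw.pipe(start_pipeline).pipe(set_types)
safe_is_datetime = pd.api.types.is_datetime64_any_dtype(safe["date"])
raw_unchanged = raw["date"].dtype == original_dtype

# Unsafe: without the copy, set_types rewrites the column on `raw` itself.
unsafe = set_types(raw)
raw_mutated = pd.api.types.is_datetime64_any_dtype(raw["date"])

print(safe_is_datetime, raw_unchanged, raw_mutated)
```

In the unsafe case, unsafe and raw are the same object, so any later step that relies on the original string dates would silently see datetimes instead.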