Let's look at the code from the previous video.
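```python
import polars as pl

# Recent polars versions call these arguments `try_parse_dates` and `format`;
# older releases used `parse_dates` and `fmt`.
df = pl.read_csv("wowah_data.csv", try_parse_dates=False)
# Remove the spaces from the column names.
df.columns = [c.replace(" ", "") for c in df.columns]
df = df.lazy()

# Note that you need to call `.collect()` if you want to see results.
df.with_columns([
    pl.col("guild") != -1,
    pl.col("timestamp").str.strptime(pl.Datetime, format="%m/%d/%y %H:%M:%S"),
]).collect()
```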
The code works, but it could use more structure. Let's rewrite it as a pipeline.
```python
def set_types(dataf):
    # Turn `guild` into a boolean "is in a guild" flag and parse the timestamps.
    return dataf.with_columns([
        pl.col("guild") != -1,
        pl.col("timestamp").str.strptime(pl.Datetime, format="%m/%d/%y %H:%M:%S"),
    ])

# We can re-use this function in a pipeline.
df.pipe(set_types).collect()
```
The `.pipe()` method lets us separate concerns and keep the code more maintainable in the long run.
Note that when we use the `.pipe()` method we're still dealing with a lazy dataframe. Nothing runs until we call the `.collect()` method. You can confirm this by checking: