Let's look at the code from the previous video.
import polars as pl
df = pl.read_csv("wowah_data.csv", parse_dates=False)
df.columns = [c.replace(" ", "") for c in df.columns]
df = df.lazy()
# Note that you need to call `.collect()` if you want to see results.
df.with_columns([
    pl.col("guild") != -1,
    pl.col("timestamp").str.strptime(pl.Datetime, fmt="%m/%d/%y %H:%M:%S"),
]).collect()
Pipe
The code works, but it could use some more structure. Let's rewrite it so that it reads as a pipeline.
def set_types(dataf):
    return (dataf.with_columns([
        pl.col("guild") != -1,
        pl.col("timestamp").str.strptime(pl.Datetime, fmt="%m/%d/%y %H:%M:%S"),
    ]))
# We can re-use this function in a pipeline.
df.pipe(set_types).collect()
Using the .pipe() method, we'll be able to separate concerns and keep the code more maintainable in the long run.
Lazy DataFrames
Note that when we use the .pipe() method, we're still dealing with a lazy dataframe. Nothing runs until we call the .collect() method. You can confirm this by checking the type:
type(df.pipe(set_types))