Calmcode - pandas datetime: rolling dates

Rolling average based on dates in Pandas.


Rolling Date-Based Averages in Pandas

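In the previous video we calculated rolling statistics based on rows. It's typically safer to use dates instead. A window of 10 rows only corresponds to 10 days if there is exactly one row per day with no gaps, while a date-based window always covers the same span of time. So let's refactor our code.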
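To see why, here is a minimal sketch on hypothetical toy data (not the birthdays dataset) with a one-week gap in the dates. It compares a 2-row window with a 2-day window. Only the date-based window ignores rows that fall outside the last two days.

```python
import pandas as pd

# Hypothetical toy data: note the one-week gap between 2024-01-03 and 2024-01-10.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03",
                            "2024-01-10", "2024-01-11"]),
    "births": [10, 20, 30, 40, 50],
}).set_index("date")

# Row-based: always averages the last 2 rows, no matter how far apart their dates are.
by_rows = toy["births"].rolling(2, min_periods=1).mean()

# Date-based: averages the rows whose date falls in the 2 days ending at the current row.
by_dates = toy["births"].rolling("2D").mean()

print(pd.DataFrame({"by_rows": by_rows, "by_dates": by_dates}))
```

On 2024-01-10 the row-based mean (35.0) mixes in the value from 2024-01-03, a week earlier, while the date-based mean (40.0) only uses 2024-01-10 itself.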

Before

import pandas as pd

df = pd.read_csv("https://calmcode.io/datasets/birthdays.csv")

subset_df = (df[['state', 'date', 'births']]
    .assign(date=lambda d: pd.to_datetime(d['date'], format="%Y-%m-%d"))
    .loc[lambda d: d['state'] == 'CA']
    .tail(365 * 2))

# Row-based: each value averages the last 10 rows, whatever dates they span
subset_df.assign(rolling_births=lambda d: d['births'].rolling(10, min_periods=1).mean())

After

plot_df = (subset_df
    .set_index('date')
    # '30D' averages over a 30-day time span instead of a fixed number of rows
    .assign(rolling_births=lambda d: d['births'].rolling('30D', min_periods=1).mean())
    .reset_index())

Take note that we're using .set_index('date'). Without it, the .rolling('30D') call has no datetime index to measure the 30-day window against, and pandas raises an error instead of computing the average.
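A minimal sketch on hypothetical toy data (60 consecutive days with births 0 to 59) shows both the failure without a datetime index and an alternative: DataFrame.rolling also accepts an on= argument that names the datetime column, so the index can stay as it is. The last check confirms that '30D' covers the 30 days ending at each row.

```python
import pandas as pd

# Hypothetical toy data: 60 consecutive days with births = 0, 1, ..., 59.
toy = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=60, freq="D"),
    "births": range(60),
})

# With the default integer index there are no dates to measure "30D" against.
try:
    toy["births"].rolling("30D").mean()
    offset_on_range_index = "worked"
except ValueError as err:
    offset_on_range_index = f"ValueError: {err}"

# Option 1 (as in the refactored code): move the dates into the index first.
via_index = toy.set_index("date")["births"].rolling("30D").mean()

# Option 2: keep the integer index and name the date column with on=.
via_on = toy.rolling("30D", on="date").mean()["births"]

# Each window covers the 30 days ending on that row's date,
# so the last value equals the mean of the last 30 daily values (30..59).
last_30_mean = toy["births"].tail(30).mean()

print(offset_on_range_index)
print(via_index.iloc[-1], via_on.iloc[-1], last_30_mean)
```

For offset-based windows such as '30D', pandas already defaults min_periods to 1, so the min_periods=1 in the refactored code is harmless but optional.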

Chart

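Because plot_df keeps date as a regular column (restored by .reset_index()) right next to births and rolling_births, it is very convenient to plot. The code below draws the raw births and the rolling average as two Altair line charts and layers them on top of each other with +.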

import altair as alt

# Raw daily births
p1 = (alt.Chart(plot_df)
    .mark_line()
    .encode(x='date', y='births')
    .properties(width=600, height=250)
    .interactive())

# 30-day rolling average, drawn in red
p2 = (alt.Chart(plot_df)
    .mark_line(color='red')
    .encode(x='date', y='rolling_births')
    .properties(width=600, height=250)
    .interactive())

# The + operator layers both charts into one
p1 + p2

This is what the chart looks like:

[Chart: daily births in CA with the 30-day rolling average in red]

Note that you can drag to pan and scroll to zoom, since both charts call .interactive(). To learn more about these Altair charts, check out our Altair course.