Calmcode - pandas datetime: rolling dates

Rolling averages based on dates in Pandas.


Rolling Date-Based Averages in Pandas

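In the previous video we computed rolling statistics over a fixed number of rows. It's typically safer to define the window by dates instead, because a row-based window silently covers a longer period whenever dates are missing from the data. So let's refactor our code.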


import pandas as pd

df = pd.read_csv("")

subset_df = (df[['state', 'date', 'births']]
    .assign(date=lambda d: pd.to_datetime(d['date'], format="%Y-%m-%d"))
    .loc[lambda d: d['state'] == 'CA']
    .tail(365 * 2))

# Row-based version from the previous video: average over the last 10 rows.
subset_df.assign(rolling_births=lambda d: d['births'].rolling(10, min_periods=1).mean())
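To see how a row-based window can mislead, here is a minimal sketch on made-up data. The `toy` frame and its values are invented for illustration and are not part of the births dataset. It has a one-week gap before the last observation.

```python
import pandas as pd

# Made-up daily counts with a one-week gap before the last observation.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-10"]),
    "births": [10, 20, 30, 100],
}).set_index("date")

# Row-based: always the last 3 rows, however far apart in time they are.
row_based = toy["births"].rolling(3, min_periods=1).mean()

# Date-based: only rows within the 3 days up to and including each date.
date_based = toy["births"].rolling("3D", min_periods=1).mean()

comparison = toy.assign(row_based=row_based, date_based=date_based)
print(comparison)
```

On the last row the row-based average is 50, because it still mixes in the values from January 2nd and 3rd. The date-based average is 100, because only January 10th falls inside the three-day window.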


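plot_df = (subset_df
    .set_index('date')
    .assign(rolling_births=lambda d: d['births'].rolling('30D', min_periods=1).mean())
    .reset_index())

Take note that we're using .set_index('date'). Without it our .rolling('30D') call cannot detect which dates to operate on. The final .reset_index() turns date back into a regular column so that it can be used for plotting.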
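The requirement for a datetime index can be checked on a small example. The sketch below uses an invented `toy` frame (an assumption, not the births data). It first shows the error pandas raises when the index is a plain integer range, then two ways around it: moving the dates into the index, or passing the `on` argument of `DataFrame.rolling` to name the date column.

```python
import pandas as pd

# Invented data: five consecutive days, dates stored in a regular column.
toy = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=5, freq="D"),
    "births": [10, 20, 30, 40, 50],
})

# With the default integer index there are no dates to measure "3D" against,
# so pandas rejects the offset window.
try:
    toy["births"].rolling("3D").mean()
    failed = False
except ValueError:
    failed = True

# Option 1: move the dates into the index first.
via_index = toy.set_index("date")["births"].rolling("3D", min_periods=1).mean()

# Option 2: keep the column and point rolling() at it with `on`.
via_on = toy.rolling("3D", on="date", min_periods=1).mean()["births"]

print(failed)
print(via_index.tolist() == via_on.tolist())
```

Both options produce the same averages. The `on` variant keeps the original row index, which can save a `reset_index()` call later.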


Because the result is an ordinary dataframe with a date column, it is convenient to plot. The code below generates altair charts of the raw and the smoothed series.

import altair as alt

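# Raw daily births.
p1 = (alt.Chart(plot_df)
    .mark_line()
    .encode(x='date', y='births')
    .properties(width=600, height=250))

# 30-day rolling average, drawn in red so it stands out from the raw series.
p2 = (alt.Chart(plot_df)
    .mark_line(color='red')
    .encode(x='date', y='rolling_births')
    .properties(width=600, height=250))

# Layer both lines in one chart; .interactive() enables pan and zoom.
(p1 + p2).interactive()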

This is what the chart looks like:

Note that you can click/drag/pan/zoom if you like. To learn more about these altair charts, check our course.