Rolling Date-Based Averages in Pandas
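In the previous video we computed rolling statistics over a fixed number of rows. It's typically safer to use a date-based window instead: if some dates are missing from the data, a 10-row window silently stretches over more than 10 days, while a 30-day window always covers the same span of calendar time. So let's refactor our code.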
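To see why this matters, here is a minimal sketch on hypothetical toy data (not the birthdays dataset) where one day, 2020-01-03, is missing. A 2-row window and a 2-day window agree until the gap, and then they diverge: the row-based window averages across the missing day, while the date-based window only looks at the calendar days it covers.

```python
import pandas as pd

# Hypothetical toy data: four daily counts with 2020-01-03 missing.
s = pd.Series(
    [10, 20, 40, 80],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-05"]),
)

# Row-based: average of the last 2 rows, whatever dates they happen to span.
by_rows = s.rolling(2, min_periods=1).mean()

# Date-based: average of every observation in the 2 days ending at each timestamp.
by_dates = s.rolling("2D", min_periods=1).mean()

print(by_rows.tolist())   # [10.0, 15.0, 30.0, 60.0]
print(by_dates.tolist())  # [10.0, 15.0, 40.0, 60.0]
```

On 2020-01-04 the row-based window mixes in the value from 2020-01-02, two days earlier, whereas the 2-day window contains only 2020-01-04 itself.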
Before
import pandas as pd
df = pd.read_csv("https://calmcode.io/datasets/birthdays.csv")
subset_df = (df[['state', 'date', 'births']]
.assign(date=lambda d: pd.to_datetime(d['date'], format="%Y-%m-%d"))
.loc[lambda d: d['state'] == 'CA']
.tail(365 * 2))
# Row-based: each average covers the last 10 rows, whatever dates they span
subset_df.assign(rolling_births=lambda d: d['births'].rolling(10, min_periods=1).mean())
After
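plot_df = (subset_df
.set_index('date')
# Date-based: each average covers all rows dated within the 30 days ending at that row
.assign(rolling_births=lambda d: d['births'].rolling('30D', min_periods=1).mean())
# Move 'date' back into a regular column so it can be used for plotting
.reset_index())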
Take note that we're using .set_index('date'). Without it, the .rolling('30D') call has no datetime index to work from, so it cannot tell which rows fall inside each 30-day window; pandas raises a ValueError instead.
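Here is a small sketch to check this, again on hypothetical toy data standing in for subset_df (the names frame, via_index and via_on are ours). It shows that a date-based window is rejected on a plain integer index, and that DataFrame.rolling also accepts an on= argument naming the date column, which avoids the set_index/reset_index round trip.

```python
import pandas as pd

# Hypothetical toy frame standing in for subset_df: 'date' is an ordinary column
# and 2020-01-03 is missing.
frame = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-05"]),
    "births": [10, 20, 40, 80],
})

# On the default integer index, a '2D' window is rejected with a ValueError.
try:
    frame["births"].rolling("2D")
    failed = False
except ValueError:
    failed = True

# Option 1 (as in the lesson): move the dates into the index first.
via_index = (frame.set_index("date")["births"]
             .rolling("2D", min_periods=1)
             .mean()
             .reset_index(drop=True))

# Option 2: keep the integer index and point rolling() at the date column.
via_on = frame.rolling("2D", on="date", min_periods=1)["births"].mean()

print(failed)              # True
print(via_index.tolist())  # [10.0, 15.0, 40.0, 60.0]
print(via_on.tolist())     # [10.0, 15.0, 40.0, 60.0]
```

Both options produce the same averages; which one to use is mostly a matter of taste. The set_index version reads naturally when you also want to resample or slice by date, while on= keeps the frame's shape untouched.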
Chart
Because .reset_index() turns date back into a regular column, plot_df can be passed straight to Altair. The code below draws the raw daily births and the 30-day rolling average as two layered line charts.
import altair as alt
p1 = (alt.Chart(plot_df)
.mark_line()
.encode(x='date', y='births')
.properties(width=600, height=250)
.interactive())
p2 = (alt.Chart(plot_df)
.mark_line(color='red')
.encode(x='date', y='rolling_births')
.properties(width=600, height=250)
.interactive())
p1 + p2
[Chart: daily births in California with the 30-day rolling average overlaid in red]
Because both layers call .interactive(), you can click and drag to pan and scroll to zoom. To learn more about Altair charts, check out our Altair course.