scikit meta: decay

How to down-weight older observations when fitting a regressor in scikit-learn, using the `DecayEstimator` meta-estimator from scikit-lego.

Simulating a Time-Series

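Before we can benchmark anything, we need a dataset. The snippet below generates a synthetic daily time series with `make_simpleseries` and adds a month column `m` to group on.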

``````
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.dummy import DummyRegressor
from sklego.meta import GroupedPredictor, DecayEstimator
from sklego.datasets import make_simpleseries

# Generate a synthetic daily time series
yt = make_simpleseries(seed=1)
dates = pd.date_range("2000-01-01", periods=len(yt))
df = (pd.DataFrame({"yt": yt, "date": dates})
      .assign(m=lambda d: d.date.dt.month)  # month of year, used as the group key
      .reset_index())                        # adds an integer "index" column (row order)

plt.figure(figsize=(12, 3))
plt.plot(dates, yt);
``````

First Benchmark: Grouped Prediction

To build a baseline model that predicts the mean of `yt` for each month, wrap a `DummyRegressor` in a `GroupedPredictor` that groups on `m`:

``````
# One DummyRegressor (default strategy: predict the mean) is fit per month
mod1 = (GroupedPredictor(DummyRegressor(), groups=["m"])
        .fit(df[['m']], df['yt']))

plt.figure(figsize=(12, 3))
plt.plot(df['yt'], alpha=0.5, label="original");
plt.plot(mod1.predict(df[['m']]), label="grouped")
plt.legend();
``````
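Conceptually, the grouped baseline above predicts, for every row, the training mean of `yt` within that row's month. A minimal NumPy sketch of that computation on toy data follows; `grouped_mean` and `predict_grouped_mean` are hypothetical helpers for illustration, not part of scikit-lego, and unlike `GroupedPredictor` this sketch simply raises `KeyError` for a group that was not seen during fitting.

```python
import numpy as np

def grouped_mean(groups, y):
    """Hypothetical helper: the mean of y within each group."""
    groups = np.asarray(groups)
    y = np.asarray(y, dtype=float)
    return {g.item(): y[groups == g].mean() for g in np.unique(groups)}

def predict_grouped_mean(means, groups):
    """Look up each row's group mean (unseen groups raise KeyError here)."""
    return np.array([means[g] for g in groups])

# Toy data: months 1 and 2
m = np.array([1, 1, 2, 2, 2])
y = np.array([1.0, 3.0, 10.0, 20.0, 30.0])

means = grouped_mean(m, y)                      # month 1 -> 2.0, month 2 -> 20.0
preds = predict_grouped_mean(means, [2, 1, 1])  # 20.0, 2.0, 2.0
```

Every row in the same month receives the same prediction, which is why the "grouped" line in the plot is a step function that repeats each year.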

Second Benchmark: Grouped Prediction with DecayEstimator

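`DecayEstimator` wraps a model and gives earlier rows exponentially smaller sample weights during fitting (here with `decay=0.9`), so recent observations dominate each monthly mean. Because the weights follow row order, the data must already be sorted by time, as it is here. To compare the decayed model against the plain grouped one, run: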

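``````
# Baseline: plain per-month mean
mod1 = (GroupedPredictor(DummyRegressor(), groups=["m"])
        .fit(df[['m']], df['yt']))

# Per-month mean with exponentially decaying sample weights (decay=0.9);
# the "index" column is passed along as a feature next to the group column "m"
mod2 = (GroupedPredictor(DecayEstimator(DummyRegressor(), decay=0.9), groups=["m"])
        .fit(df[['index', 'm']], df['yt']))

plt.figure(figsize=(12, 3))
plt.plot(df['yt'], alpha=0.5, label="original");
plt.plot(mod1.predict(df[['m']]), label="grouped")
plt.plot(mod2.predict(df[['index', 'm']]), label="decayed")
plt.legend();
``````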
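What the decayed model learns per month can be sketched as a weighted mean, since `DummyRegressor` with its default `"mean"` strategy honours `sample_weight`. The sketch below assumes weights of the form `decay ** k`, where `k` counts how many rows older than the newest row (within the same month) an observation is. The exact scaling `DecayEstimator` uses is an assumption here, but any constant factor cancels in a weighted mean. `decay_weights` and `grouped_decayed_mean` are hypothetical helpers, not scikit-lego functions.

```python
import numpy as np

def decay_weights(n, decay=0.9):
    """Assumed scheme: the newest row gets weight 1, each older row is multiplied by `decay`."""
    return decay ** np.arange(n - 1, -1, -1)

def grouped_decayed_mean(groups, y, decay=0.9):
    """Per-group weighted mean of y, decaying in row order within each group.

    Assumes rows are already sorted by time.
    """
    groups = np.asarray(groups)
    y = np.asarray(y, dtype=float)
    result = {}
    for g in np.unique(groups):
        y_g = y[groups == g]  # boolean masking keeps the original (time) order
        result[g.item()] = np.average(y_g, weights=decay_weights(len(y_g), decay))
    return result

# Toy data: months 1 and 2 interleaved in time order
m = np.array([1, 2, 1, 2, 1])
y = np.array([0.0, 5.0, 0.0, 5.0, 10.0])

monthly = grouped_decayed_mean(m, y, decay=0.5)
# month 1: weights [0.25, 0.5, 1.0] on [0, 0, 10] -> 10 / 1.75, about 5.714
# month 2: both values are 5 -> 5.0
```

For month 1 the unweighted mean would be about 3.333, so the decay pulls the estimate toward the most recent value; this is the same effect the "decayed" line shows against the "grouped" line in the plot.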