Simulating a Time-Series
Before we can benchmark anything, we first need a dataset. Here we simulate a simple daily time series and add a month column to group on.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.dummy import DummyRegressor
from sklego.meta import GroupedPredictor, DecayEstimator
from sklego.datasets import make_simpleseries

# Simulate a series and attach one date per observation.
yt = make_simpleseries(seed=1)
dates = pd.date_range("2000-01-01", periods=len(yt))
df = (pd.DataFrame({"yt": yt,
                    "date": dates})
      .assign(m=lambda d: d.date.dt.month)  # month number, used as the group
      .reset_index())                       # adds an "index" column
plt.figure(figsize=(12, 3))
plt.plot(dates, yt);
First Benchmark: Grouped Prediction
To build a simple model that predicts the mean per month, wrap a DummyRegressor in a GroupedPredictor that groups on the month column:
mod1 = (GroupedPredictor(DummyRegressor(), groups=["m"])
        .fit(df[['m']], df['yt']))
plt.figure(figsize=(12, 3))
plt.plot(df['yt'], alpha=0.5);
plt.plot(mod1.predict(df[['m']]), label="grouped")
plt.legend();
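As a sanity check on what the grouped model computes, the sketch below reproduces a per-month mean with plain pandas. It uses a small random series as a stand-in for `make_simpleseries` (an assumption, so that the snippet runs without scikit-lego), and it assumes that a `DummyRegressor` with its default strategy predicts the training mean, so grouping by `m` should amount to one mean per month.

```python
import numpy as np
import pandas as pd

# Stand-in data: a random series, NOT the output of make_simpleseries.
rng = np.random.default_rng(1)
dates = pd.date_range("2000-01-01", periods=730)
toy = pd.DataFrame({"yt": rng.normal(size=len(dates)), "date": dates})
toy["m"] = toy["date"].dt.month

# Per-month mean, broadcast back so every row carries its month's mean.
monthly_mean = toy.groupby("m")["yt"].transform("mean")

# The same means as a 12-row lookup table, one entry per month.
lookup = toy.groupby("m")["yt"].mean()
print(lookup)
```

Plotting `monthly_mean` against `toy["yt"]` should give the same kind of step-shaped line as the "grouped" curve above, since every row within a month receives the same value.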
Second Benchmark: Grouped Prediction with DecayEstimator
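Wrapping the DummyRegressor in a DecayEstimator gives more recent observations more weight within each month. To compare the decayed model against the plain grouped one, run: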
mod1 = (GroupedPredictor(DummyRegressor(), groups=["m"])
        .fit(df[['m']], df['yt']))
mod2 = (GroupedPredictor(DecayEstimator(DummyRegressor(), decay=0.9), groups=["m"])
        .fit(df[['index', 'm']], df['yt']))
plt.figure(figsize=(12, 3))
plt.plot(df['yt'], alpha=0.5);
plt.plot(mod1.predict(df[['m']]), label="grouped")
plt.plot(mod2.predict(df[['index', 'm']]), label="decayed")
plt.legend();
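To build intuition for what the decay does inside a single group, here is a minimal NumPy sketch of an exponentially weighted mean. It assumes the weights grow geometrically towards the last row, so that the most recent observation counts most. This illustrates the idea only and is not scikit-lego's implementation; the values in `y` and the `decay` of 0.5 are made up for the example.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 10.0])  # made-up values, oldest first
decay = 0.5                          # illustrative, smaller than the 0.9 used above

# Oldest row gets decay**len(y); the most recent row gets decay**1.
weights = decay ** np.arange(len(y), 0, -1)

plain_mean = y.mean()
decayed_mean = np.average(y, weights=weights)

print(plain_mean, decayed_mean)
```

Because the last value is the largest, the decayed mean lands above the plain mean. As `decay` approaches 1 the weights become equal and the two means converge.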