scikit meta: inflated regression

# Use ZeroInflatedRegressor to deal with zeros in the regression label.

## Simulate a zero-inflated dataset.

We'll first need to generate a dataset.

``````
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklego.meta import ZeroInflatedRegressor

# Note the final line of code in this block. We're setting y=0 for all weekend dates
# while we simulate standard regression data for all the other dates.
df = (
    pd.DataFrame({'dt': pd.date_range("2018-01-01", "2021-01-01")})
    .assign(x=lambda d: np.random.normal(0, 1, d.shape[0]))
    .assign(weekend=lambda d: (d['dt'].dt.weekday >= 5).astype(np.int16))
    .assign(y=lambda d: np.where(d['weekend'], 0, 1.5 + 0.87 * d['x'] + np.random.normal(0, 0.2, d.shape[0])))
)
``````

Next, we convert this dataframe into an `X` array of features and a `y` array of labels.

``````
X, y = df[['x', 'weekend']].values, df['y'].values
``````
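As a quick sanity check, it can help to confirm that the zeros in `y` really are the weekend rows. The sketch below rebuilds the same kind of dataset with a fixed seed so that it runs on its own. The seed and the names `rng`, `zero_share` and `weekend_share` are assumptions made for illustration and are not part of the original walkthrough.

```python
import numpy as np
import pandas as pd

# Assumed seed, used only so the check is reproducible.
rng = np.random.default_rng(42)

# Same recipe as above: y is 0 on weekends, a noisy linear signal otherwise.
df = (
    pd.DataFrame({"dt": pd.date_range("2018-01-01", "2021-01-01")})
    .assign(x=lambda d: rng.normal(0, 1, d.shape[0]))
    .assign(weekend=lambda d: (d["dt"].dt.weekday >= 5).astype(np.int16))
    .assign(y=lambda d: np.where(d["weekend"], 0, 1.5 + 0.87 * d["x"] + rng.normal(0, 0.2, d.shape[0])))
)
X, y = df[["x", "weekend"]].values, df["y"].values

# Share of labels that are exactly zero vs. share of weekend days.
zero_share = (y == 0).mean()
weekend_share = df["weekend"].mean()

print("X shape:", X.shape, "y shape:", y.shape)
print("share of zero labels:", round(zero_share, 3))
print("share of weekend days:", round(weekend_share, 3))
```

If the two shares match, every zero in the label comes from the weekend rule, which is the structure a zero-inflated model is meant to exploit.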

## Benchmarking the `ZeroInflatedRegressor`

Finally, we run a small benchmark that compares the cross-validated r² of the `ZeroInflatedRegressor` against a plain `Ridge` model.

``````
zir = ZeroInflatedRegressor(
    classifier=LogisticRegression(),
    regressor=Ridge()
)

lr = Ridge(random_state=0)

print('ZIR r²:', cross_val_score(zir, X, y).mean()) # ZIR r²: 0.9715677148308327
print(' LR r²:', cross_val_score(lr, X, y).mean())  #  LR r²: 0.8154520977784985
``````
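To see why the two-stage model scores higher, here is a minimal sketch of the general idea using only scikit-learn parts: a classifier decides whether the label is non-zero, and a regressor trained on the non-zero rows supplies the value. This is a simplified illustration, not scikit-lego's actual implementation. The synthetic data, the seed, the 800/200 split and all variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only

# Synthetic data in the spirit of the example: y is 0 whenever flag == 1.
n = 1000
x = rng.normal(0, 1, n)
flag = (np.arange(n) % 7 >= 5).astype(float)  # two "weekend" rows out of every seven
y = np.where(flag == 1, 0.0, 1.5 + 0.87 * x + rng.normal(0, 0.2, n))
X = np.column_stack([x, flag])

# Simple hold-out split (an assumption; the benchmark above uses cross-validation).
train = np.arange(n) < 800
test = ~train

# Stage 1: the classifier learns whether the label is non-zero.
clf = LogisticRegression().fit(X[train], y[train] != 0)

# Stage 2: the regressor is fit only on training rows with a non-zero label.
nonzero_train = train & (y != 0)
reg = Ridge().fit(X[nonzero_train], y[nonzero_train])

# Prediction: 0 where the classifier predicts "zero", the regressor's output elsewhere.
is_nonzero = clf.predict(X[test])
y_pred = np.where(is_nonzero, reg.predict(X[test]), 0.0)

# Baseline: a single Ridge fit on all training rows.
baseline_pred = Ridge().fit(X[train], y[train]).predict(X[test])

two_stage_r2 = r2_score(y[test], y_pred)
baseline_r2 = r2_score(y[test], baseline_pred)
print("two-stage r2:", two_stage_r2)
print("baseline r2:", baseline_r2)
```

A single linear model has to compromise between the weekday slope and the flat weekend rows, while the two-stage version lets the regressor focus on the non-zero rows only.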

You can read more about the available settings for `ZeroInflatedRegressor` in the scikit-lego getting started docs and the API docs. A shoutout goes to Robert Kübler for implementing this feature.