Scikit-Learn recently added a feature that can render your pipelines as interactive diagrams. To activate it, you'll need to run:
from sklearn import set_config
set_config(display="diagram")
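If you'd rather not flip the setting globally, scikit-learn also ships config_context, which applies the setting only inside a with block. A minimal sketch, assuming you're in a Jupyter notebook where display() is available:

from sklearn import config_context
from sklearn.linear_model import LogisticRegression

with config_context(display="diagram"):
    # inside this block, estimators render as diagrams instead of plain text
    display(LogisticRegression())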
To repeat what's mentioned in the official docs, here's a more elaborate example:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn import set_config
steps = [
    ("standard_scaler", StandardScaler()),
    ("polynomial", PolynomialFeatures(degree=3)),
    ("classifier", LogisticRegression(C=2.0)),
]
pipe = Pipeline(steps)
When you set the config and evaluate the pipe variable ...
set_config(display="diagram")
pipe
... you'll get something that looks like this.
[Interactive diagram: the pipeline renders as a collapsible block with StandardScaler(), PolynomialFeatures(degree=3) and LogisticRegression(C=2.0) as nested steps.]
Note that you can click around in the diagram!
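The diagram isn't locked to the notebook either; if you want to share it, you can dump the underlying HTML to a file with estimator_html_repr. A minimal sketch (the filename is just an example):

from sklearn.utils import estimator_html_repr

# write the diagram to a standalone HTML file that you can open in a browser
with open("pipeline_diagram.html", "w") as f:
    f.write(estimator_html_repr(pipe))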
Benefits
The diagrams can display settings, nested pipelines and custom components too! Suppose we take a pipeline from the scikit-lego docs:
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklego.preprocessing import ColumnSelector
feature_pipeline = Pipeline([
    ("datagrab", FeatureUnion([
        ("discrete", Pipeline([
            ("grab", ColumnSelector("diet")),
            ("encode", OneHotEncoder(categories="auto", sparse=False))
        ])),
        ("continuous", Pipeline([
            ("grab", ColumnSelector("time")),
            ("standardize", StandardScaler())
        ]))
    ]))
])

pipe = Pipeline([
    ("transform", feature_pipeline),
    ("model", LinearRegression())
])
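Before rendering it, it may help to see the pipeline in action. The original scikit-lego example models chicken weights, so assuming the load_chicken dataset from scikit-lego (with diet, time and weight columns), fitting could look roughly like this:

from sklego.datasets import load_chicken

# load the chicken weight data as a pandas DataFrame
df = load_chicken(as_frame=True)

# the ColumnSelectors inside the pipeline pick the columns they need from df
pipe.fit(df, df["weight"])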
Here's what the pipeline looks like:
[Interactive diagram: the 'transform' step expands into the 'datagrab' FeatureUnion, with a 'discrete' branch (ColumnSelector(columns='diet') followed by OneHotEncoder(sparse=False)) and a 'continuous' branch (ColumnSelector(columns='time') followed by StandardScaler()), and the 'model' step contains LinearRegression().]
If you want to get the best results with these custom components, we recommend building them on top of the BaseEstimator that comes from scikit-learn when you construct your custom classes. That way, their parameters will render nicely in the diagram. An example implementation can be found in the scikit-lego GitHub repo.
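To give an idea of what that means, here's a minimal sketch of a hypothetical custom transformer; the ScaleBy name and its factor parameter are made up for illustration, but the pattern of inheriting from BaseEstimator and TransformerMixin and storing constructor arguments under their own names is what makes the parameters show up:

from sklearn.base import BaseEstimator, TransformerMixin

class ScaleBy(BaseEstimator, TransformerMixin):
    # hypothetical transformer that multiplies its input by a constant factor
    def __init__(self, factor=1.0):
        # keep constructor arguments as attributes with the same name so get_params() finds them
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X * self.factor

Because the class builds on BaseEstimator, the diagram labels it as ScaleBy(factor=3) (showing any non-default parameters) rather than as a bare object repr like <__main__.ScaleBy object at 0x...>.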