Scikit-learn recently added a diagram display that can render your pipelines. To activate it, you'll need to run:
from sklearn import set_config
set_config(display="diagram")
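This flips the setting globally. If you'd rather scope it to a single block, scikit-learn's config_context accepts the same option; here's a minimal sketch (the small_pipe below is a throwaway example, and display() assumes you're working in a Jupyter notebook):

from sklearn import config_context
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from IPython.display import display

# Throwaway pipeline, only used to demonstrate the scoped setting.
small_pipe = make_pipeline(StandardScaler(), LogisticRegression())

# The diagram is only rendered inside this block; elsewhere the text repr is used.
with config_context(display="diagram"):
    display(small_pipe)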
To repeat what's mentioned in the official docs, here's a more elaborate example:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn import set_config
steps = [
("standard_scaler", StandardScaler()),
("polynomial", PolynomialFeatures(degree=3)),
("classifier", LogisticRegression(C=2.0)),
]
pipe = Pipeline(steps)
When you set the config and evaluate the pipe variable ...
set_config(display="diagram")
pipe
... you'll get something that looks like this.
[Interactive HTML diagram of the pipeline: a Pipeline block with StandardScaler(), PolynomialFeatures(degree=3) and LogisticRegression(C=2.0) stacked inside it; each block expands on click to show its parameters.]
Note that you can click around in the diagram!
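Since the diagram is plain HTML, you can also grab it as a string and save it outside of a notebook; a minimal sketch using sklearn.utils.estimator_html_repr on the pipe defined above (the filename is made up):

from sklearn.utils import estimator_html_repr

# Write the diagram of the pipeline above to a standalone HTML file.
with open("pipeline_diagram.html", "w") as f:
    f.write(estimator_html_repr(pipe))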
Benefits
The diagrams can display settings, nested pipelines and custom components too! Suppose we take a pipeline from the scikit-lego docs:
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklego.preprocessing import ColumnSelector
feature_pipeline = Pipeline([
("datagrab", FeatureUnion([
("discrete", Pipeline([
("grab", ColumnSelector("diet")),
("encode", OneHotEncoder(categories="auto", sparse=False))
])),
("continuous", Pipeline([
("grab", ColumnSelector("time")),
("standardize", StandardScaler())
]))
]))
])
pipe = Pipeline([
("transform", feature_pipeline),
("model", LinearRegression())
])
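To confirm the nested pipeline actually runs, you can fit it on a small made-up dataframe with the two columns it expects; the data below is purely illustrative:

import pandas as pd

# Illustrative data only; the pipeline expects a 'diet' and a 'time' column.
df = pd.DataFrame({
    "diet": ["vegan", "meat", "fish", "vegan"],
    "time": [1.0, 2.0, 3.0, 4.0],
})
y = [10.0, 20.0, 30.0, 40.0]

pipe.fit(df, y)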
Here's what the pipeline looks like:
[Interactive HTML diagram of the nested pipeline: a 'transform' block containing the 'datagrab' FeatureUnion, which splits into a 'discrete' branch (ColumnSelector(columns='diet') followed by OneHotEncoder(sparse=False)) and a 'continuous' branch (ColumnSelector(columns='time') followed by StandardScaler()), followed by a 'model' block with LinearRegression(); every block expands on click.]
If you want to get the best results with custom components, we recommend building on top of scikit-learn's BaseEstimator when you construct your own classes. A plain class shows up in the diagram as a bare object repr like <__main__.Custom object at 0x7f8c58c70a60>, while a BaseEstimator subclass renders with its name and parameters. An example of an implementation can be found on the scikit-lego GitHub repo.
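For instance, a minimal custom transformer built on BaseEstimator and TransformerMixin could look like the sketch below; the ClipValues class and its threshold parameter are made up for illustration:

from sklearn.base import BaseEstimator, TransformerMixin

class ClipValues(BaseEstimator, TransformerMixin):
    """Illustrative transformer that clips values above a threshold."""

    def __init__(self, threshold=0.5):
        # Keep the attribute name identical to the __init__ argument so that
        # BaseEstimator.get_params() can find it and the diagram can display it.
        self.threshold = threshold

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.clip(max=self.threshold)

Because it inherits from BaseEstimator, the diagram labels the block with the class name and its non-default parameters, for example ClipValues(threshold=0.9), rather than a bare object repr.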