Calmcode - scikit save: onnx-sklearn

Onnx Sklearn


The ONNX project describes itself as "an open format built to represent machine learning models". It's meant to be a universal format that lets you deploy models from a wide array of frameworks. It also supports scikit-learn pipelines via the sklearn-onnx project. To use it you will need to install two extra packages. One package (skl2onnx) converts the pipeline so that it can be stored, and the other (onnxruntime) runs the stored model in the ONNX runtime.

You can install both packages via pip.

python -m pip install skl2onnx onnxruntime
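
To confirm that both packages ended up in the active environment, a quick check like the one below can help. It is a minimal sketch that uses only the standard library; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Hypothetical helper: report the state of the two packages used here.
for name in ("skl2onnx", "onnxruntime"):
    print(name, installed_version(name) or "not installed")
```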

Storing Scikit-Learn Models

The script below allows you to store the machine learning pipeline to disk.

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType

# `pipe` is the fitted scikit-learn pipeline that we want to store.
# Note that we need to be formal! We must define upfront
# what kind of data the model will receive as input.
# We will input an array with one column that is of type "string".
initial_type = [('text_input', StringTensorType([None, 1]))]
onx = convert_sklearn(pipe, initial_types=initial_type)

# This line will save the model to disk.
with open("clinc-logreg.onnx", "wb") as f:
    f.write(onx.SerializeToString())
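
The `[None, 1]` passed to `StringTensorType` reads as "any number of rows, exactly one column". The sketch below, which only assumes NumPy, shows how a plain list of strings can be reshaped into that layout. The helper name `to_text_column` is hypothetical and not part of skl2onnx.

```python
import numpy as np


def to_text_column(texts):
    """Turn a list of strings into a 2D array with a single column."""
    # reshape(-1, 1) keeps one row per text and forces exactly one column.
    return np.array(texts).reshape(-1, 1)


batch = to_text_column(["this is an example", "and another one"])
print(batch.shape)
```

An array shaped like this should be usable as the model input when you want to score several texts in one call.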

Running ONNX Models

We can run the stored model with the ONNX runtime too.

import numpy as np
import onnxruntime as rt

# First we must start a session.
sess = rt.InferenceSession("clinc-logreg.onnx")
# The name of the input is saved as part of the .onnx file.
# We are retrieving it because we will need it later.
input_name = sess.get_inputs()[0].name

# This code will run the model on our behalf.
query = "this is an example"
_, probas = sess.run(None, {input_name: np.array([[query]])})
probas[0]
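
With skl2onnx's default conversion settings for classifiers, each entry of `probas` is typically a dictionary that maps a class label to its probability. Assuming that layout, the small helper below picks the most likely labels. It is an illustrative sketch: `top_labels` and the example dictionary are made up, not actual model output.

```python
def top_labels(proba, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(proba.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up stand-in for `probas[0]`; real values come from the model.
example = {"greeting": 0.7, "goodbye": 0.2, "weather": 0.1}
best = top_labels(example, k=2)
print(best)
```

If the model was converted with different output options, the probabilities may arrive as a plain array instead, in which case this helper would need adjusting.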