Calmcode - partial_fit: conclusion

Potential benefits of online learning in scikit-learn.


Let's apply a learning trick. We can, after all, loop over our datapoints multiple times.

This is the code before.

mod_sgd = SGDClassifier()
data = []

# A single pass over the training data, one datapoint at a time
for i, x in enumerate(X_train):
    # Pay attention to `classes` here, we need it!
    mod_sgd.partial_fit([x], [y_train[i]], classes=[0, 1])
    data.append({
        'c1': mod_sgd.coef_.flatten()[0],
        'c2': mod_sgd.coef_.flatten()[1],
        'mod_sgd': np.mean(mod_sgd.predict(X_test) == y_test),
        'normal_acc_test': normal_acc_test,
        'i': i
    })

df_stats = pd.DataFrame(data)

This is the code after.

mod_sgd = SGDClassifier()
data = []

# We've added an extra loop here: three passes over the training data
for j in range(3):
    for i, x in enumerate(X_train):
        # Pay attention to `classes` here, we need it!
        mod_sgd.partial_fit([x], [y_train[i]], classes=[0, 1])
        data.append({
            'c1': mod_sgd.coef_.flatten()[0],
            'c2': mod_sgd.coef_.flatten()[1],
            'mod_sgd': np.mean(mod_sgd.predict(X_test) == y_test),
            'normal_acc_test': normal_acc_test,
            'i': i + X_train.shape[0] * j
        })

df_stats = pd.DataFrame(data)

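Given that you're able to loop over the same data multiple times, you might wonder what else you can do. One option is to add some noise to each batch before it reaches the model, so that every pass sees a slightly different version of the data. This may give you a more robust model.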
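The noise idea could look something like the sketch below. It is not code from the series: the synthetic dataset from `make_classification`, the Gaussian noise, and the names `noise_scale`, `n_epochs` and `rng` are all assumptions made for illustration, and a sensible noise level depends on the scale of your features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, the series uses its own dataset).
X, y = make_classification(
    n_samples=500, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rng = np.random.default_rng(42)
noise_scale = 0.1  # hypothetical value, tune it to the scale of your features
n_epochs = 3       # number of passes over the training data

mod_sgd = SGDClassifier(random_state=42)
for j in range(n_epochs):
    for i, x in enumerate(X_train):
        # Perturb the datapoint with fresh Gaussian noise on every pass,
        # so the model never sees exactly the same input twice.
        x_noisy = x + rng.normal(scale=noise_scale, size=x.shape)
        # `classes` is still required, just like before.
        mod_sgd.partial_fit([x_noisy], [y_train[i]], classes=[0, 1])

# Evaluate on the clean (noise-free) test set.
acc_test = np.mean(mod_sgd.predict(X_test) == y_test)
print(f"test accuracy after {n_epochs} noisy passes: {acc_test:.3f}")
```

Only the training inputs are perturbed here. The test set stays untouched so the accuracy remains comparable to the earlier runs.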

Learning More

If you like, you can download the full notebook from this series on GitHub.

You can also learn more about .partial_fit() techniques in the scikit-learn docs.