Let's apply a learning trick. We can, after all, loop over our datapoints multiple times.
This is the code before.
mod_sgd = SGDClassifier()
data = []
for i, x in enumerate(X_train):
    # Pay attention to `classes` here, we need it!
    mod_sgd.partial_fit([x], [y_train[i]], classes=[0, 1])
    data.append({
        'c1': mod_sgd.coef_.flatten()[0],
        'c2': mod_sgd.coef_.flatten()[1],
        'mod_sgd': np.mean(mod_sgd.predict(X_test) == y_test),
        'normal_acc_test': normal_acc_test,
        'i': i
    })
df_stats = pd.DataFrame(data)
This is the code after.
mod_sgd = SGDClassifier()
data = []
# We've added an extra loop here so the model sees the dataset three times
for j in range(3):
    for i, x in enumerate(X_train):
        # Pay attention to `classes` here, we need it!
        mod_sgd.partial_fit([x], [y_train[i]], classes=[0, 1])
        data.append({
            'c1': mod_sgd.coef_.flatten()[0],
            'c2': mod_sgd.coef_.flatten()[1],
            'mod_sgd': np.mean(mod_sgd.predict(X_test) == y_test),
            'normal_acc_test': normal_acc_test,
            # Offset `i` so the step counter keeps increasing across passes
            'i': i + X_train.shape[0] * j
        })
df_stats = pd.DataFrame(data)
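Each row of df_stats now records the coefficients and the test accuracy after every single update, across all three passes. If you want to eyeball whether the extra passes still help, you could plot the accuracy curve; the snippet below is a sketch I'm adding here (it assumes matplotlib is installed and isn't part of the original notebook).

import matplotlib.pyplot as plt

# Test accuracy after every update; flat stretches in the later passes
# suggest the extra epochs are no longer buying much.
plt.plot(df_stats['i'], df_stats['mod_sgd'], label='partial_fit accuracy')
plt.plot(df_stats['i'], df_stats['normal_acc_test'], label='full-fit baseline')
plt.xlabel('update step')
plt.ylabel('test accuracy')
plt.legend()
plt.show()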
Given that you're able to loop multiple times over the same data, you might wonder what else you can do. You could, for example, preprocess each batch with some noise to get a more robust model, as sketched below.
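Here's a minimal sketch of that idea, reusing the variables from the snippets above; the noise scale of 0.1 and the fixed seed are assumptions of mine, not something from this series.

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(42)  # seeded so reruns behave the same

mod_sgd = SGDClassifier()
for j in range(3):
    for i, x in enumerate(X_train):
        # Perturb each sample with a little Gaussian noise before learning
        # from it. The scale (0.1) is arbitrary here and would need tuning
        # against the spread of your features.
        x_noisy = x + rng.normal(0, 0.1, size=x.shape)
        mod_sgd.partial_fit([x_noisy], [y_train[i]], classes=[0, 1])

This is effectively a light form of data augmentation: the model never sees exactly the same point twice, which can make the learned boundary less sensitive to any individual datapoint.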
Learning More
If you like, you can download the full notebook from this series on GitHub. You can also learn more about .partial_fit() techniques on the scikit-learn docs.