Calmcode - scikit save: security

Joblib, Pickles and Security


Let's assume that once again we've got a trained pipeline saved in a file called pipe.joblib. Then we can load the pipeline directly via

from joblib import load

trained = load("pipe.joblib")
trained.predict(["hello world"])

But let's consider what's happening. We're loading a file that gets turned into a Python object, which means that arbitrary Python code might run just by loading it. That's a huge security risk! If somebody tampered with that file, all sorts of bad things might happen.

Example

For example, the file might contain an object that behaves just like a pipeline but is running bad code as a side effect.

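from joblib import dump

class EvilThing:
    """Looks like a pipeline to the caller, but does extra work in predict."""
    def predict(self, X):
        # Stand-in for a harmful side effect.
        print("fooled you!")
        return [1 for _ in X]

evil_pipe = EvilThing()

# Save the impostor under a name that looks like a model file.
dump(evil_pipe, "pipe-evil.joblib")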

The pipe-evil.joblib file now holds a malicious object, but you can still load it without noticing anything wrong.

trained = load("pipe-evil.joblib")
trained.predict(["hello world"])

The example shown here is relatively innocent. Just by loading an untrusted .joblib file, you can give an attacker access to the server and risk leaking data to a third party.
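To see why loading alone is enough, here is a minimal sketch using the standard-library pickle module, which joblib relies on for loading. The Payload class is a hypothetical name, and print stands in for whatever a real attacker would choose to call.

```python
import pickle


class Payload:
    """Hypothetical object that asks pickle to call a function at load time."""

    def __reduce__(self):
        # pickle stores "call print(...)" instead of the object itself,
        # and runs that call while the bytes are being loaded.
        return (print, ("code ran during load",))


blob = pickle.dumps(Payload())

# No method is called on the result; loading is what triggers the call.
loaded = pickle.loads(blob)  # prints "code ran during load"
```

Note that nothing like predict had to be called here. The side effect happens inside the load itself, which is why an untrusted file is dangerous even if you never use the object.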

Serialization Attacks

This type of security risk falls into the "serialization attack" category. These attacks abuse the fact that objects need to be loaded from disk, and they can do a lot of damage. If you're interested in a detailed demo of such an attack, you might appreciate this YouTube video.
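One partial precaution against a tampered file is to record a checksum when you save the pipeline and compare it before loading. The sketch below is only an illustration under that assumption. The helpers sha256_of and verify_artifact are hypothetical names, not part of joblib. It detects changes made after the checksum was recorded, but it does not make a file from an untrusted source safe to load.

```python
import hashlib
import hmac


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """Raise ValueError unless the file matches the digest recorded at save time."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256):
        raise ValueError(f"{path} does not match the expected checksum; refusing to load")
    return path
```

In practice you might call verify_artifact("pipe.joblib", expected) right before load("pipe.joblib"), where expected is a digest you stored somewhere an attacker cannot edit.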