Let's assume that once again we've got a trained pipeline saved in a file called `pipe.joblib`. Then we can load the pipeline like this:
```python
from joblib import load

trained = load("pipe.joblib")
trained.predict(["hello world"])
```
But let's consider what's happening. We're loading a file that will be turned back into a Python object, and joblib uses pickle under the hood to do that. That means arbitrary Python code can run just by loading the file. That's a huge security risk! If somebody tampered with that file, all sorts of bad things might happen.
For example, the file might contain an object that behaves just like a pipeline but is running bad code as a side effect.
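```python
from joblib import dump

# An impostor: it has a .predict() method like a real pipeline,
# but runs extra code (here a harmless print) every time it's called.
class EvilThing:
    def predict(self, X):
        print("fooled you!")
        return [1 for _ in X]

evil_pipe = EvilThing()
dump(evil_pipe, "pipe-evil.joblib")
```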
The `pipe-evil.joblib` file now holds our impostor object, yet you can still load it without realising anything is wrong:
```python
trained = load("pipe-evil.joblib")
trained.predict(["hello world"])  # prints "fooled you!", returns [1]
```
The example shown here is relatively innocent. But just loading an untrusted `.joblib` file can give an attacker access to your server and risk leaking data to a third party.
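The impostor above only misbehaves once `predict` is called. The more dangerous case is code that runs during `load()` itself, before you ever touch the object. A minimal, harmless sketch of that uses pickle's `__reduce__` hook, which joblib inherits because it is built on pickle. The class name `RunsOnLoad` and the file name `pipe-reduce.joblib` are hypothetical, and `print` stands in for whatever an attacker would actually call.

```python
from joblib import dump, load

# Hypothetical class for illustration. __reduce__ tells pickle how to
# rebuild an object: "call this callable with these arguments".
class RunsOnLoad:
    def __reduce__(self):
        # The file stores a reference to builtins.print plus its args;
        # it never stores RunsOnLoad itself.
        return (print, ("this ran during load()",))

dump(RunsOnLoad(), "pipe-reduce.joblib")

# load() calls print(...) while unpickling. What comes back is the
# return value of that call (None), not a RunsOnLoad instance.
obj = load("pipe-reduce.joblib")
print(obj)
```

Running this prints `this ran during load()` and then `None`. Replace `print` with a callable such as `os.system` and the same mechanism runs shell commands on the machine of whoever loads the file.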
This type of security leak falls into the "serialization attack" category. These attacks abuse the fact that turning a file on disk back into Python objects can run code, and they can do a lot of damage. If you're interested in a detailed demo of such an attack, you might appreciate this YouTube video.