To protect ourselves against an evil .joblib file, we might consider
calculating a checksum of the file. A checksum is basically a hash value
computed from the contents of a file. Two different files are extremely
unlikely to share the same hash value, so comparing a file's checksum
against a known-good value might allow us to detect if the file has been
tampered with.
The code below demonstrates how you might calculate a checksum for a file using Python.
import hashlib

def calc_checksum(path):
    md5_hash = hashlib.md5()
    with open(path, "rb") as f:
        content = f.read()
        md5_hash.update(content)
    digest = md5_hash.hexdigest()
    print(digest)

calc_checksum("pipe.joblib") # 04a415025a812c2a69cb3552d83ee275
calc_checksum("pipe-evil.joblib") # 0b119f868ac251eee25af5c4b0c2064d
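
Printing a digest is only half of the story: to actually catch a tampered file, you would compare its digest against a value you recorded earlier and trust. The sketch below shows one way this might look. The helper names `file_checksum` and `verify_checksum` are hypothetical, not part of any library, and the sketch swaps MD5 for SHA-256 because MD5 is known to be vulnerable to collision attacks.

```python
import hashlib
import hmac


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large model files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected, algorithm="sha256"):
    """Return True only if the file's digest matches the trusted value."""
    actual = file_checksum(path, algorithm)
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected.lower())
```

Under these assumptions, you would only hand the file to `joblib.load` when `verify_checksum` returns `True`, and refuse to load it otherwise.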
While this approach has merit, it only works if you keep track of a trusted checksum for every file, stored somewhere an attacker cannot modify it. So we may want to consider other tactics as well.