To protect ourselves against an evil .joblib file, we might consider
calculating a checksum of the file. A checksum is a hash value computed
from the contents of a file. Files with different contents can be expected
to have different hash values (with a good hash function, an accidental
collision is astronomically unlikely), so comparing a file's checksum
against a known-good value allows us to detect if the file has been
tampered with.
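To make the idea concrete, here is a small illustration using SHA-256 from Python's standard `hashlib` module. The byte strings stand in for file contents and are made up for the example. Changing a single character produces a completely different digest, which is what makes a checksum useful for spotting modifications.

```python
import hashlib

# Two hypothetical "file contents" that differ by one character
original = b"model weights v1"
tampered = b"model weights v2"

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(tampered).hexdigest()

print(h1)
print(h2)
print("identical?", h1 == h2)  # False: a tiny edit changes the whole digest
```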
The code below demonstrates how you might calculate a checksum for a file using Python.
```python
import hashlib

def calc_checksum(path):
    md5_hash = hashlib.md5()
    # Open in binary mode so the raw bytes are hashed, with no text decoding
    with open(path, "rb") as f:
        content = f.read()
    # Feed the bytes to the hash and print the hex-encoded digest
    md5_hash.update(content)
    digest = md5_hash.hexdigest()
    print(digest)

calc_checksum("pipe.joblib")
# 04a415025a812c2a69cb3552d83ee275
calc_checksum("pipe-evil.joblib")
# 0b119f868ac251eee25af5c4b0c2064d
```
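A possible variation, sketched under a couple of assumptions: MD5 is known to be vulnerable to deliberately crafted collisions, so for tamper detection a stronger hash such as SHA-256 is the more common choice. The sketch also reads the file in chunks so a large model file never has to sit in memory all at once, and it returns the digest instead of printing it so the value can be stored or compared. The name `calc_checksum_sha256` and the 64 KiB chunk size are illustrative choices, not part of the original code.

```python
import hashlib


def calc_checksum_sha256(path, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in fixed-size chunks, so memory use stays
    bounded no matter how large the file is.
    """
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps calling f.read until end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()
```

Usage would look like `print(calc_checksum_sha256("pipe.joblib"))`; the result is a 64-character hex string rather than MD5's 32.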
While this approach has merit, it only works if you record a trusted checksum for each file ahead of time, store it somewhere an attacker cannot modify, and compare against it before loading. So we may want to consider other tactics as well.
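One way to put a stored checksum to use is to verify it right before unpickling. The sketch below assumes you recorded a trusted SHA-256 digest when the file was created and kept it out of an attacker's reach; `file_sha256` and `load_if_trusted` are hypothetical helper names, and only `joblib.load` comes from joblib itself. The joblib import is deferred so nothing gets unpickled unless the check passes.

```python
import hashlib


def file_sha256(path):
    """Return the SHA-256 hex digest of the whole file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def load_if_trusted(path, expected_hex):
    """Load a joblib file only if its digest matches a trusted value.

    `expected_hex` is assumed to come from a source the attacker cannot
    change, e.g. recorded when the file was saved and stored separately.
    """
    actual = file_sha256(path)
    if actual != expected_hex.lower():
        raise ValueError(f"checksum mismatch for {path!r}: refusing to load")
    import joblib  # deferred: only reached once the checksum has matched
    return joblib.load(path)
```

With this in place, a call such as `load_if_trusted("pipe-evil.joblib", trusted_digest)` raises a `ValueError` instead of running whatever the tampered pickle contains, where `trusted_digest` is the value you recorded for the genuine file.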