Zip Files for Pandas

Pandas is pretty flexible about where and how it reads and writes CSV files. For example, you can read a dataframe directly from a URL.

import pandas as pd

df = pd.read_csv("")

You can save this dataframe to disk as a regular .csv file.

df.to_csv("birthdays.csv", index=False)

But! You can also save it to disk as a zip file. Pandas infers the compression from the file extension, so a filename ending in .zip is enough.

df.to_csv("", index=False)
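To make the inference step concrete, here is a minimal sketch. It assumes a small made-up dataframe, a temporary directory, and hypothetical file names (`demo.csv.zip`, `demo_archive.zip`), none of which come from the dataset above. It writes one zip by relying on the extension and one by passing `compression` explicitly.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Hypothetical stand-in data; not the dataset used in this post.
df_demo = pd.DataFrame({"state": ["AL", "AK", "AZ"], "births": [10, 20, 30]})

tmp_dir = tempfile.mkdtemp()
inferred_path = os.path.join(tmp_dir, "demo.csv.zip")
explicit_path = os.path.join(tmp_dir, "demo_archive.zip")

# compression="infer" is the default, so the ".zip" suffix triggers zip compression.
df_demo.to_csv(inferred_path, index=False)

# The same idea spelled out, also naming the CSV stored inside the archive.
df_demo.to_csv(
    explicit_path,
    index=False,
    compression={"method": "zip", "archive_name": "demo.csv"},
)

# List the members of the explicitly configured archive.
with zipfile.ZipFile(explicit_path) as archive:
    member_names = archive.namelist()

print(member_names)
```

The explicit form is mostly useful when you want control over the name of the file inside the archive; for everyday use the extension alone should do.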

This zipped file is a fair bit lighter than the standard .csv file: 1.6M versus 11M in the listing below.

> ls -lhat birthdays*
-rw-r--r--  1 vincentwarmerdam  staff   1.6M  3 Jun 21:06
-rw-r--r--  1 vincentwarmerdam  staff    11M  3 Jun 21:06 birthdays.csv

But this .zip file can also be read directly with read_csv, just like a plain .csv, and it gives back the same dataframe.

# .equals returns a single True/False; == would compare cell by cell.
pd.read_csv("").equals(pd.read_csv("birthdays.csv"))
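A self-contained way to check the round trip might look like the sketch below. The dataframe, the temporary directory, and the file names are assumptions for illustration. A missing value is included on purpose, because it shows why a single `.equals` call is a safer check than `==`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data, with one missing value on purpose.
df_demo = pd.DataFrame({"name": ["a", "b", "c"], "value": [1.5, None, 3.0]})

tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "demo.csv")
zip_path = os.path.join(tmp_dir, "demo.csv.zip")

df_demo.to_csv(csv_path, index=False)
df_demo.to_csv(zip_path, index=False)

from_csv = pd.read_csv(csv_path)
from_zip = pd.read_csv(zip_path)  # compression inferred from ".zip"

# == compares cell by cell, and NaN never equals NaN, so not every cell is True.
elementwise_all_true = bool((from_csv == from_zip).all().all())

# .equals treats NaNs in the same position as equal and returns one boolean.
frames_match = from_csv.equals(from_zip)

print(elementwise_all_true, frames_match)
```

If you would rather get a descriptive error on a mismatch, `pd.testing.assert_frame_equal(from_csv, from_zip)` is an alternative.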

For very large files with many repeated values this can save a substantial amount of disk space. These .zip files can also be hosted online and downloaded just like the original .csv file.
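The effect of repeated values can be tried out with a toy example. This is only a sketch under stated assumptions: the data is invented and far more repetitive than a real dataset, and the file names and temporary directory are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data with heavy repetition: 3 distinct rows repeated 20,000 times.
df_demo = pd.DataFrame(
    {
        "state": ["AL", "AK", "AZ"] * 20_000,
        "month": ["January", "February", "March"] * 20_000,
    }
)

tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "demo.csv")
zip_path = os.path.join(tmp_dir, "demo.csv.zip")

df_demo.to_csv(csv_path, index=False)
df_demo.to_csv(zip_path, index=False)

# Compare the on-disk size of both files, similar to the ls listing.
csv_bytes = os.path.getsize(csv_path)
zip_bytes = os.path.getsize(zip_path)
print(f"csv: {csv_bytes} bytes, zip: {zip_bytes} bytes")
```

Real data rarely repeats this neatly, so expect a smaller gain than this toy case suggests.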

