Pandas is pretty flexible with formats when you read and write CSV files. For example, you can read a dataframe straight from a URL.
import pandas as pd
df = pd.read_csv("https://calmcode.io/datasets/birthdays.csv")
You can save this dataframe to disk as a CSV file.
df.to_csv("birthdays.csv", index=False)
But! You can also save it to disk as a zip file. Pandas infers the compression from the .zip extension.
df.to_csv("birthdays.zip", index=False)
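If you would rather be explicit than rely on the file extension, `to_csv` also accepts a `compression` argument. The sketch below is a minimal example under a few assumptions: the tiny dataframe and the temporary directory are stand-ins, not the real birthdays data, and the `archive_name` option is used to name the file stored inside the archive.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in data, not the real birthdays dataset.
df = pd.DataFrame({"state": ["AK", "AL"], "births": [14, 23]})

with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "birthdays.zip"
    # Spell out the compression method and the name of the CSV inside the archive.
    df.to_csv(
        zip_path,
        index=False,
        compression={"method": "zip", "archive_name": "birthdays.csv"},
    )
    # Inspect the archive to see which files it contains.
    with zipfile.ZipFile(zip_path) as archive:
        members = archive.namelist()

print(members)
```

Naming the inner file can be handy when someone unzips the archive by hand instead of reading it with pandas.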
This zipped file is a fair bit lighter than the standard .csv file.
> ls -lhat birthdays*
-rw-r--r-- 1 vincentwarmerdam staff 1.6M 3 Jun 21:06 stocks.zip
-rw-r--r-- 1 vincentwarmerdam staff 11M 3 Jun 21:06 stocks.csv
But this .zip file can also be read natively, just like a .csv.
pd.read_csv("birthdays.zip").equals(pd.read_csv("birthdays.csv"))
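Here is a small, self-contained way to check that round trip yourself. It is only a sketch: the dataframe and file names are made up for illustration, and everything is written to a temporary directory. Note that `.equals` returns a single boolean, whereas `==` compares the two dataframes element by element.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in data for illustration.
df = pd.DataFrame({"state": ["AK", "AL", "AK"], "births": [100, 200, 150]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example.csv"
    zip_path = Path(tmp) / "example.zip"

    # Write the same dataframe twice: once plain, once zipped.
    df.to_csv(csv_path, index=False)
    df.to_csv(zip_path, index=False)

    # Read both files back; the .zip is decompressed automatically.
    from_csv = pd.read_csv(csv_path)
    from_zip = pd.read_csv(zip_path)

# True when both dataframes have the same shape, dtypes and values.
same = from_zip.equals(from_csv)
print(same)
```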
For very large files with many repeated values this can save a substantial amount of disk space. These .zip files can also be hosted online and downloaded just like the original .csv file.
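To get a feel for how much repeated values help, you could compare file sizes on a synthetic dataframe. The sketch below assumes made-up data (four state codes repeated many times and a constant year column); the exact savings on your own files will differ.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data with lots of repetition: 100,000 rows, only a few distinct values.
df = pd.DataFrame(
    {
        "state": ["AK", "AL", "AR", "AZ"] * 25_000,
        "year": [1969] * 100_000,
    }
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "repeated.csv")
    zip_path = os.path.join(tmp, "repeated.zip")

    df.to_csv(csv_path, index=False)
    df.to_csv(zip_path, index=False)

    # File sizes in bytes.
    csv_size = os.path.getsize(csv_path)
    zip_size = os.path.getsize(zip_path)

print(f"csv: {csv_size} bytes, zip: {zip_size} bytes")
```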