Calmcode - diskcache: introduction

The diskcache use case

Let's say that you're making a web request, maybe to an API. Here's one such example that uses the Star Wars API.

import requests as rq

def fetch_starwars_person(i):
    return rq.get(f"https://swapi.dev/api/people/{i}/").json()

fetch_starwars_person(1)

When this code runs, you'll see it return output like this:

{
   "birth_year":"19BBY",
   "created":"2014-12-09T13:50:51.644000Z",
   "edited":"2014-12-20T21:17:56.891000Z",
   "eye_color":"blue",
   "films":[
      "https://swapi.dev/api/films/1/",
      "https://swapi.dev/api/films/2/",
      "https://swapi.dev/api/films/3/",
      "https://swapi.dev/api/films/6/"
   ],
   "gender":"male",
   "hair_color":"blond",
   "height":"172",
   "homeworld":"https://swapi.dev/api/planets/1/",
   "mass":"77",
   "name":"Luke Skywalker",
   "skin_color":"fair",
   "species":[
      
   ],
   "starships":[
      "https://swapi.dev/api/starships/12/",
      "https://swapi.dev/api/starships/22/"
   ],
   "url":"https://swapi.dev/api/people/1/",
   "vehicles":[
      "https://swapi.dev/api/vehicles/14/",
      "https://swapi.dev/api/vehicles/30/"
   ]
}

But the request does take a bit of time. We can measure this in Jupyter via the %%time magic command.

%%time

fetch_starwars_person(1)

On our machine the request takes about a second to return. That could be fine, but if we were to repeat this request many times, we'd spend a lot of time waiting, which feels wasteful.
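If you're not in Jupyter, the same measurement can be sketched with the standard library. The snippet below is only an illustration: `timed` and `slow_lookup` are hypothetical names, and `slow_lookup` sleeps instead of calling the real API so that the example runs without a network connection.

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def slow_lookup(i):
    # Hypothetical stand-in for the API call: sleeps instead of using the network.
    time.sleep(0.1)
    return {"id": i}


result, elapsed = timed(slow_lookup, 1)
print(f"took {elapsed:.3f} s")
```

Swapping `slow_lookup` for `fetch_starwars_person` should give a number comparable to what `%%time` reports as wall time.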

Memory

Instead of rerunning the same request and waiting for a response, we could cache the result. A convenient way to do this is the lru_cache decorator from functools.

from functools import lru_cache

import requests as rq

@lru_cache
def fetch_starwars_person(i):
    # Results are kept in memory, keyed by the argument i.
    return rq.get(f"https://swapi.dev/api/people/{i}/").json()

The first time you run this function it will again take about a second, but repeated calls with the same argument will be much faster! We see speeds around the 9.06 µs mark.
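To check that a repeated call really comes from the cache, you can inspect the counters that lru_cache keeps. This is a minimal sketch, assuming a hypothetical `slow_square` function that sleeps instead of making a request.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=128)  # 128 is also the default size when no argument is given
def slow_square(i):
    # Hypothetical slow function: the sleep stands in for the network round trip.
    time.sleep(0.1)
    return i * i


slow_square(3)  # first call: computed, counted as a miss
slow_square(3)  # second call: served from memory, counted as a hit

info = slow_square.cache_info()
print(info)
```

Note that the cache only holds the most recently used results up to `maxsize`, so older entries may be dropped once it fills up.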

Downside

This trick is great, but it isn't perfect. What happens if we restart the notebook? In that case we'd lose the state held in memory and we'd have to start all over again. For a single request to a single endpoint this isn't the end of the world, but it would be nice if we could cache a whole bunch of data and re-use it at a later point in time.

For use cases like this, you might be interested in diskcache. It's a joyful little library that uses SQLite under the hood to act as a cache that is stored on disk. It comes with no dependencies and we've found it to be a very likeable tool to use.
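To make the idea of a cache that lives on disk concrete, here is a rough sketch built only on the standard library's sqlite3 module. It is not diskcache's API or implementation, just an illustration of the concept. The names `make_disk_cached` and `fake_fetch` are hypothetical, and `fake_fetch` returns made-up data instead of calling the API.

```python
import json
import os
import sqlite3
import tempfile


def make_disk_cached(func, db_path):
    """Wrap func so that its results are stored in a SQLite file at db_path."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.commit()

    def wrapper(arg):
        key = json.dumps(arg)
        row = conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            # Found on disk: skip the slow call entirely.
            return json.loads(row[0])
        value = func(arg)
        conn.execute(
            "INSERT INTO cache (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        conn.commit()
        return value

    return wrapper


calls = []  # records every time the underlying function really runs


def fake_fetch(i):
    # Hypothetical stand-in for the web request.
    calls.append(i)
    return {"id": i, "name": f"person-{i}"}


db_path = os.path.join(tempfile.mkdtemp(), "cache.sqlite")

first = make_disk_cached(fake_fetch, db_path)
first(1)

# A fresh wrapper with a fresh connection, roughly what a restarted notebook would see.
second = make_disk_cached(fake_fetch, db_path)
result = second(1)
print(result, calls)
```

Because the result was written to the file, the second wrapper can return it without running `fake_fetch` again. An in-memory cache cannot do this across restarts.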

The goal of this series of videos is to do a deep dive on this tool and to show how you might use it in some of your projects.