
... gazpacho.


You may need to scrape a website once in a while, so it helps to have a convenient tool for it. You could combine requests with Beautiful Soup, but since you'll typically use only a small subset of those libraries, you may be able to make do with a simpler package: gazpacho.


Episode Notes

The code below downloads the pandas release history page on PyPI and parses each release into a version number and a timestamp.

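from gazpacho import get, Soup

# Release history page for pandas on PyPI.
url = "https://pypi.org/project/pandas/#history"

# Download the page and parse the HTML.
html = get(url)
soup = Soup(html)

# Find the <a> tags with the class "card"; with several matches this is a list.
cards = soup.find('a', {'class': 'card'})

def parse_card(card):
    # strict=True asks for an exact match on the class attribute.
    version_number = card.find('p', {'class': 'release__version'}, strict=True).text
    # The release time is stored in the datetime attribute of the <time> tag.
    timestamp = card.find('time').attrs['datetime']
    return {'version': version_number, 'timestamp': timestamp}

# One dictionary per card.
[parse_card(c) for c in cards]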
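
As a follow-up, here is a minimal sketch of two small helpers that may be useful around the code above. It uses only the standard library. The helper names `as_list` and `sort_by_timestamp` are hypothetical and not part of gazpacho. The first helper assumes that `find` can return nothing, a single object, or a list depending on how many tags match, and normalizes all three cases to a list. The second assumes the timestamps look like `2020-01-01T00:00:00+0000`. The sample records are made up for illustration and are not real pandas releases.

```python
from datetime import datetime


def as_list(found):
    """Normalize a find-style result (None, one object, or a list) to a list."""
    if found is None:
        return []
    if isinstance(found, list):
        return found
    return [found]


def sort_by_timestamp(releases, fmt="%Y-%m-%dT%H:%M:%S%z"):
    """Return the release dictionaries ordered from oldest to newest.

    The default format string is an assumption about how the page
    writes its timestamps; adjust it if the real values differ.
    """
    return sorted(releases, key=lambda r: datetime.strptime(r["timestamp"], fmt))


# Hypothetical records in the same shape that parse_card returns.
releases = [
    {"version": "0.0.2", "timestamp": "2020-02-01T00:00:00+0000"},
    {"version": "0.0.1", "timestamp": "2020-01-01T00:00:00+0000"},
]

ordered = sort_by_timestamp(releases)
print([r["version"] for r in ordered])
```

With these in place, the list comprehension could be written as `[parse_card(c) for c in as_list(cards)]`, so that a page with zero or one matching card does not break the loop.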
