
gazpacho


You may need to scrape a website once in a while, which means you'll want a convenient tool for it. You could reach for requests combined with Beautiful Soup, but since you'll only be using a small subset of those libraries most of the time, you may be able to make do with a simpler package: gazpacho.
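To give a sense of how small the API is, here is a minimal sketch (the URL is just a placeholder): you fetch the HTML with get and query it with Soup.

from gazpacho import get, Soup

html = get("https://example.com")   # fetch the raw HTML as a string
soup = Soup(html)                   # parse it so you can query tags
links = soup.find("a")              # matching tags: a list, or a single Soup if there's only one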


Notes

This is the final blob of code to go from website to pandas.

import pandas as pd
from gazpacho import get, Soup

url = "https://pypi.org/project/pandas/#history"

html = get(url)                             # download the raw HTML
soup = Soup(html)                           # parse it so we can query it
cards = soup.find('a', {'class': 'card'})   # one card per pandas release

def parse_card(card):
    # strict=True: only match the exact class name, not partial matches
    version_number = card.find('p', {'class': 'release__version'}, strict=True).text
    timestamp = card.find('time').attrs['datetime']
    return {'version': version_number, 'timestamp': timestamp}

pd.DataFrame([parse_card(c) for c in cards])
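If you want to keep working with the result, one follow-up step (not part of the snippet above, just a sketch) is to assign the DataFrame to a variable and parse the timestamp column into proper datetimes:

df = pd.DataFrame([parse_card(c) for c in cards])
df["timestamp"] = pd.to_datetime(df["timestamp"])       # string -> datetime
df = df.sort_values("timestamp").reset_index(drop=True)  # oldest release first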

One final note on gazpacho: it is a nice package because it has no dependencies. It behaves like requests and Beautiful Soup combined, but it does not depend on either of them.
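For comparison, the same fetching and selecting step written with requests and Beautiful Soup might look roughly like this (a sketch, assuming both libraries are installed):

import requests
from bs4 import BeautifulSoup

html = requests.get("https://pypi.org/project/pandas/#history").text
soup = BeautifulSoup(html, "html.parser")
cards = soup.find_all("a", class_="card")   # same cards as before

With gazpacho you get the equivalent of both steps from a single, dependency-free package.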


