Calmcode - gazpacho: attributes

Accessing attributes with Gazpacho


Sometimes you're not interested in the text of an HTML element but in its attributes. That is the case for the website we are currently scraping. To fetch the right information this time around we'll need to use .attrs instead of .text.

# this is a dictionary
cards[0].find("time").attrs
# this is the information we're interested in
cards[0].find("time").attrs['datetime']
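
Because .attrs is a dictionary, the usual dictionary tools apply, including a lookup that does not raise when a key is missing. The snippet below is a minimal sketch of that idea. It does not call gazpacho: `time_attrs` is a hand-written stand-in for the attributes of a <time> element, its values are made up, and `get_attr` is a hypothetical helper name.

```python
# Hand-written stand-in for the `.attrs` dictionary of a <time> element.
# The values are made up for illustration, they were not scraped.
time_attrs = {"datetime": "2020-01-01T12:00:00+0000", "class": "example"}


def get_attr(attrs, name, default=None):
    """Look up one attribute, returning `default` if it is absent.

    Also tolerates `attrs` being None or empty, so a missing attribute
    does not end the whole scrape with a KeyError or TypeError.
    """
    if not attrs:
        return default
    return attrs.get(name, default)


print(get_attr(time_attrs, "datetime"))      # the value stored under "datetime"
print(get_attr(time_attrs, "title", "n/a"))  # falls back to the default
```

Indexing with ['datetime'] is fine when every element is known to carry the attribute. A helper like this mainly pays off when some elements on a page might not.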

The full code, including the parse_card function, is listed below.

from gazpacho import get, Soup

url = "https://pypi.org/project/pandas/#history"

# download the page and parse it
html = get(url)
soup = Soup(html)
# select the <a> elements whose class matches "card"
cards = soup.find("a", {"class": "card"})

def parse_card(card):
    # the version is the text of a <p> element, the timestamp is an attribute of <time>
    version = card.find("p", {"class": "release__version"}, partial=False).text
    timestamp = card.find("time").attrs["datetime"]
    return {"version": version, "timestamp": timestamp}

[parse_card(c) for c in cards]
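
The timestamp returned by parse_card is still a string. If you want to sort or compare releases, you can convert it to a datetime object with the standard library. The sketch below assumes the attribute value looks like "2020-01-01T12:00:00+0000", so check the format against the actual page. The records are made-up stand-ins shaped like the output of parse_card, and `with_parsed_timestamp` is a hypothetical helper name.

```python
from datetime import datetime

# Made-up records shaped like the dictionaries returned by parse_card.
records = [
    {"version": "0.0.1", "timestamp": "2020-01-01T12:00:00+0000"},
    {"version": "0.0.2", "timestamp": "2020-02-01T08:30:00+0000"},
]


def with_parsed_timestamp(record, fmt="%Y-%m-%dT%H:%M:%S%z"):
    """Return a copy of `record` whose timestamp is a timezone-aware datetime.

    `fmt` is an assumption about how the datetime attribute is written.
    """
    parsed = datetime.strptime(record["timestamp"], fmt)
    return {**record, "timestamp": parsed}


parsed_records = [with_parsed_timestamp(r) for r in records]

# With real datetime objects, finding the most recent release is a one-liner.
newest = max(parsed_records, key=lambda r: r["timestamp"])
print(newest["version"])
```

Returning a copy leaves the scraped dictionaries untouched, so the raw strings stay available if the assumed format turns out to be wrong.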