Sometimes you're not interested in the text of an HTML element but in
its attributes. That is the case for the website we are currently scraping:
the release timestamp is stored in the datetime attribute of the time element.
To fetch it we'll need to use .attrs instead of .text.
# this is a dictionary
cards[0].find("time").attrs
# this is the information we're interested in
cards[0].find("time").attrs['datetime']
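To see the difference between an element's text and its attributes without making a network request, here is a minimal sketch that uses Python's standard-library html.parser instead of gazpacho. The HTML snippet, its timestamp value, and the TimeCollector class are made up for illustration; they are not taken from PyPI or from gazpacho.

```python
from html.parser import HTMLParser

# Hypothetical snippet shaped like a release card (assumption, not real PyPI markup).
SNIPPET = (
    '<a class="card">'
    '<p class="release__version">1.0.0</p>'
    '<time datetime="2020-01-29T23:00:00+0000">Jan 29, 2020</time>'
    '</a>'
)


class TimeCollector(HTMLParser):
    """Collects the attributes and the visible text of every <time> element."""

    def __init__(self):
        super().__init__()
        self.inside_time = False
        self.time_attrs = []  # one dict of attributes per <time> tag
        self.time_texts = []  # the visible text per <time> tag

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            self.inside_time = True
            self.time_attrs.append(dict(attrs))
            self.time_texts.append("")

    def handle_endtag(self, tag):
        if tag == "time":
            self.inside_time = False

    def handle_data(self, data):
        # Only keep text that appears between <time> and </time>.
        if self.inside_time:
            self.time_texts[-1] += data


collector = TimeCollector()
collector.feed(SNIPPET)
collector.close()

print(collector.time_texts[0])               # the human-readable text
print(collector.time_attrs[0]["datetime"])   # the machine-readable attribute
```

The text is what a visitor sees on the page, while the attribute dictionary holds the machine-readable value, which is usually the better thing to store when scraping.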
The full code, with the parse_card function, is listed below.
from gazpacho import get, Soup

url = "https://pypi.org/project/pandas/#history"
html = get(url)
soup = Soup(html)
# every release on the page is an <a> element with the class "card"
cards = soup.find('a', {'class': 'card'})

def parse_card(card):
    # the version number is the text of the element
    version = card.find("p", {"class": "release__version"}, partial=False).text
    # the timestamp is stored in an attribute, not in the text
    timestamp = card.find("time").attrs['datetime']
    return {"version": version, 'timestamp': timestamp}

[parse_card(c) for c in cards]