Calmcode - gazpacho: pandas

From Gazpacho into Pandas

When you're scraping a website, the goal is often to get the information into a pandas dataframe so that you can use it for further analysis. Let's do that!

This is the final blob of code to go from website to pandas.

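import pandas as pd
from gazpacho import get, Soup

url = "https://pypi.org/project/pandas/#history"

# Download the page and parse it.
html = get(url)
soup = Soup(html)
# Grab the <a> tags whose class matches "card"; each one describes a release.
cards = soup.find('a', {'class': 'card'})

def parse_card(card):
    # Turn a single card into a dictionary that pandas can use as a row.
    version = card.find("p", {"class": "release__version"}, partial=False).text
    timestamp = card.find("time").attrs['datetime']
    return {"version": version, 'timestamp': timestamp}

# One row per card, with the timestamp strings parsed into datetimes.
(pd.DataFrame([parse_card(c) for c in cards])
  .assign(timestamp=lambda d: pd.to_datetime(d['timestamp'])))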
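
If you want to see what the pandas step does on its own, without hitting the website, here is a small sketch. It assumes that parse_card returns dictionaries with a "version" and a "timestamp" key, as in the code above. The records themselves are made-up placeholders and not actual scraped values.

```python
import pandas as pd

# Hypothetical records, shaped like the output of parse_card.
# The versions and timestamps are placeholders, not scraped data.
records = [
    {"version": "0.0.2", "timestamp": "2000-01-02T00:00:00"},
    {"version": "0.0.1", "timestamp": "2000-01-01T00:00:00"},
    {"version": "0.0.3", "timestamp": "2000-01-03T00:00:00"},
]

df = (pd.DataFrame(records)
  # Parse the timestamp strings into proper datetime values.
  .assign(timestamp=lambda d: pd.to_datetime(d['timestamp']))
  # Put the oldest release first.
  .sort_values('timestamp')
  .reset_index(drop=True))

# The timestamp column now has a datetime dtype instead of plain strings.
print(df.dtypes)
```

Once the column holds datetimes, things like sorting by time or using the .dt accessor become available, which is why the conversion is worth doing right away.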

One final note on gazpacho: it is a nice package because it has no dependencies. It behaves like requests and beautifulsoup, but it does not depend on either of them.