URLs adhere to a standard.
[scheme:]//[user[:password]@]host[:port][/path][?query][#fragment]
The benefit of such a standard is that we might parse out the parts that we're interested in. Before doing that yourself though, it'd be better to check yarl. Yarl stands for Yet Another URL Library and it makes it easy to deal with URLs.
Here's an example of how you might use the library.
from yarl import URL
url = URL('https://www.example.com/things/query?arg=1&kw=2#frag')
url.host, url.scheme, url.path, url.query_string, url.query, url.fragment
The library can also deal with url encodings that are percent-decoded. A common example of this in a URL would be the whitespace character.
url = URL('https://www.example.com/hello world')
url.path, url.raw_path, url.human_repr()
Finally, it's also good to know that yarl doesn't just parse URLs. It
can also be used to build them. There's the .build()
constructor.
URL.build(scheme="https", host="example.com", query={"a": "b", "number": 1})
But you can also create a URL by chaining together appropriate python operators.
It's similar to the syntax in pathlib
.
URL("https://example.com") / 'things' % {"prop1": 1, "prop2": "a"}
Back to main.