A handy companion to handle URLs in Python
furl to the rescue
If you have ever manipulated URLs in Python with the urlparse library (urllib.parse in Python 3), you may have felt frustrated at having to juggle several APIs to get things done. furl gives you an intuitive and uniform API for the job. In this blog post, I will introduce it.
Installation
To install it, you will need Python 2.7 or higher (yes, this is one of the rare libraries still supporting Python 2!). Then you can install it with pip or a modern tool like poetry.
$ pip install furl
# or with poetry
$ poetry add furl
Usage
Here is a basic usage example.
from furl import furl
f = furl('https://username:password@example.com/some/path/?a=b#fragment')
print(f.scheme) # https
print(f.username) # username
print(f.password) # password
print(f.netloc) # username:password@example.com
print(f.host) # example.com
print(f.origin) # https://example.com
print(f.path) # /some/path/
print(f.query) # a=b
print(f.fragment) # fragment
So far, this is information we could already get from the urllib.parse module (urlparse in Python 2). What is really cool with furl is that we can easily change parts of the URL, and parameters are encoded automatically, without having to juggle helper functions like quote.
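For comparison, here is a sketch of the same reads using the standard library's urllib.parse.urlsplit. Note the different attribute name (hostname rather than host) and the absence of an origin attribute: the origin built at the end is our own rough approximation, which assumes no explicit port.

```python
from urllib.parse import urlsplit

# Split the same URL as in the furl example, with the standard library only.
parts = urlsplit('https://username:password@example.com/some/path/?a=b#fragment')

print(parts.scheme)    # https
print(parts.username)  # username
print(parts.password)  # password
print(parts.netloc)    # username:password@example.com
print(parts.hostname)  # example.com
print(parts.path)      # /some/path/
print(parts.query)     # a=b
print(parts.fragment)  # fragment

# There is no origin attribute, so we assemble one by hand
# (simplification: any explicit port is ignored).
origin = f'{parts.scheme}://{parts.hostname}'
print(origin)          # https://example.com
```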
Query
Here you can see how to add/remove query parameters.
from furl import furl
f = furl('https://example.com')
f.query.add({'one': 'two', 'hello': 'world'})
print(f.url)
# https://example.com?one=two&hello=world
f = f.remove(['one'])
print(f.url)
# https://example.com?hello=world
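To see what furl saves us from, here is a rough standard-library equivalent. add_query_params and remove_query_params are hypothetical helpers written only for this comparison: each one splits the URL, decodes the query into (key, value) pairs with parse_qsl, edits the pairs, and re-encodes them with urlencode.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_params(url, params):
    """Hypothetical helper: append the items of params to the query of url."""
    parts = urlsplit(url)
    # Decode the existing query into ordered (key, value) pairs.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.extend(params.items())
    # Re-encode the pairs and put the URL back together.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


def remove_query_params(url, keys):
    """Hypothetical helper: drop every parameter whose key is in keys."""
    parts = urlsplit(url)
    pairs = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k not in keys]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


added = add_query_params('https://example.com', {'one': 'two', 'hello': 'world'})
print(added)    # https://example.com?one=two&hello=world
removed = remove_query_params(added, ['one'])
print(removed)  # https://example.com?hello=world
```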
There is another way to view and edit query parameters: the args property.
f = furl('https://example.com?hello=world')
print(f.args) # {'hello': 'world'}
f.args['foo'] = 'bar'
print(f.args) # {'hello': 'world', 'foo': 'bar'}
del f.args['hello']
print(f.args) # {'foo': 'bar'}
furl also handles encoding seamlessly.
f = furl('https://example.com')
f.query.add({'param with space': 'hehe', 'an emoji!': '☺'})
print(f.url)
# https://example.com?param+with+space=hehe&an+emoji%21=%E2%98%BA
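These are the usual form-encoding rules: spaces become +, and other characters outside a small safe set are percent-encoded as UTF-8 bytes. The snippet below uses the standard library's urlencode (plain urllib.parse, not furl) and produces the same query string as above; it is a comparison, not a claim about how furl is implemented.

```python
from urllib.parse import quote_plus, urlencode

# urlencode runs every key and value through quote_plus:
# spaces become '+', and non-ASCII text is UTF-8 encoded, then percent-escaped.
query = urlencode({'param with space': 'hehe', 'an emoji!': '☺'})
print(query)  # param+with+space=hehe&an+emoji%21=%E2%98%BA

# The same function applied to a single key.
print(quote_plus('an emoji!'))  # an+emoji%21
```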
Path
We can access the different segments of a path.
from furl import furl
f = furl('https://www.google.com/a/large ish/path')
print(f.path)
# /a/large%20ish/path
print(f.path.segments)
# ['a', 'large ish', 'path']
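Getting decoded segments out of a URL with the standard library takes a bit of manual work. path_segments is a hypothetical helper sketching the idea: strip the leading slash, split on '/', and unquote each piece.

```python
from urllib.parse import unquote, urlsplit


def path_segments(url):
    """Hypothetical helper: decoded path segments of url."""
    path = urlsplit(url).path
    # Drop the single leading slash of an absolute path.
    if path.startswith('/'):
        path = path[1:]
    # An empty path has no segments; otherwise split and percent-decode.
    return [unquote(segment) for segment in path.split('/')] if path else []


print(path_segments('https://www.google.com/a/large%20ish/path'))
# ['a', 'large ish', 'path']
```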
Changing the path is easy.
f = furl('https://example.com/a/large ish/path')
f.path.segments = ['a', 'new', 'path']
print(f.path) # /a/new/path
f.path = '/or/this/way'
print(f.path) # /or/this/way
print(f.path.segments) # ['or', 'this', 'way']
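Going the other way, from a list of segments to an encoded path, can be sketched with quote. build_path is a hypothetical helper that assumes we always want an absolute path.

```python
from urllib.parse import quote


def build_path(segments):
    """Hypothetical helper: percent-encode each segment and join them."""
    # safe='' also encodes '/', so one segment can never turn into two.
    return '/' + '/'.join(quote(segment, safe='') for segment in segments)


print(build_path(['a', 'new', 'path']))        # /a/new/path
print(build_path(['a', 'large ish', 'path']))  # /a/large%20ish/path
```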
Note that assigning to the path attribute directly may trigger warnings from a static code analyzer. This is because there is no property setter for path; the assignment is handled by the __setattr__ method instead. I suspect the author chose this design to support the slash operator for modifying the path, as follows:
f = furl('https://example.com')
f.path /= 'a'
print(f.path) # /a
f.path = f.path / 'new' / 'path'
print(f.path) # /a/new/path
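To make this mechanism concrete, here is a toy sketch of how a __setattr__ hook and the / operator can work together. MiniPath and MiniUrl are hypothetical classes written for this illustration; they are not furl's code, just a minimal model of the design described above.

```python
class MiniPath:
    """Hypothetical, much simplified path object supporting the / operator."""

    def __init__(self, segments=()):
        self.segments = list(segments)

    def __truediv__(self, other):
        # p / 'x' returns a new path with 'x' appended.
        return MiniPath(self.segments + [other])

    def __str__(self):
        return '/' + '/'.join(self.segments)


class MiniUrl:
    """Hypothetical URL object that converts strings assigned to .path."""

    def __init__(self):
        self.path = MiniPath()

    def __setattr__(self, name, value):
        # A str assigned to .path is converted to a MiniPath at runtime.
        # A type checker cannot see this, hence the warnings.
        if name == 'path' and isinstance(value, str):
            value = MiniPath(s for s in value.split('/') if s)
        super().__setattr__(name, value)


u = MiniUrl()
u.path /= 'a'  # no __itruediv__, so Python uses __truediv__, then __setattr__
print(u.path)  # /a
u.path = u.path / 'new' / 'path'
print(u.path)  # /a/new/path
u.path = 'or/this/way'
print(u.path)  # /or/this/way
```

Because the string-to-path conversion only happens inside __setattr__, a type checker sees a str assigned to an attribute it inferred as MiniPath. That is the kind of warning mentioned above.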
So if you want to avoid this issue, stick to the first approach: assigning to the segments property.
We can check whether a path ends with a slash with the isdir and isfile properties.
f = furl('https://example.com/is/dir/')
print(f.path.isdir) # True
print(f.path.isfile) # False
f = furl('https://example.com/is/file')
print(f.path.isdir) # False
print(f.path.isfile) # True
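The underlying question is simply whether the path ends with '/'. is_dir is a hypothetical standard-library helper performing that check, shown for comparison only.

```python
from urllib.parse import urlsplit


def is_dir(url):
    """Hypothetical helper: True when the path of url ends with a slash."""
    return urlsplit(url).path.endswith('/')


print(is_dir('https://example.com/is/dir/'))  # True
print(is_dir('https://example.com/is/file'))  # False
```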
We can normalize a path that contains redundant slashes or '.' and '..' segments.
f = furl('https://example.com////a/./b/lolsup/../c/')
f.path.normalize()
print(f.url) # https://example.com/a/b/c/
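For comparison, a rough standard-library approximation can lean on posixpath.normpath, which collapses repeated slashes and resolves '.' and '..'. It needs two adjustments for URLs: normpath may keep a leading '//' (a POSIX special case), and it drops the trailing slash. normalize_url_path is a hypothetical helper, and furl's own rules may differ in edge cases.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def normalize_url_path(url):
    """Hypothetical helper: collapse extra slashes and resolve '.' and '..'."""
    parts = urlsplit(url)
    if not parts.path:
        return url
    path = posixpath.normpath(parts.path)
    # POSIX treats exactly two leading slashes specially; URLs want one.
    if path.startswith('//'):
        path = '/' + path.lstrip('/')
    # normpath drops the trailing slash, which is meaningful in URLs.
    if parts.path.endswith('/') and not path.endswith('/'):
        path += '/'
    return urlunsplit(parts._replace(path=path))


print(normalize_url_path('https://example.com////a/./b/lolsup/../c/'))
# https://example.com/a/b/c/
```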
Fragment
It is worth mentioning that a fragment can have its own path and query, as the following examples show.
f = furl('https://example.com')
print(str(f.fragment) == '') # True, the fragment starts out empty
f.fragment.path = 'hell'
print(f.fragment) # hell
print(f.url) # https://example.com#hell
f.fragment.path.segments.append('foo')
print(f.fragment) # hell/foo
f.fragment.query = 'one=two&hello=world'
print(f.fragment) # hell/foo?one=two&hello=world
del f.fragment.args['one']
f.fragment.args['fruit'] = 'apple'
print(f.fragment) # hell/foo?hello=world&fruit=apple
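The standard library has no notion of a structured fragment: urlsplit returns it as one opaque string. The sketch below splits it by hand, assuming the same path?query layout used in the furl examples above.

```python
from urllib.parse import parse_qsl, urlsplit

# urlsplit cuts at the first '#', so the '?' below belongs to the fragment.
fragment = urlsplit('https://example.com#hell/foo?one=two&hello=world').fragment
print(fragment)  # hell/foo?one=two&hello=world

# Treat the part before '?' as a path and the rest as a query string.
fragment_path, _, fragment_query = fragment.partition('?')
print(fragment_path.split('/'))   # ['hell', 'foo']
print(parse_qsl(fragment_query))  # [('one', 'two'), ('hello', 'world')]
```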
Miscellaneous
Of course, furl understands other kinds of URLs too.
f = furl('file:///c:/Windows')
print(f.scheme) # file
print(f.origin) # file://
print(f.path) # /c:/Windows
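The bare file:// origin makes sense once we look at the raw components: in file:///c:/Windows, the authority (netloc) between the second and third slash is empty. A quick standard-library check:

```python
from urllib.parse import urlsplit

parts = urlsplit('file:///c:/Windows')
print(parts.scheme)        # file
print(repr(parts.netloc))  # '' (no host in a three-slash file URL)
print(parts.path)          # /c:/Windows
```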
We can set multiple parts of a URL at once with the set method.
f = furl('https://example.com')
# note that international domain names are handled
f.set(host='ドメイン.テスト', path='джк', query='☃=☺')
print(f.url)
# https://xn--eckwd4c7c.xn--zckzah/%D0%B4%D0%B6%D0%BA?%E2%98%83=%E2%98%BA
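Both encodings can be reproduced with the standard library, which is a handy way to double-check the output above. The sketch uses Python's built-in idna codec (which implements the older IDNA 2003 rules) for the host and quote for the path; treat it as a comparison, not as a description of furl's internals.

```python
from urllib.parse import quote

# IDNA-encode each label of the host with the built-in 'idna' codec.
host = 'ドメイン.テスト'.encode('idna').decode('ascii')
print(host)  # xn--eckwd4c7c.xn--zckzah

# Non-ASCII path characters are UTF-8 encoded, then percent-escaped.
print(quote('джк'))  # %D0%B4%D0%B6%D0%BA
```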
We can copy a furl object if we don't want to alter the original one.
f1 = furl('https://example.com')
f2 = f1.copy().set(args={'one': 'two'}, path='/path')
print(f1.url) # https://example.com
print(f2.url) # https://example.com/path?one=two
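The standard library sidesteps the copy question altogether, because urlsplit returns an immutable named tuple. Its _replace method (a regular namedtuple method) returns a modified copy and leaves the original untouched:

```python
from urllib.parse import urlsplit, urlunsplit

original = urlsplit('https://example.com')
# _replace builds a new tuple; the original is never changed.
modified = original._replace(path='/path', query='one=two')

print(urlunsplit(original))  # https://example.com
print(urlunsplit(modified))  # https://example.com/path?one=two
```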
We can also join URLs. The join method resolves the provided relative or absolute URL against the furl object's URL, updates the object in place, and returns it so that calls can be chained.
f = furl('https://www.foo.com')
f.join('new/path')
print(f.url) # https://www.foo.com/new/path
f.join('../replaced')
print(f.url) # https://www.foo.com/replaced
f.join('path?query=yes#fragment')
print(f.url) # https://www.foo.com/path?query=yes#fragment
f.join('ftp://baba.com/path')
print(f.url) # ftp://baba.com/path
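These results follow the usual relative-reference resolution rules, and the standard library's urljoin returns the same URLs for these inputs. The main difference is that urljoin works on plain strings and returns a new one each time, whereas join updates the furl object in place. A sketch replaying the sequence above:

```python
from urllib.parse import urljoin

steps = ['new/path', '../replaced', 'path?query=yes#fragment', 'ftp://baba.com/path']

url = 'https://www.foo.com'
history = []
for step in steps:
    # Resolve each reference against the previous result.
    url = urljoin(url, step)
    history.append(url)
    print(url)
```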
Finally, we can inspect various details of a furl object with the asdict method.
from pprint import pprint
from furl import furl
f = furl('https://xn--eckwd4c7c.xn--zckzah/path?foo=bar#frag')
pprint(f.asdict(), indent=4)
The output looks like this:
{   'fragment': {   'encoded': 'frag',
                    'path': {   'encoded': 'frag',
                                'isabsolute': False,
                                'isdir': False,
                                'isfile': True,
                                'segments': ['frag']},
                    'query': {'encoded': '', 'params': []},
                    'separator': True},
    'host': 'ドメイン.テスト',
    'host_encoded': 'xn--eckwd4c7c.xn--zckzah',
    'netloc': 'xn--eckwd4c7c.xn--zckzah',
    'origin': 'https://xn--eckwd4c7c.xn--zckzah',
    'password': None,
    'path': {   'encoded': '/path',
                'isabsolute': True,
                'isdir': False,
                'isfile': True,
                'segments': ['path']},
    'port': 443,
    'query': {'encoded': 'foo=bar', 'params': [('foo', 'bar')]},
    'scheme': 'https',
    'url': 'https://xn--eckwd4c7c.xn--zckzah/path?foo=bar#frag',
    'username': None}
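The closest standard-library analogue is the named tuple returned by urlsplit, whose _asdict method exposes its five components. One visible difference: the standard library's port property is None when the URL does not spell out a port, whereas the furl output above reports 443, the default for https.

```python
from pprint import pprint
from urllib.parse import urlsplit

parts = urlsplit('https://xn--eckwd4c7c.xn--zckzah/path?foo=bar#frag')
# SplitResult is a named tuple, so _asdict() returns its fields by name.
pprint(parts._asdict())
# No explicit port in the URL, so no default is filled in.
print(parts.port)  # None
```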
That's all for this article; I hope you enjoyed reading it. Take care of yourself and see you soon! 🙂
If you liked this article and want to keep learning with me, don't hesitate to follow me here and subscribe to my newsletter on Substack 😉