Python urljoin does not join a relative and an absolute URL correctly

Time: 2018-08-16 14:53:33

Tags: python url join urllib

I have these two URLs:

absolute_url = 'https://ciechgroup.com/en/relacje-inwestorskie/reports/current-reports'
relative_url = 'en/relacje-inwestorskie/reports/current-reports/2018/242018/'

I want to join them to produce this:

https://ciechgroup.com/en/relacje-inwestorskie/reports/current-reports/2018/242018/

However, urljoin does not join the URLs the way I expect:

from urllib.parse import urljoin

urljoin(absolute_url, relative_url)

>> https://ciechgroup.com/en/relacje-inwestorskie/reports/en/relacje-inwestorskie/reports/current-reports/2018/242018/

Do you know how to achieve this without part of the URL being duplicated?

2 answers:

Answer 0: (score: 3)

Add a leading / to your relative_url:

>>> from urllib.parse import urljoin
>>> absolute_url = 'https://ciechgroup.com/en/relacje-inwestorskie/reports/current-reports'
>>> relative_url = '/en/relacje-inwestorskie/reports/current-reports/2018/242018/'
>>> urljoin(absolute_url, relative_url)
'https://ciechgroup.com/en/relacje-inwestorskie/reports/current-reports/2018/242018/'

Answer 1: (score: 1)

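urljoin is doing exactly what it is supposed to do. Your absolute URL does not end with a slash, so its last segment (current-reports) is dropped and its "current path" (/en/relacje-inwestorskie/reports/) becomes the base that the relative URL is resolved "relative to". The result is therefore indeed /en/relacje-inwestorskie/reports/en/relacje-inwestorskie/reports/current-reports/2018/242018/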
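To make the resolution rule concrete, here is a small sketch that compares three cases using the base URL from the question. The shortened paths '2018/242018/' and '/en/2018/' are made up for illustration only and are not taken from the question.

```python
from urllib.parse import urljoin

base_no_slash = 'https://ciechgroup.com/en/relacje-inwestorskie/reports/current-reports'
base_with_slash = base_no_slash + '/'

# Base without a trailing slash: the last segment ("current-reports")
# is dropped before the relative path is appended.
dropped_segment = urljoin(base_no_slash, '2018/242018/')
print(dropped_segment)
# https://ciechgroup.com/en/relacje-inwestorskie/reports/2018/242018/

# Base with a trailing slash: the whole base path is kept.
kept_segment = urljoin(base_with_slash, '2018/242018/')
print(kept_segment)
# https://ciechgroup.com/en/relacje-inwestorskie/reports/current-reports/2018/242018/

# Path starting with "/": it replaces the base path entirely.
root_relative = urljoin(base_no_slash, '/en/2018/')
print(root_relative)
# https://ciechgroup.com/en/2018/
```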

Judging by your expected result, your relative_url is actually an absolute path (relative to the site root), so you need to prefix it with /:

>>> absolute_url = 'https://ciechgroup.com/en/relacje-inwestorskie/reports/current-reports'
>>> relative_url = '/en/relacje-inwestorskie/reports/current-reports/2018/242018/'
>>> from urllib.parse import urljoin
>>> urljoin(absolute_url, relative_url)
'https://ciechgroup.com/en/relacje-inwestorskie/reports/current-reports/2018/242018/'
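If the paths you receive are always meant to be relative to the site root but sometimes lack the leading slash, one option is to normalize them before joining. The following is only a sketch: join_root_relative is a hypothetical helper name, not part of urllib, and it assumes the second argument is a path rather than a full URL with its own scheme.

```python
from urllib.parse import urljoin


def join_root_relative(base_url, path):
    """Join path onto base_url, treating path as relative to the site root.

    Hypothetical helper: assumes path is a plain path, not a full URL.
    """
    # Guarantee exactly one leading slash so urljoin replaces the base path.
    return urljoin(base_url, '/' + path.lstrip('/'))


absolute_url = 'https://ciechgroup.com/en/relacje-inwestorskie/reports/current-reports'
relative_url = 'en/relacje-inwestorskie/reports/current-reports/2018/242018/'

print(join_root_relative(absolute_url, relative_url))
# https://ciechgroup.com/en/relacje-inwestorskie/reports/current-reports/2018/242018/
```

The helper gives the same result whether or not the path already starts with a slash, so it can be applied to either form.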