Question

I'm trying to build URLs from different parts, and I can't understand the behavior of this method. For example:

Python 3.x

from urllib.parse import urljoin

>>> urljoin('some', 'thing')
'thing'
>>> urljoin('http://some', 'thing')
'http://some/thing'
>>> urljoin('http://some/more', 'thing')
'http://some/thing'
>>> urljoin('http://some/more/', 'thing') # just a tad / after 'more'
'http://some/more/thing'
>>> urljoin('http://some/more/', '/thing')
'http://some/thing'

Can you explain the exact behavior of this method?

Answer 1

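The best way (for me) to think of this is that the first argument, base, is like the page you are on in your browser. The second argument, url, is the href of an anchor on that page. The result is the final URL you are directed to when you click it.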
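To make the browser analogy concrete, here is a small sketch of a made-up browsing session: each href is resolved against whatever page we are currently on, just as clicking a link would. The starting page and the hrefs are hypothetical values chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical browsing session: each href is resolved against the page
# we are currently on, then that result becomes the new current page.
page = 'http://some/more/'
for href in ('thing', '/other/', 'page.html'):
    page = urljoin(page, href)
    print(page)
# http://some/more/thing
# http://some/other/
# http://some/other/page.html
```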

>>> urljoin('some', 'thing')
'thing'

This one makes sense given my description, though one would hope base includes a scheme and a domain.
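When base has no scheme or domain, urljoin just applies the path rules to the two strings. A quick check with a few made-up relative paths shows why 'some' disappears: it is treated as a "file" name and dropped, whereas 'some/' is treated as a "directory" and kept.

```python
from urllib.parse import urljoin

# No scheme and no netloc: only the path rules apply.
# 'some' has no '/', so its last (and only) segment is dropped.
print(urljoin('some', 'thing'))    # thing
# With a trailing '/', 'some/' acts like a directory and is kept.
print(urljoin('some/', 'thing'))   # some/thing
# 'a/b' loses its last segment 'b' before 'c' is appended.
print(urljoin('a/b', 'c'))         # a/c
```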

>>> urljoin('http://some', 'thing')
'http://some/thing'

If you are on the virtual host some and there is an anchor like <a href='thing'>Foo</a>, then the link will take you to http://some/thing

>>> urljoin('http://some/more', 'thing')
'http://some/thing'

We are on some/more here, so a relative link to thing will take us to some/thing (the path /thing on host some).

>>> urljoin('http://some/more/', 'thing') # just a tad / after 'more'
'http://some/more/thing'

Here we are not on some/more, we are on some/more/, which is different. Now our relative link will take us to some/more/thing
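Since the trailing slash decides whether the last segment of base survives, one common workaround when building URLs from parts is to force that slash before joining. The helper below, join_under, is hypothetical (not part of urllib.parse), and it assumes base has no query string or fragment, since appending '/' after those would corrupt them.

```python
from urllib.parse import urljoin

def join_under(base: str, segment: str) -> str:
    """Hypothetical helper: resolve `segment` inside `base`.

    Adds a trailing '/' to base when missing, so the last path segment
    of base is kept as a "directory" instead of being replaced.
    Assumes base has no query string or fragment.
    """
    if not base.endswith('/'):
        base += '/'
    return urljoin(base, segment)

print(join_under('http://some/more', 'thing'))   # http://some/more/thing
print(join_under('http://some/more/', 'thing'))  # http://some/more/thing
```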

>>> urljoin('http://some/more/', '/thing')
'http://some/thing'

Lastly, if we are on some/more/ and the href is /thing, we will be linked to some/thing.
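The leading-slash case is an easy trap when gluing an API root to an endpoint path. A sketch with a hypothetical API root (api.example.com/v1/ is made up for illustration): the same endpoint name resolves differently depending on whether it starts with '/'.

```python
from urllib.parse import urljoin

api_base = 'http://api.example.com/v1/'  # hypothetical API root

# Relative segment: resolved under /v1/
print(urljoin(api_base, 'users'))    # http://api.example.com/v1/users
# Root-relative segment: the /v1/ path of the base is discarded
print(urljoin(api_base, '/users'))   # http://api.example.com/users
# Stripping leading slashes keeps the join relative to api_base
print(urljoin(api_base, '/users'.lstrip('/')))  # http://api.example.com/v1/users
```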

Answer 2

urllib.parse.urljoin(base, url)

If url is an absolute URL (that is, it starts with //, http://, https://, ...), then url's hostname and/or scheme will appear in the result. For example:

>>> urljoin('https://www.google.com', '//www.microsoft.com')
'https://www.microsoft.com'
>>>
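To see the difference between a scheme-relative URL (starting with //) and a fully absolute one, here is a small comparison against a base with a path. The base path /search and the /x path are made-up values: a scheme-relative url borrows only the scheme from base, while a fully absolute url with a different scheme ignores base entirely.

```python
from urllib.parse import urljoin

base = 'https://www.google.com/search'

# Scheme-relative: keeps base's scheme (https), replaces host and path
print(urljoin(base, '//www.microsoft.com'))          # https://www.microsoft.com
# Fully absolute with a different scheme: base is ignored
print(urljoin(base, 'http://www.microsoft.com/x'))   # http://www.microsoft.com/x
```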

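Otherwise, urllib.parse.urljoin(base, url) will:

Construct a full ("absolute") URL by combining a "base URL" (base) with another URL (url). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL.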
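"Missing components" also covers the query and fragment, not just the path. A short sketch with a made-up base that carries a query string shows which parts are borrowed from base in each case.

```python
from urllib.parse import urljoin

base = 'http://a/b/c?q=1'

# Query-only reference: path comes from base, query is replaced
print(urljoin(base, '?x=2'))    # http://a/b/c?x=2
# Fragment-only reference: path and query both come from base
print(urljoin(base, '#frag'))   # http://a/b/c?q=1#frag
# Relative path: base's query is not carried over
print(urljoin(base, 'd'))       # http://a/b/d
```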

>>> from urllib.parse import urljoin, urlparse
>>> urlparse('http://a/b/c/d/e')
ParseResult(scheme='http', netloc='a', path='/b/c/d/e', params='', query='', fragment='')
>>> urljoin('http://a/b/c/d/e', 'f')
'http://a/b/c/d/f'
>>> urlparse('http://a/b/c/d/e/')
ParseResult(scheme='http', netloc='a', path='/b/c/d/e/', params='', query='', fragment='')
>>> urljoin('http://a/b/c/d/e/', 'f')
'http://a/b/c/d/e/f'
>>>

It takes the path of the first argument (base), strips off the part after the last /, and joins the result with the second argument (url).
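The "strip after the last / and append" rule can be checked with a deliberately simplified re-implementation. naive_join is hypothetical and only covers a plain relative path ref (no scheme, host, leading '/', '.', '..', query or fragment); the real urljoin handles all of those, so this is a sketch of the rule, not a replacement.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def naive_join(base: str, ref: str) -> str:
    """Hypothetical simplification of the rule described above.

    Only handles a plain relative path `ref` and drops base's query
    and fragment.
    """
    parts = urlsplit(base)
    # Keep everything up to and including the last '/' of base's path
    directory = parts.path[:parts.path.rfind('/') + 1]
    return urlunsplit((parts.scheme, parts.netloc, directory + ref, '', ''))

bases = ('http://a/b/c/d/e', 'http://a/b/c/d/e/', 'http://some/more', 'http://some')
for b in bases:
    # Both columns should match for these simple cases
    print(naive_join(b, 'f'), urljoin(b, 'f'))
```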

If url starts with /, it joins the scheme and netloc of base with url:

>>> urljoin('http://a/b/c/d/e', '/f')
'http://a/f'
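The root-relative case can be mirrored by hand: keep only base's scheme and netloc and take everything else from url. join_from_root is a hypothetical helper for illustration, and it assumes url contains no '.' or '..' segments, which the real urljoin would also resolve.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def join_from_root(base: str, path: str) -> str:
    """Hypothetical helper mirroring the '/'-prefixed case: keep only
    base's scheme and netloc, take path, query and fragment from `path`.
    Assumes `path` has no '.' or '..' segments."""
    b = urlsplit(base)
    r = urlsplit(path)
    return urlunsplit((b.scheme, b.netloc, r.path, r.query, r.fragment))

print(join_from_root('http://a/b/c/d/e', '/f'))  # http://a/f
print(urljoin('http://a/b/c/d/e', '/f'))         # http://a/f
```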
