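Merging a relative path against a URL template

Time: 2017-04-05 15:27:56

Tags: python url path

I am trying to merge a user-supplied URL with a relative file path. For example, given the following: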

url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'

the desired output is

http://myserver.com/my/path/to/files/foo.txt

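The path elements common to the URL and the file path are merged: my/path/to/files plus path/to/files/foo.txt becomes my/path/to/files/foo.txt, which is then appended back onto the base of the URL.

The closest I have been able to get is: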

# python 2.7
import os
import urlparse
from collections import OrderedDict

url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'

url = urlparse.urlparse(url_base)
print(url)
# ParseResult(scheme='http', netloc='myserver.com', path='/my/path/to/files', params='', query='', fragment='')

merge_path = os.path.join(url.path, path)
print(merge_path)
# /my/path/to/files/path/to/files/foo.txt

# take an ordered set of the path components
# this is not good because it assumes '/' is the split key
merge_path_set = list(OrderedDict.fromkeys(merge_path.split('/')))
print(merge_path_set)
# ['', 'my', 'path', 'to', 'files', 'foo.txt']

path_joined = os.path.join(*merge_path_set)
print(path_joined)
# my/path/to/files/foo.txt

# THIS DOESN'T WORK:
url_joined = urlparse.urljoin(url.netloc, path_joined)
print(url_joined)
# my/path/to/files/foo.txt

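It seems like there should be a better way to do this with the built-in libraries than manually splitting on '/' and taking an ordered set, as I have done here. I also have not figured out how to get the result back into an output URL. Any ideas?

1 answer:

Answer 0: (score: 0)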

urljoin() works fine if you align the second argument with the path component of url_base.

For Python 2.7:

from urlparse import urljoin

url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'

final_url = urljoin(url_base, '/my/' + path)

# http://myserver.com/my/path/to/files/foo.txt

For Python 3:

from urllib.parse import urljoin

url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'

final_url = urljoin(url_base, '/my/' + path)

# http://myserver.com/my/path/to/files/foo.txt
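The '/my/' prefix above is hard-coded for this particular input. As a rough sketch of how the alignment could be computed instead, the function below looks for the longest run of segments that both ends the URL's path and starts the relative path, then keeps that run only once. The name merge_url_path is hypothetical, and splitting on '/' is an assumption that holds for URL paths but not necessarily for native file paths.

```python
from urllib.parse import urlsplit


def merge_url_path(url_base, rel_path):
    """Join rel_path onto url_base, keeping only one copy of the longest
    run of segments that ends url_base's path and starts rel_path."""
    parts = urlsplit(url_base)
    base_segs = [s for s in parts.path.split('/') if s]
    rel_segs = [s for s in rel_path.split('/') if s]

    # Try the longest possible overlap first and shrink until one matches.
    overlap = 0
    for n in range(min(len(base_segs), len(rel_segs)), 0, -1):
        if base_segs[-n:] == rel_segs[:n]:
            overlap = n
            break

    merged = '/' + '/'.join(base_segs + rel_segs[overlap:])
    # SplitResult is a namedtuple, so _replace() swaps in the new path
    # while leaving the scheme and netloc untouched.
    return parts._replace(path=merged).geturl()


print(merge_url_path('http://myserver.com/my/path/to/files',
                     'path/to/files/foo.txt'))
# http://myserver.com/my/path/to/files/foo.txt
```

When no segments overlap, the relative path is simply appended to the base path. Unlike the ordered-set approach in the question, repeated segment names elsewhere in the path are left alone.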

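Alternatively, assuming the path/to/files part of path always matches the path/to/files component of url_base, and that you can append a '/' to url_base, urljoin() only needs the file name from path, although this does rely on a variant of url_base rather than the original string.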
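A minimal sketch of the trailing-slash variant, under the assumption that the whole directory part of path duplicates the end of url_base, so only the file name is new. The names base_with_slash and the use of posixpath.basename are illustrative choices, not necessarily what the answer originally used.

```python
import posixpath
from urllib.parse import urljoin

url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'

# Without a trailing slash, urljoin() would replace the last segment
# ('files') instead of appending to it, so add one if it is missing.
base_with_slash = url_base.rstrip('/') + '/'

# Assumption: everything before the file name already appears in url_base.
final_url = urljoin(base_with_slash, posixpath.basename(path))
print(final_url)
# http://myserver.com/my/path/to/files/foo.txt
```

posixpath is used here rather than os.path so that the split is always on '/', regardless of the platform the script runs on.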