I'm trying to merge a user-supplied URL with a relative file path. For example, given the following:
url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'
the desired output is
http://myserver.com/my/path/to/files/foo.txt
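The path elements common to the URL and the file path are merged: my/path/to/files and path/to/files/foo.txt combine into my/path/to/files/foo.txt, which is then appended back onto the base of the URL.
The closest I've been able to get is: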
# python 2.7
import os
import urlparse
from collections import OrderedDict
url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'
url = urlparse.urlparse(url_base)
print(url)
# ParseResult(scheme='http', netloc='myserver.com', path='/my/path/to/files', params='', query='', fragment='')
merge_path = os.path.join(url.path, path)
print(merge_path)
# /my/path/to/files/path/to/files/foo.txt
# take an ordered set of the path components
# this is not good because it assumes '/' is the split key
merge_path_set = list(OrderedDict.fromkeys(merge_path.split('/')))
print(merge_path_set)
# ['', 'my', 'path', 'to', 'files', 'foo.txt']
path_joined = os.path.join(*merge_path_set)
print(path_joined)
# my/path/to/files/foo.txt
# THIS DOESN'T WORK:
url_joined = urlparse.urljoin(url.netloc, path_joined)
print(url_joined)
# my/path/to/files/foo.txt
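It seems like there should be a better way to do this with the built-in libraries than manually splitting on '/' and taking an ordered set, as I've done here. I also haven't figured out how to get the result back into the output URL. Any ideas?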
Answer 0 (score: 0)
urljoin() works fine if the path components of url_base and path are aligned.
For Python 2.7:
from urlparse import urljoin
url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'
final_url = urljoin(url_base, '/my/' + path)
# http://myserver.com/my/path/to/files/foo.txt
For Python 3:
from urllib.parse import urljoin
url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'
final_url = urljoin(url_base, '/my/' + path)
# http://myserver.com/my/path/to/files/foo.txt
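To see why prefixing '/my/' works, it may help to compare how urljoin() resolves different kinds of references. The following is a small Python 3 sketch using the same example values; the result variable names (absolute, relative, appended) are only illustrative.

```python
from urllib.parse import urljoin

url_base = 'http://myserver.com/my/path/to/files'
path = 'path/to/files/foo.txt'

# A reference starting with '/' replaces the entire path of the base URL.
absolute = urljoin(url_base, '/my/' + path)

# A relative reference replaces only the last segment of the base path
# ('files' here), so some of the shared components end up duplicated.
relative = urljoin(url_base, path)

# With a trailing '/', the base's last segment is kept and the
# reference is appended after it.
appended = urljoin(url_base + '/', 'foo.txt')

print(absolute)  # http://myserver.com/my/path/to/files/foo.txt
print(relative)  # http://myserver.com/my/path/to/path/to/files/foo.txt
print(appended)  # http://myserver.com/my/path/to/files/foo.txt
```

Note that this only works because the '/my/' prefix is known in advance; urljoin() itself does not detect overlapping path components.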
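Assuming the leading path/to/files part of path always matches the trailing path/to/files component of url_base, and that appending a '/' to url_base is acceptable, the two can be merged directly, although doing so still relies on a variant of splitting.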
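One way to sketch the overlap idea in Python 3 is shown below. This is an illustrative sketch rather than a definitive implementation: merge_url_and_path is a hypothetical helper name, and it assumes the overlap is always a run of whole path segments at the end of the URL path and the start of the relative path. It does split on '/', but since URL paths always use '/' as the separator regardless of platform, that is less fragile than splitting a filesystem path.

```python
from urllib.parse import urlsplit, urlunsplit


def merge_url_and_path(url_base, path):
    """Join url_base and path, collapsing the longest overlap between the
    end of the URL's path and the start of the relative path.

    Hypothetical helper for illustration; not part of the standard library.
    """
    parts = urlsplit(url_base)
    base_segments = [s for s in parts.path.split('/') if s]
    path_segments = [s for s in path.split('/') if s]

    # Find the longest suffix of base_segments that is also a prefix
    # of path_segments, trying the largest candidate first.
    overlap = 0
    for size in range(min(len(base_segments), len(path_segments)), 0, -1):
        if base_segments[-size:] == path_segments[:size]:
            overlap = size
            break

    merged = '/' + '/'.join(base_segments + path_segments[overlap:])
    return urlunsplit(
        (parts.scheme, parts.netloc, merged, parts.query, parts.fragment)
    )


print(merge_url_and_path('http://myserver.com/my/path/to/files',
                         'path/to/files/foo.txt'))
# http://myserver.com/my/path/to/files/foo.txt
```

When there is no overlap at all, the relative path is simply appended after the base path, so the helper degrades to a plain join.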