生成URL端点的组合

时间:2019-06-06 15:42:23

标签: python python-2.7 url urlparse

我的网址如下:

http://example.com/foo/bar/baz/file.php

我有一个名为/potato的端点。

我想从中生成以下URL:

http://example.com/foo/potato
http://example.com/foo/bar/potato
http://example.com/foo/bar/baz/potato

到目前为止,我的尝试涉及到以斜杠分割,并且错过了端点本身以/等开头的情况。

最干净,最Python化的方法是什么?

1 个答案:

答案 0 :(得分:2)

您可以使用列表理解:

import re
s = 'http://example.com/foo/bar/baz/file.php'
*path, _ = re.split('(?<=\w)/(?=\w)', s)
results = [f'{"/".join(path[:2+i])}/potato' for i in range(len(path)-1)]

输出:

['http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']

编辑:Python2.7解决方案:

import re
s = 'http://example.com/foo/bar/baz/file.php'
path = re.split('(?<=\w)/(?=\w)', s)[:-1]
result = ['{}/potato'.format("/".join(path[:1+i])) for i in range(len(path))]

输出:

['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']

可靠而准确地解析网址的另一种可能性是使用urllib.parse

import urllib.parse
d = urllib.parse.urlsplit(s)
_, *path, _ = d.path.split('/')
result = [f'{d.scheme}://{d.netloc}/{"/".join(path[:i])}/potato' for i in range(1, len(path)+1)]

输出:

['http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']

在带有urlparse的Python2.7中:

import urlparse
d = urlparse.urlparse(s)
path = d.path.split('/')[1:-1]
result = ['{}://{}/{}/potato'.format(d.scheme, d.netloc, "/".join(path[:i]))  for i in range(len(path))]

输出:

['http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']

编辑2:时间:

可以找到计时来源here

enter image description here

从图中可以看出,在大多数情况下,urlparsere慢。

编辑3:通用解决方案:

import re
def generate_url_combos(s, endpoint):
   path = re.split('(?<=\w)/(?=\w)', re.sub('(?<=\w)/\w+\.\w+$|(?<=\w)/\w+\.\w+/+$', '', s).strip('/'))
   return ['{}/{}'.format("/".join(path[:1+i]), re.sub('^/|/+$', '', endpoint)) for i in range(len(path))]

tests = [('http://example.com/foo/bar/baz/file.php/', '/potato'), ('http://example.com/foo/bar/baz/file.php', '/potato'), ('http://example.com/foo/bar/baz/file.php', 'potato'), ('http://example.com/foo/bar/baz/file.php', 'potato/'), ('http://example.com/foo/bar/baz/file.php//', 'potato'), ('http://example.com/', 'potato'), ('http://example.com', 'potato'), ('http://example.com/', '/potato'), ('http://example.com', '/potato')]
for a, b in tests:
   print generate_url_combos(a, b)

输出:

['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato']
['http://example.com/potato']
['http://example.com/potato']
['http://example.com/potato']

编辑4:

import urlparse, re
def generate_url_combos(s, endpoint):
   d = urlparse.urlparse(s)
   path = list(filter(None, d.path.split('/')))
   if not path:
     return '{}://{}/{}'.format(d.scheme, d.netloc, re.sub('^/+|/+$', '', endpoint))
   path = path[:-1] if re.findall('\.\w+$', path[-1]) else path
   return ['{}://{}/{}'.format(d.scheme, d.netloc, re.sub('^/+|/+$', '', endpoint) if not i else "/".join(path[:i])+'/'+re.sub('^/+|/+$', '', endpoint))  for i in range(len(path)+1)]

tests = [('http://example.com/foo/bar/baz/file.php/', '/potato'), ('http://example.com/foo/bar/baz/file.php', '/potato'), ('http://example.com/foo/bar/baz/file.php', 'potato'), ('http://example.com/foo/bar/baz/file.php', 'potato/'), ('http://example.com/foo/bar/baz/file.php//', 'potato'), ('http://example.com/', 'potato'), ('http://example.com', 'potato'), ('http://example.com/', '/potato'), ('http://example.com', '/potato')]
for a, b in tests:
   print generate_url_combos(a, b)

输出:

['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato']
['http://example.com/potato']
['http://example.com/potato']
['http://example.com/potato']