Question

I'm trying to get the "variants" of a string, but I can't wrap my head around how to do it. Let me explain what I have.

My goal is to get the different variants of a URL. Say we have the following URL:

https://www.example.com/index/subindex/subsubindex

I'm splitting it on /

splitURL = str(initialUrl).split('/')

So I'm left with something like this:

splitURL[0] = 'https:'
splitURL[1] = ''
splitURL[2] = 'www.example.com'
splitURL[3] = 'index'
splitURL[4] = 'subindex'
splitURL[5] = 'subsubindex'

What is the best way to get a list like the following: list = [https://www.example.com/, https://www.example.com/index/, https://www.example.com/index/subindex/, https://www.example.com/index/subindex/subsubindex]?

I have tried using for items in splitURL to build the variants, but the first URL I get that way (e.g. https:) is of no use to me.

I have also tried using for x in range (2,len(urlList)+1), but I still run into an indexOutOfBounds error.

Is there a more elegant way to do this?

Answer 1

Start the slice end at 3 so that https://www.example.com is always included, run it up to the end of the list, and join each slice with slashes:

res = ['/'.join(splitURL[:x]) for x in range (3,len(splitURL)+1)]

# ['https://www.example.com', 'https://www.example.com/index', 'https://www.example.com/index/subindex', 'https://www.example.com/index/subindex/subsubindex']
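The list in the question has a trailing slash on every entry except the last, which the comprehension above does not produce. A minimal sketch of one way to add it, assuming the input is an absolute URL with a scheme and at least one path segment (the function name url_prefixes is made up for illustration):

```python
def url_prefixes(url):
    # parts[0:3] are the scheme ('https:'), the empty string between
    # the two slashes, and the host, so the first useful slice ends at 3.
    parts = str(url).split('/')
    prefixes = ['/'.join(parts[:x]) for x in range(3, len(parts) + 1)]
    # Append a trailing slash to every prefix except the full URL,
    # matching the list shown in the question.
    return [p + '/' for p in prefixes[:-1]] + prefixes[-1:]


print(url_prefixes('https://www.example.com/index/subindex/subsubindex'))
```

Note that range(3, len(parts) + 1) is only used as a slice end here. Slicing past the end of a list is safe in Python, whereas indexing parts[len(parts)] raises IndexError, which may be what the loop in the question ran into.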

Answer 2

You should really use the urllib.parse module.

from urllib import parse

def paths(path):
    # Yield each prefix of the path that ends just before a '/'
    for i, c in enumerate(path):
        if c == '/':
            yield path[:i]
    # Finally yield the complete path itself
    if path:
        yield path

>>> parsed_url = parse.urlparse('https://www.example.com/index/subindex/subsubindex')
>>> [f'{parsed_url.scheme}://{parsed_url.hostname}{p}' for p in paths(parsed_url.path)]
['https://www.example.com', 
 'https://www.example.com/index', 
 'https://www.example.com/index/subindex', 
 'https://www.example.com/index/subindex/subsubindex']
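The same idea can be wrapped into a single function. The sketch below is one possible variation, not the answerer's code: it uses urlsplit and netloc instead of hostname, since hostname is lowercased and drops any port, and it skips empty path segments. The name url_variants is hypothetical.

```python
from urllib.parse import urlsplit


def url_variants(url):
    parts = urlsplit(url)
    # netloc keeps the port (and any userinfo) exactly as written
    current = f'{parts.scheme}://{parts.netloc}'
    variants = [current]
    for segment in parts.path.split('/'):
        # Skip empty segments produced by leading, trailing or doubled slashes
        if segment:
            current = f'{current}/{segment}'
            variants.append(current)
    return variants


for variant in url_variants('https://www.example.com/index/subindex/subsubindex'):
    print(variant)
```

Query strings and fragments are ignored in this sketch; whether they should be carried over depends on what the variants are used for.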

Getting iterations of a split string
