Question

I'm trying to split an already-split string further, to clean it up and remove unnecessary information. This is a URL that was split on '/':

['https:', '', 'expressjs.com', 'en', 'starter', 'hello-world.html']

I'd like to end up with:

['https:', '', 'expressjs','com', 'en', 'starter', 'hello-world','html']

Any ideas?

Answer 1

re.split splits a string at every match of your regular expression:

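>>> import re
>>> re.split(r'[/.]', 'https://expressjs.com/en/starter/hello-world.html')
['https:', '', 'expressjs', 'com', 'en', 'starter', 'hello-world', 'html']

The character class [/.] matches any forward slash or period. Inside square brackets the period is literal, so it needs no backslash.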
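If the split is needed in more than one place, the pattern can be compiled once and wrapped in a small function. This is only a sketch: the names SEPARATORS and split_url are made up for illustration and are not part of the re module.

```python
import re

# Hypothetical names for illustration: a precompiled pattern that
# matches a single forward slash or a single period.
SEPARATORS = re.compile(r"[/.]")


def split_url(url):
    """Split a URL string on every '/' and '.' character."""
    return SEPARATORS.split(url)


parts = split_url("https://expressjs.com/en/starter/hello-world.html")
print(parts)
# ['https:', '', 'expressjs', 'com', 'en', 'starter', 'hello-world', 'html']
```

Note that this starts from the original URL string rather than from the list that was already split on '/'.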

Answer 2

Try this:

L = ['https:', '', 'expressjs.com', 'en', 'starter', 'hello-world.html']
L = [subitem for item in L for subitem in item.split('.')]

print(L)

Output:

['https:', '', 'expressjs', 'com', 'en', 'starter', 'hello-world', 'html']
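The same flattening can also be written with itertools.chain.from_iterable, which some people find easier to read than a nested comprehension. A minimal sketch, assuming a helper called split_items (a made-up name) and '.' as the default separator:

```python
from itertools import chain


def split_items(items, sep="."):
    """Split every string in items on sep and flatten into one list."""
    # item.split(sep) returns a list per item; chain.from_iterable
    # concatenates those lists in order.
    return list(chain.from_iterable(item.split(sep) for item in items))


L = ['https:', '', 'expressjs.com', 'en', 'starter', 'hello-world.html']
result = split_items(L)
print(result)
# ['https:', '', 'expressjs', 'com', 'en', 'starter', 'hello-world', 'html']
```

The empty string is kept because ''.split('.') returns [''], so the result matches the list asked for in the question.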