Question

My string is text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"

I want to split it into a list like this: ["Baghdad, Iraq", "United Arab Emirates (possibly)"]

The code I am using does not give the expected result:

re.split('\\s*([a-zA-Z\\d][).]|•)\\s*(?=[A-Z])', text)

Please help me with this.

Answer 1

You can use a list comprehension and a second regular expression to build the desired data for your example:

import re

text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"

# Split on the "x)" markers (the capturing group keeps them in the result),
# then drop empty strings and the marker pieces with a second pattern.
data = [x for x in re.split(r'((?:^\s*[a-zA-Z0-9]\))|(?:\s+[a-zA-Z0-9]\)))\s*',
                            text) if x and not re.match(r"\s*[a-zA-Z0-9]\)", x)]
print(data)

Output:

['Baghdad, Iraq', 'United Arab Emirates (possibly)']

See https://regex101.com/r/wxEEQW/1
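
As a possible simplification, here is a minimal sketch that keeps the pattern from the question and only changes its capturing group to a non-capturing one. It assumes the unexpected result comes from re.split returning the text of capturing groups (the "a)" and "b)" markers) along with the pieces. The names marker and items are illustrative, not from the original post.

```python
import re

text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"

# The question's pattern, with the capturing group (...) changed to a
# non-capturing group (?:...), so re.split does not return the markers.
marker = r'\s*(?:[a-zA-Z\d][).]|•)\s*(?=[A-Z])'

# The marker at the start of the string leaves an empty first piece; drop it.
items = [piece for piece in re.split(marker, text) if piece]
print(items)
```

Note that this pattern does not require whitespace before the marker, so any single letter or digit followed by ")" or "." and then a capital letter would also be treated as a split point.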

Answer 2

You can simply split on the markers with re.split, then strip trailing whitespace and filter out the empty strings:

import re
text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"
# Raw string avoids invalid-escape warnings for \w, \) and \s.
countries = list(filter(None, map(str.rstrip, re.split(r'\w\)\s', text))))
print(countries)

Output:

['Baghdad, Iraq', 'United Arab Emirates (possibly)']
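
If a re.findall-based version is preferred, the following is a minimal sketch rather than a definitive solution. It assumes each item is introduced by a single word character followed by ")" and whitespace, and that the text is on one line. The name pattern is illustrative.

```python
import re

text = "a) Baghdad, Iraq b) United Arab Emirates (possibly)"

# Capture the text after each "x) " marker, lazily, up to the next
# whitespace-preceded marker or the end of the string.
pattern = r'\w\)\s+(.*?)(?=\s+\w\)\s|$)'
countries = re.findall(pattern, text)
print(countries)
```

Because the lookahead requires whitespace before the next marker, the closing parenthesis in "(possibly)" is not mistaken for one.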

Split a string containing an alphabetically bulleted list into a list
