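If I have a messy string like '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'
and want to split it into a list so that each part inside any kind of bracket becomes an item, like ['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach'],
how would I do that? I can't figure out a way to make the .split()
method work.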
Answer 0 (score: 1)
You can use a regular expression:
import re
s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'
lst = [x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', s)]
print(lst)
Output
['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach']
Explanation
The regex pattern to match:
r'\[(.*?)\]|\((.*?)\)'
Subpattern 1: match items in square brackets, i.e. [...]
\[(.*?)\] # Use \[ and \] since [, ] are special characters
# we have to escape so they will be literal
(.*?) # Is a Lazy match of all characters
Subpattern 2: match items in parentheses, i.e. (...)
\((.*?)\) # Use \( and \) since (, ) are special characters
# we have to escape so they will be literal
Since we are looking for either of the two patterns:
'|' # which is "or" between the two subpatterns
# to match Subpattern 1 or Subpattern 2
The expression
re.findall(r'\[(.*?)\]|\((.*?)\)', s)
[('Carrots', ''), ('Broccoli', ''), ('', 'cucumber'), ('', 'tomato'), ('spinach', '')]
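returns the list of tuples shown above. In each tuple the captured text is in either the first or the second element, and the other element is an empty string. So we use: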
[x[0] or x[1] for x in re.findall(r'\[(.*?)\]|\((.*?)\)', s)]
to take whichever element of each tuple is non-empty and collect the results in a list.
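As a possible variation (not part of the answer above), the two alternatives can be folded into a single capture group, so that re.findall returns plain strings and the `x[0] or x[1]` step is not needed. This is only a sketch and it assumes you do not care whether the opening and closing characters actually pair up, since it would also accept something like '[a)':

```python
import re

s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'

# One capture group: an opening [ or (, a lazy capture, then a closing ] or ).
# Caveat (assumption): bracket types are not checked against each other,
# so a mismatched pair such as '[a)' would also match.
items = re.findall(r'[\[(](.*?)[\])]', s)
print(items)  # ['Carrots', 'Broccoli', 'cucumber', 'tomato', 'spinach']
```

If mismatched pairs must be rejected, the two-subpattern version above is the safer choice.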
Answer 1 (score: 0)
Without any error handling (such as checking for nested or unbalanced brackets):
def parse(expr):
    opening = "(["
    closing = ")]"
    result = []
    current_item = ""
    for char in expr:
        if char in opening:
            current_item = ""
            continue
        if char in closing:
            result.append(current_item)
            continue
        current_item += char
    return result
print(parse("(a)(b) stuff (c) [d] more stuff - (xxx)."))
>>> ['a', 'b', 'c', 'd', 'xxx']
Depending on your needs, this may already be good enough...
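If it is not enough, the missing error handling could look roughly like the sketch below. The name parse_strict and the choice to raise ValueError are assumptions made for illustration, not part of the answer; the function rejects nested, mismatched, and unclosed brackets instead of silently producing odd results.

```python
def parse_strict(expr):
    """Hypothetical stricter variant of parse(): raises ValueError on
    nested, mismatched, or unclosed brackets."""
    pairs = {"(": ")", "[": "]"}
    closers = set(pairs.values())
    result = []
    expected = None  # closing character we are waiting for; None when outside brackets
    current = []
    for pos, char in enumerate(expr):
        if char in pairs:
            if expected is not None:
                raise ValueError(f"nested bracket at index {pos}")
            expected = pairs[char]
            current = []
        elif char in closers:
            if char != expected:
                raise ValueError(f"unexpected {char!r} at index {pos}")
            result.append("".join(current))
            expected = None
        elif expected is not None:
            # Only collect characters while inside a bracket pair.
            current.append(char)
    if expected is not None:
        raise ValueError("unclosed bracket at end of input")
    return result


print(parse_strict("(a)(b) stuff (c) [d] more stuff - (xxx)."))
```

For well-formed input it returns the same list as parse(); for input such as "(a[b])" or "(a" it raises ValueError.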
Answer 2 (score: 0)
Assuming no brackets or operators (such as '-') other than those in the example string are used, try:
s = '[Carrots] [Broccoli] (cucumber)-(tomato) irrelevant [spinach]'
words = []
for elem in s.replace('-', ' ').split():
    if '[' in elem or '(' in elem:
        words.append(elem.strip('[]()'))
# The same thing written as a list comprehension:
words = [elem.strip('[]()') for elem in s.replace('-', ' ').split() if '[' in elem or '(' in elem]