I have a string:
string ='((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
I want to split it into multiple lists based on the parentheses, so I used the following regex:
my_data = re.findall(r"(\(.*?\))",string)
But when I print my_data, the output is (len = 4):
['((clearance)', '(embedded)', '(software engineer OR developer)', '(embedded)']
But the output I want is (len = 2):
['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']
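That is because "(clearance) AND (embedded) AND (software engineer OR developer)" is inside one pair of parentheses and "(embedded)" is inside another. But `re.findall` splits the string into 4 items. Why?
How should I modify the regex to get the output I want?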
Answer 0 (score: 3)
This is not possible with a pure regex, so here is an idea that counts the parentheses instead:
def find_stuff(string):
    indices = []
    counter = 0  # current nesting depth
    change = {"(": 1, ")": -1}
    for i, el in enumerate(string):
        new_count = counter + change.get(el, 0)
        if counter == 0 and new_count == 1:
            # an outermost "(" opens at position i
            indices.append(i)
        elif counter == 1 and new_count == 0:
            # the matching outermost ")" closes at i; i + 1 is the slice end
            indices.append(i + 1)
        counter = new_count
    return indices
It is not very pretty, but I think the concept is clear. It returns the indices of the outer parentheses, so you can use them to slice the string.
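The slicing step could look like the minimal sketch below. It assumes the input has balanced parentheses, so the returned indices come in (start, end) pairs. `split_outer` is a hypothetical helper name used only for illustration, and `find_stuff` is repeated so the snippet runs on its own.

```python
def find_stuff(string):
    """Return start/end indices of each outermost parenthesised group."""
    indices = []
    counter = 0  # current nesting depth
    change = {"(": 1, ")": -1}
    for i, el in enumerate(string):
        new_count = counter + change.get(el, 0)
        if counter == 0 and new_count == 1:
            indices.append(i)        # outermost "(" opens here
        elif counter == 1 and new_count == 0:
            indices.append(i + 1)    # slice end, just past the outermost ")"
        counter = new_count
    return indices


def split_outer(string):
    # Hypothetical helper: pair up the flat index list and slice the string.
    idx = find_stuff(string)
    return [string[start:end] for start, end in zip(idx[::2], idx[1::2])]


string = '((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
pieces = split_outer(string)
print(pieces)
# ['((clearance) AND (embedded) AND (software engineer OR developer))', '(embedded)']
```

Each slice keeps its own outer parentheses, so the first piece still carries the redundant outer layer that the desired output omits. Taking `piece[1:-1]` on that piece would remove one layer if that is what you need.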
Answer 1 (score: 1)
It is a bit of an `re` hack, but it is possible:
>>> string ='((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
>>> [e for e in re.split(r'\((?=\()(.*?)(?<=\))\)|(?<!\()(\([^()]+\))(?!\))',string) if e and '(' in e and ')' in e]
['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']
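The one-line pattern is dense, so here is the same expression written with `re.VERBOSE` and commented piece by piece. This is only a sketch for reading the pattern: the names `OUTER_GROUPS` and `split_outer_groups` are hypothetical. The pattern appears to rely on the nested group starting with `((` and ending with `))`, so it may not generalize to other nesting shapes.

```python
import re

# Same pattern as in the answer, spread out with comments.
OUTER_GROUPS = re.compile(r"""
    \((?=\()        # an open paren immediately followed by another open paren
    (.*?)           # group 1: shortest text up to ...
    (?<=\))\)       # ... a close paren that directly follows a close paren
  |
    (?<!\()         # otherwise: an open paren not preceded by an open paren
    (\([^()]+\))    # group 2: a flat group with no nested parentheses
    (?!\))          # that is not followed by a close paren
""", re.VERBOSE)


def split_outer_groups(text):
    # re.split returns the captured groups (or None) between the leftovers;
    # keep only the non-empty parts that still contain parentheses.
    parts = OUTER_GROUPS.split(text)
    return [p for p in parts if p and "(" in p and ")" in p]


string = '((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
result = split_outer_groups(string)
print(result)
# ['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']
```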