Regex gives the wrong result when finding text between parentheses

Asked: 2018-12-12 15:39:57

Tags: python regex

I have a string

string  ='((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'

I want to split it into multiple lists based on the parentheses, so I used

my_data = re.findall(r"(\(.*?\))",string)

But when I print my_data, the output is (len = 4)

['((clearance)', '(embedded)', '(software engineer OR developer)', '(embedded)']

But the output I want is (len = 2)

['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']

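That is because '(clearance) AND (embedded) AND (software engineer OR developer)' sits inside one pair of parentheses and '(embedded)' sits inside another. But re.findall splits the string into 4 items. Why?

How should I modify the regex to get the output I want?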

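2 answers:

Answer 0 (score: 3)

This isn't possible with a plain regular expression in Python's re module, which cannot match arbitrarily nested parentheses. Your lazy `.*?` simply stops at the first `)` after each `(`, which is why you get four matches. So here is an idea that counts the parentheses instead: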

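def find_stuff(string):
    """Return a flat list of slice bounds for the outermost parenthesized groups.

    The list alternates between a start index (inclusive) and an end index (exclusive).
    """
    indices = []
    counter = 0  # current nesting depth
    change = {"(": 1, ")": -1}
    for i, el in enumerate(string):
        new_count = counter + change.get(el, 0)
        if counter == 0 and new_count == 1:
            # depth goes 0 -> 1: an outermost group opens here
            indices.append(i)
        elif counter == 1 and new_count == 0:
            # depth goes 1 -> 0: the outermost group closes here
            indices.append(i + 1)
        counter = new_count
    return indices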

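It's not very pretty, but I think the concept is clear. It returns the indices of the outermost parentheses, so you can use them to slice the string.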
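To make the slicing step concrete, here is a minimal sketch of the same counting idea that returns the substrings directly instead of the indices. The name `outer_groups` is hypothetical and not part of the answer above, and the sketch simply ignores unbalanced closing parentheses.

```python
def outer_groups(text):
    """Return every outermost "(...)" group in text, parentheses included.

    Hypothetical helper: same depth-counting idea, but it collects slices.
    """
    groups = []
    depth = 0      # current nesting depth
    start = None   # index where the current outermost group opened
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # an outermost group opens here
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                # the outermost group closes here, so slice it out
                groups.append(text[start:i + 1])
    return groups


string = '((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
print(outer_groups(string))
# ['((clearance) AND (embedded) AND (software engineer OR developer))', '(embedded)']
```

Each group keeps its outer parentheses. If you want one layer removed, as in the first item of the desired output, slice the group with `[1:-1]`.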

Answer 1 (score: 1)

It's a bit of a hack with re, but it is possible:

>>> import re
>>> string  ='((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
>>> [e for e in re.split(r'\((?=\()(.*?)(?<=\))\)|(?<!\()(\([^()]+\))(?!\))',string) if e and '(' in e and ')' in e]
['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']
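The one-liner is dense, so here is a sketch of the same pattern written with re.VERBOSE and re.finditer, which may be easier to follow. The names `TOP_LEVEL` and `top_level_groups_re` are hypothetical. It assumes at most two levels of nesting and that a doubly wrapped group begins with `((` and ends with `))`, as in the question's string; deeper or differently shaped nesting is likely to give wrong results.

```python
import re

# Assumption: at most two levels of nesting, as in the question's string.
TOP_LEVEL = re.compile(r"""
    \( (?=\()          # an opening parenthesis directly followed by another one
    (.*?)              # group 1: lazily capture what lies inside the outer pair
    (?<=\)) \)         # a closing parenthesis directly preceded by another one
  |
    (?<!\()            # otherwise: a group not directly preceded by an opening one
    (\( [^()]+ \))     # group 2: a flat group with no parentheses inside it
    (?!\))             # and not directly followed by a closing parenthesis
""", re.VERBOSE)


def top_level_groups_re(text):
    """Return the captured text of each match, whichever alternative matched."""
    return [m.group(1) or m.group(2) for m in TOP_LEVEL.finditer(text)]


string = '((clearance) AND (embedded) AND (software engineer OR developer)) AND (embedded)'
print(top_level_groups_re(string))
# ['(clearance) AND (embedded) AND (software engineer OR developer)', '(embedded)']
```

Using finditer instead of re.split avoids the empty strings and None entries that the list comprehension has to filter out.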