Question

I have the following string:

(some text) or ((other text) and (some more text)) and (still more text)

I'd like a Python regular expression that splits it into

['(some text)', '((other text) and (some more text))', '(still more text)']

I tried this, but it doesn't work:

import re
haystack = "(some text) or ((other text) and (some more text)) and (still more text)"
re.split(r'(or|and)(?![^(]*.\))', haystack)  # no worky

Any help is appreciated.
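For the split itself (on `and`/`or` only where they sit outside every pair of parentheses), a depth counter is easier to get right than a lookahead. This is a minimal sketch, assuming balanced parentheses and no parentheses inside quoted text; `TOKEN` and `split_top_level` are hypothetical names, not part of any library. The `\b` anchors keep the `or` inside words such as `more` from being treated as an operator.

```python
import re

# One token: a parenthesis, or a whole-word "and"/"or".
# The \b anchors stop the "or" inside "more" from counting as an operator.
TOKEN = re.compile(r"[()]|\b(?:and|or)\b")

def split_top_level(text):
    """Split text on 'and'/'or' operators that sit outside all parentheses."""
    parts, depth, start = [], 0, 0
    for m in TOKEN.finditer(text):
        tok = m.group()
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif depth == 0:
            # Top-level operator: close off the operand that precedes it.
            parts.append(text[start:m.start()].strip())
            start = m.end()
    parts.append(text[start:].strip())  # the last operand
    return parts

haystack = "(some text) or ((other text) and (some more text)) and (still more text)"
print(split_top_level(haystack))
```

Unlike `re.split` with a capturing group, this drops the operators; appending `tok` to `parts` in the top-level branch would keep them.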

Answer 1

This solution works for arbitrarily nested parentheses, which a regex cannot handle (s is the original string):

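from pyparsing import nestedExpr

def lst_to_parens(elt):
    """Rebuild a parenthesized string from pyparsing's nested lists."""
    if isinstance(elt, list):
        return '(' + ' '.join(lst_to_parens(e) for e in elt) + ')'
    else:
        return elt

s = "(some text) or ((other text) and (some more text)) and (still more text)"
# Wrap s in one extra pair of parens so the whole string parses as a single nested expression
split = nestedExpr('(', ')').parseString('(' + s + ')').asList()
# Keep only the top-level parenthesized groups, dropping the bare 'or'/'and' tokens
split_lists = [elt for elt in split[0] if isinstance(elt, list)]
print([lst_to_parens(elt) for elt in split_lists])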

Output:

['(some text)', '((other text) and (some more text))', '(still more text)']

For the OP's real test case:

s = "(substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle')"

Output:

["(substringof ('needle' ,name))", "((role eq 'needle') and (substringof ('needle' ,email)))", "(job eq 'needle')", "(office eq 'needle')"]
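The pyparsing output above is rebuilt from tokens, so the spacing changes (for example `(substringof ('needle' ,name))`). If the groups are needed verbatim, a plain scan that tracks depth and slices the original string is one alternative. This is a sketch under stated assumptions: parentheses are balanced, literals use single quotes with no escaped quotes, and `top_level_groups` is a hypothetical helper rather than a pyparsing API.

```python
def top_level_groups(text):
    """Return every outermost (...) group in text, copied verbatim.

    Assumes balanced parentheses and single-quoted literals with no
    escaped quotes; parentheses inside '...' are treated as plain text.
    """
    groups, depth, start, in_quote = [], 0, 0, False
    for i, ch in enumerate(text):
        if ch == "'":
            in_quote = not in_quote  # entering or leaving a quoted literal
        elif in_quote:
            continue  # ignore everything inside quotes
        elif ch == "(":
            if depth == 0:
                start = i  # an outermost group begins here
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups

s = "(substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle')"
print(top_level_groups(s))
```

Because it slices `text` directly, `'needle',name` keeps its original spacing.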

Answer 2

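I'd use re.findall instead of re.split. Note that this only handles a fixed nesting depth: the pattern hard-codes three levels of parentheses, counting the outermost pair, so anything nested deeper will not be matched as one group.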

>>> import re
>>> s = '(some text) or ((other text) and (some more text)) and (still more text)'
>>> re.findall(r'\((?:\((?:\([^()]*\)|[^()]*)*\)|[^()])*\)', s)
['(some text)', '((other text) and (some more text))', '(still more text)']
>>>
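The pattern above is written out by hand for one depth. A sketch that builds a pattern of the same kind for any chosen maximum depth, using only the standard `re` module; `nested_parens_pattern` is a hypothetical helper. Each level alternates single non-parenthesis characters with a shallower group, so the two alternatives never compete for the same character, which should help keep backtracking in check. The pattern still grows with the depth, so for unbounded nesting a parser (or the third-party `regex` module, which supports recursive patterns such as `(?R)`) remains the better fit.

```python
import re

def nested_parens_pattern(max_depth):
    """Build a pattern for (...) groups nested at most max_depth levels deep.

    Level 1 is the outermost pair; max_depth must be at least 1.
    """
    pattern = r"\([^()]*\)"  # innermost level: no parentheses inside
    for _ in range(max_depth - 1):
        # Each outer level holds single non-paren characters or a shallower group.
        pattern = r"\((?:[^()]|" + pattern + r")*\)"
    return pattern

s = "(some text) or ((other text) and (some more text)) and (still more text)"
print(re.findall(nested_parens_pattern(2), s))
```

With `max_depth=1` the same call returns the four innermost groups instead, which shows how the depth limit changes what counts as one match.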

Answer 3

You could also try this:

import re
s = '(some text) or ((other text) and (some more text)) and (still more text)'
find_string = re.findall(r'[(]{2}[a-z\s()]*[)]{2}|[(][a-z\s]*[)]', s)
print(find_string)

Output:

['(some text)', '((other text) and (some more text))', '(still more text)']

Edit:

find_string = re.findall(r'[(\s]{2}[a-z\s()]*[)\s]{2}|[(][a-z\s]*[)]', s)
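Because `[(\s]{2}` and `[)\s]{2}` accept whitespace as well as parentheses, the edited pattern can return matches with a leading or trailing space; on the sample string the middle match should come back as `' ((other text) and (some more text)) '`. A small sketch that strips each match; `pattern` and `raw` are names introduced here for illustration.

```python
import re

s = '(some text) or ((other text) and (some more text)) and (still more text)'
# The edited pattern from this answer
pattern = r'[(\s]{2}[a-z\s()]*[)\s]{2}|[(][a-z\s]*[)]'
raw = re.findall(pattern, s)
# [(\s]{2} and [)\s]{2} can swallow a neighboring space, so strip every match.
find_string = [m.strip() for m in raw]
print(find_string)
```

Note that both versions only accept lowercase letters and whitespace between the parentheses, so input with quotes, commas, or digits (like the OP's real test case) needs a wider character class.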

Answer 4

You can try this: re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
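The call above is a generic `re.split` example rather than one tailored to this string. Running it shows what `re.split` returns, and wrapping the pattern in a capturing group shows why the question's attempt could not produce the desired list on its own: `(or|and)` is a capturing group, so every matched operator is returned alongside the operands.

```python
import re

# The call from this answer: split '0a3B9' on runs of a-f, ignoring case.
plain = re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
# Wrapping the pattern in a capturing group also returns the separators.
with_seps = re.split('([a-f]+)', '0a3B9', flags=re.IGNORECASE)
print(plain)      # ['0', '3', '9']
print(with_seps)  # ['0', 'a', '3', 'B', '9']
```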

Python: split on 'and' and 'or', but not inside parentheses
