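I have the following string:
(some text) or ((other text) and (some more text)) and (still more text)
I would like a Python regular expression that splits it into:
['(some text)', '((other text) and (some more text))', '(still more text)']
I tried this, but it doesn't work: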
haystack = "(some text) or ((other text) and (some more text)) and (still more text)"
re.split('(or|and)(?![^(]*.\))', haystack) # no worky
Any help is appreciated.
Answer 0 (score: 2)
This solution works for arbitrarily nested parentheses, which a regular expression cannot handle (s is the original string):
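from pyparsing import nestedExpr

def lst_to_parens(elt):
    # Rebuild a parenthesized string from a (possibly nested) list of tokens.
    if isinstance(elt, list):
        return '(' + ' '.join(lst_to_parens(e) for e in elt) + ')'
    else:
        return elt

# Wrap s in one extra pair of parentheses so the whole string parses as a single nested expression.
split = nestedExpr('(', ')').parseString('(' + s + ')').asList()
# Keep only the nested groups; bare tokens such as 'or' and 'and' are dropped.
split_lists = [elt for elt in split[0] if isinstance(elt, list)]
print([lst_to_parens(elt) for elt in split_lists])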
Output:
['(some text)', '((other text) and (some more text))', '(still more text)']
For the OP's real test case:
s = "(substringof('needle',name)) or ((role eq 'needle') and (substringof('needle',email))) or (job eq 'needle') or (office eq 'needle')"
Output:
["(substringof ('needle' ,name))", "((role eq 'needle') and (substringof ('needle' ,email)))", "(job eq 'needle')", "(office eq 'needle')"]
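As a dependency-free alternative, the same top-level split can be sketched with a plain depth counter. This is not part of the answer above: split_top_level_groups is a hypothetical helper name, and the sketch assumes that parentheses never appear inside quoted values. Unlike the pyparsing output, it slices the original string, so the spacing inside each group is left untouched.

```python
def split_top_level_groups(text):
    """Return every balanced top-level '(...)' group found in text.

    Illustrative helper (not from the original answer). It assumes
    parentheses are never embedded in quoted values.
    """
    groups = []
    depth = 0      # how many parentheses are currently open
    start = None   # index where the current top-level group began
    for index, char in enumerate(text):
        if char == '(':
            if depth == 0:
                start = index  # a new top-level group opens here
            depth += 1
        elif char == ')' and depth > 0:
            depth -= 1
            if depth == 0:
                # The top-level group just closed; keep it verbatim.
                groups.append(text[start:index + 1])
    return groups


s = "(some text) or ((other text) and (some more text)) and (still more text)"
print(split_top_level_groups(s))
# ['(some text)', '((other text) and (some more text))', '(still more text)']
```

Text outside the top-level groups (here, or and and) is simply skipped, and a stray closing parenthesis is ignored rather than raising an error.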
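Answer 1 (score: 1)
I would use re.findall instead of re.split. Note that this only works for parentheses nested up to depth 2, that is, at most two levels of parentheses inside the outermost pair.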
>>> import re
>>> s = '(some text) or ((other text) and (some more text)) and (still more text)'
>>> re.findall(r'\((?:\((?:\([^()]*\)|[^()]*)*\)|[^()])*\)', s)
['(some text)', '((other text) and (some more text))', '(still more text)']
>>>
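The depth limit exists because each extra nesting level needs another copy of the inner pattern. A minimal sketch of building such a pattern for a chosen maximum depth follows. The helper name nested_paren_pattern is hypothetical, and its depth argument counts all levels of parentheses including the outermost pair, so depth=1 means a flat (...) with nothing nested inside.

```python
import re


def nested_paren_pattern(depth):
    """Build a regex matching parentheses nested at most `depth` levels deep.

    Illustrative helper (not from the original answer). depth=1 matches a
    flat "(...)" group; each extra level wraps the previous pattern once.
    """
    pattern = r'\([^()]*\)'
    for _ in range(depth - 1):
        # Allow either a non-parenthesis character or one shallower group.
        pattern = r'\((?:[^()]|' + pattern + r')*\)'
    return pattern


s = '(some text) or ((other text) and (some more text)) and (still more text)'

print(re.findall(nested_paren_pattern(2), s))
# ['(some text)', '((other text) and (some more text))', '(still more text)']

# With too small a depth the outer group is missed and its inner groups
# are returned separately instead.
print(re.findall(nested_paren_pattern(1), s))
# ['(some text)', '(other text)', '(some more text)', '(still more text)']
```

The generated pattern grows with every level, so this only makes sense when the maximum depth is known in advance.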
Answer 2 (score: 1)
You could also try this:
import re
s = '(some text) or ((other text) and (some more text)) and (still more text)'
find_string = re.findall(r'[(]{2}[a-z\s()]*[)]{2}|[(][a-z\s]*[)]', s)
print(find_string)
Output:
['(some text)', '((other text) and (some more text))', '(still more text)']
Edit:
find_string = re.findall(r'[(\s]{2}[a-z\s()]*[)\s]{2}|[(][a-z\s]*[)]', s)
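One thing to keep in mind with this approach is that the character class [a-z\s] only admits lowercase letters and whitespace. The quick check below, which is an illustration rather than part of the answer, runs the first pattern against the sample string and against one group from the OP's real test case in the first answer, where the quote characters prevent a match.

```python
import re

# First pattern from the answer above, unchanged.
pattern = r'[(]{2}[a-z\s()]*[)]{2}|[(][a-z\s]*[)]'

sample = '(some text) or ((other text) and (some more text)) and (still more text)'
print(re.findall(pattern, sample))
# ['(some text)', '((other text) and (some more text))', '(still more text)']

# A group taken from the OP's real test case: the quotes are not in [a-z\s].
real_group = "(job eq 'needle')"
print(re.findall(pattern, real_group))
# []
```

Widening the character class would be needed before using it on input that contains quotes, commas, digits, or uppercase letters.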
Answer 3 (score: 0)
You can try this: re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)