我有一个类似于
的字符串s = "(test1 or (test2 or test3)) and (test4 and (test6)) and (test7 or test8) and test9"
我正在尝试在()之间提取
['test1 or (test2 or test3)', 'test4 and (test6)', 'test7 or test8']
我尝试过
result = re.search('%s(.*)%s' % ("(", ")"), s).group(1)
result =(s[s.find("(")+1 : s.find(")")])
result = re.search('((.*))', s)
答案 0 :(得分:2)
您有嵌套括号。这就需要解析,或者如果您不想走那么远,请回到基础知识,逐个字符地进行解析以找到每个组的0嵌套级别。
然后黑客先删除and
令牌(如果有的话)。
我为此编写的代码。不短,也不是很复杂,设备齐全,没有多余的库:
s = "(test1 or (test2 or test3)) and (test4 and (test6)) and (test7 or test8) and test9"
nesting_level = 0
previous_group_index = 0
def rework_group(group):
# not the brightest function but works. Maybe needs tuning
# that's not the core of the algorithm but simple string operations
# look for the first opening parenthese, remove what's before
idx = group.find("(")
if idx!=-1:
group = group[idx:]
else:
# no parentheses: split according to blanks, keep last item
group = group.split()[-1]
return group
result = []
for i,c in enumerate(s):
if c=='(':
nesting_level += 1
elif c==')':
nesting_level -= 1
if nesting_level == 0:
result.append(rework_group(s[previous_group_index:i+1]))
previous_group_index = i+1
result.append(rework_group(s[previous_group_index:]))
结果:
>>> result
['(test1 or (test2 or test3))',
'(test4 and (test6))',
'(test7 or test8)',
'test9']
>>>
答案 1 :(得分:0)
如果您确实想为此做一个粗略的解析器,它将看起来像这样。
这使用模式对象的scanner
方法,在级别0(通过遇到的左右括号定义级别)时遍历并构建列表。
import re
# Token specification
TEST = r'(?P<TEST>test[0-9]*)'
LEFT_BRACKET = r'(?P<LEFT_BRACKET>\()'
RIGHT_BRACKET = r'(?P<RIGHT_BRACKET>\))'
AND = r'(?P<AND> and )'
OR = r'(?P<OR> or )'
master_pat = re.compile('|'.join([TEST, LEFT_BRACKET, RIGHT_BRACKET, AND, OR]))
s = "(test1 or (test2 or test3)) and (test4 and (test6)) and (test7 or test8) and test9"
def generate_list(pat, text):
ans = []
elem = ''
level = 0
scanner = pat.scanner(text)
for m in iter(scanner.match, None):
# print(m.lastgroup, m.group(), level)
# keep building elem if nested or not tokens to skip for level=0,1
if (level > 1 or
(level == 1 and m.lastgroup != 'RIGHT_BRACKET') or
(level == 0 and m.lastgroup not in ['LEFT_BRACKET', 'AND'])
):
elem += m.group()
# if at level 0 we can append
if level == 0 and elem != '':
ans.append(elem)
elem = ''
# set level
if m.lastgroup == 'LEFT_BRACKET':
level += 1
elif m.lastgroup == 'RIGHT_BRACKET':
level -= 1
return ans
generate_list(master_pat, s)
# ['test1 or (test2 or test3)', 'test4 and (test6)', 'test7 or test8', 'test9']
查看scanner
的行为:
master_pat = re.compile('|'.join([TEST, LEFT_BRACKET, RIGHT_BRACKET, AND, OR]))
s = "(test1 or (test2 or test3)) and (test4 and (test6)) and (test7 or test8) and test9"
scanner = master_pat.scanner(s)
scanner.match()
# <re.Match object; span=(0, 1), match='('>
_.lastgroup, _.group()
# ('LEFT_BRACKET', '(')
scanner.match()
# <re.Match object; span=(1, 6), match='test1'>
_.lastgroup, _.group()
# ('TEST', 'test1')
scanner.match()
# <re.Match object; span=(6, 10), match=' or '>
_.lastgroup, _.group()
# ('OR', ' or ')
scanner.match()
# <re.Match object; span=(10, 11), match='('>
_.lastgroup, _.group()
# ('LEFT_BRACKET', '(')
scanner.match()
# <re.Match object; span=(11, 16), match='test2'>
_.lastgroup, _.group()
# ('TEST', 'test2')