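I want to match the last group enclosed in [], where the group may itself contain one or more [] in a nested structure.
Although it is not very elegant, I managed to match the nested [] using the regex module in Python. This solution works in some cases (e.g. s1), but not for s2 or s3, where there is more than one such match. My solution only matches the first one.
Any suggestions? A better regular expression? Or are regular expressions not the way to go? Many thanks!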
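Edit: Thank you very much for your help; I would upvote every answer if I had 15 rep. Sorry for not including the expected result: I want the last outermost bracketed group of each string (for example, [EEE] for s2). My current code and its output: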
import regex

s1 = 'AAA [BBB [CCC]]'
s2 = 'AAA [DDD] [EEE]'
s3 = 'AAA [BBB [CCC]] [EEE]'
for s in [s1, s2, s3]:
    # search() returns only the first match in each string
    result = regex.search(r'(?<rec>\[(?:[^\[\]]++|(?&rec))*\])', s, flags=regex.VERBOSE)
    print(result.captures('rec'))

['[CCC]', '[BBB [CCC]]'] # I know it is not perfect, but I can take the last one in the list
['[DDD]'] # This is the first one; I want the last one, which is [EEE]
['[CCC]', '[BBB [CCC]]'] # same problem as above
Answer 0 (score: 3)
In Python, to use recursion or repeated subroutines, we need Matthew Barnett's outstanding regex module... and, as @CTZhu pointed out, you are already using it!
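To be clear on terms, "nesting" can be understood in several ways, for example:
Simple nesting, such as [C[D[E]F]], which is a subset of...
Family-style nesting, such as [B[C] [D] [E[F][G]]].
You need to be able to handle the latter, and this short regex does that for us: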
\[(?:[^[\]]++|(?R))*\]
This will match all nested brackets. Now all we need to do is print the last match.
Here is some tested Python code:
import regex # say "yeah!" for Matthew Barnett
pattern = r'\[(?:[^[\]]++|(?R))*\]'
myregex = regex.compile(pattern)
# this outputs [EEE]
matches = myregex.findall('AAA [BBB [CCC]] [EEE]')
print (matches[-1])
# this outputs [C[D[E]F]] (simple nesting)
matches = myregex.findall('AAA [BBB] [C[D[E]F]]')
print (matches[-1])
# this outputs [B[C] [D] [E[F][G]]] (family-style nesting)
matches = myregex.findall('AAA [AAA] [B[]B[B]] [B[C] [D] [E[F][G]]]')
print (matches[-1])
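A note on why findall makes the difference here: the attempt in the question used search, which stops at the first match, whereas findall collects every non-overlapping match so that the last one can be picked. The following is only a minimal sketch using the standard-library re module on the non-nested string s2. The flat pattern is a simplification chosen for illustration (re has no recursion, so it would not treat the nested groups in s1 or s3 as a single unit); it is not the pattern from this answer.

```python
import re

# Flat (non-recursive) pattern: one bracket pair with no brackets inside.
# Simplification for illustration only; it cannot match a nested group as one unit.
flat = re.compile(r'\[[^\[\]]*\]')

s2 = 'AAA [DDD] [EEE]'

first = flat.search(s2).group(0)  # search() stops at the first match
every = flat.findall(s2)          # findall() returns all non-overlapping matches
last = every[-1]                  # the last match is the one the question asks for

print(first)  # [DDD]
print(every)  # ['[DDD]', '[EEE]']
print(last)   # [EEE]
```

The same idea carries over to the recursive pattern above: keep the pattern, but collect all matches and take the final element.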
Answer 1 (score: 2)
You can use this recursive regex and print only the last match:
import regex

s1 = 'AAA [BBB [CCC]]'
s2 = 'AAA [DDD] [EEE]'
s3 = 'AAA [BBB [CCC]] [EEE]'
for e in (s1, s2, s3):
    # Either a bare word, or a bracketed group that may recurse into the whole pattern
    matches = regex.findall(r'[^\[\]\s]+ | \[ (?: (?R) | [^\[\]]+ )+\]', e, regex.VERBOSE)
    print(e, '=>', matches, '=>', matches[-1])
Prints:
AAA [BBB [CCC]] => ['AAA', '[BBB [CCC]]'] => [BBB [CCC]]
AAA [DDD] [EEE] => ['AAA', '[DDD]', '[EEE]'] => [EEE]
AAA [BBB [CCC]] [EEE] => ['AAA', '[BBB [CCC]]', '[EEE]'] => [EEE]
Answer 2 (score: 1)
Going off the given data, and your statement that you want the last group, I will offer you this recursive regex.
import regex

s1 = 'AAA [BBB [CCC]]'
s2 = 'AAA [DDD] [EEE]'
s3 = 'AAA [BBB [CCC]] [EEE]'
for s in [s1, s2, s3]:
    # findall() returns every bracketed group; the last element is the one we want
    result = regex.findall(r'\[(?:[^[\]]|(?R))*\]', s)
    print(result[-1])
Output:
[BBB [CCC]]
[EEE]
[EEE]
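On the question of whether a regular expression is the way to go at all: for comparison, here is a minimal sketch that finds the same outermost groups with a simple depth counter and no regex. The function names top_level_groups and last_group are hypothetical, introduced only for this illustration, and the sketch assumes the brackets in the input are balanced (an unclosed [ yields no group, and a stray ] is ignored).

```python
def top_level_groups(text):
    """Return every outermost [...] group in text, in order of appearance."""
    groups = []
    depth = 0      # current bracket nesting depth
    start = None   # index where the current outermost group opened
    for i, ch in enumerate(text):
        if ch == '[':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ']' and depth > 0:
            depth -= 1
            if depth == 0:
                # The outermost group just closed: keep it, brackets included
                groups.append(text[start:i + 1])
    return groups


def last_group(text):
    """Return the last outermost group, or None if there is none."""
    groups = top_level_groups(text)
    return groups[-1] if groups else None


for s in ('AAA [BBB [CCC]]', 'AAA [DDD] [EEE]', 'AAA [BBB [CCC]] [EEE]'):
    print(last_group(s))
```

For the three sample strings this prints [BBB [CCC]], [EEE] and [EEE], the same results as the recursive regex. It needs only the standard library, at the cost of a few more lines.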