Regular expression to match all delimiters

Time: 2015-07-12 13:13:48

Tags: python regex

The effect I want: the match should fail if y is not found before x.

import re

a = '''START aaaadkdklfje VALUE aaaadkdklfjeaaaadkdklfjeaaaadkdklfje aaaadkdklfjeaaaadkdklfjeaaaadkdklfjeaaaadkdklfjeaaaadkdklfjeaaaadkdklfje aaaadkdklfjeaaaadkdklfje          aaaadkdklfje
aaaadkdklfje
aaaadkdklfje condition a
aaaadkdklfje
aaaadkdklfje
aaaadkdklfje condition b
                          aaaadkdklfje z
                          aaaadkdklfjeaaaadkdklfje        aaaadkdklfjeqqqsdddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddfsdfsdf 
condition c

???kjij
START...'''

b = re.findall(r'START condition a (VALUE).+?condition b.+?condition c(?!START)', a, re.DOTALL)
if b:
    for x in b:
        print(x)

I want to capture the conditions only if VALUE is present in the text block, without matching past the next START.

This is the only case that should match:

start
?, value, ?, condition a, ?, condition b, ?, condition c # i want the matching to be done only in here
start
...

Not this:

start
?, value, condition a, ?
start
?, value, ?, condition b, condition c
start

2 Answers:

Answer 0 (score: 2)

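Another way is to do it in several steps:

  • you split the string on "START" to get a list of blocks
  • you filter out the blocks that don't contain the conditions
  • you put 'START' back in front of each item.

import re
# s is the input text; re.DOTALL lets .*? span the line breaks between the conditions
blocks = re.split(r'\bSTART\b', s)
blocks = [x for x in blocks[1:] if re.search(r'condition a.*?condition b.*?condition c', x, re.DOTALL)]
blocks = ['START' + x for x in blocks]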

Note: if you want the conditions to come after the keyword VALUE, add \bVALUE\b.*? at the beginning of the search pattern.
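
For what it's worth, here is a minimal, self-contained sketch of the split-and-filter steps, with the optional VALUE prefix from the note included. The helper name blocks_with_conditions and the sample text are made up for illustration and are not part of the question.

```python
import re

def blocks_with_conditions(text):
    """Return the START blocks that contain VALUE, then condition a, b, c in order."""
    # Whatever precedes the first START is not a block, so drop element 0.
    chunks = re.split(r'\bSTART\b', text)[1:]
    wanted = re.compile(
        r'\bVALUE\b.*?condition a.*?condition b.*?condition c',
        re.DOTALL,  # the conditions may sit on different lines
    )
    # Put the delimiter back in front of every block that passes the filter.
    return ['START' + chunk for chunk in chunks if wanted.search(chunk)]

# Hypothetical sample: only the first block has VALUE and all three conditions.
sample = (
    "START x VALUE y\ncondition a\nz condition b\ncondition c\n"
    "START x VALUE condition a\n"
    "START x VALUE y condition b condition c\n"
)
result = blocks_with_conditions(sample)
print(result)
```

Because each block is checked on its own, the search can never run past the next START, which is the behaviour asked for in the question.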

Answer 1 (score: 1)

You can combine several lookarounds so that the match never skips past START and the order of the conditions is preserved:

(?s)START(?:(?!START|condition).)*?\b(VALUE)(?=(?:(?!START).)*?condition a(?:(?!START).)*?condition b(?:(?!START).)*?condition c)

Test at regex101, but note that this performs very badly :]
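
A rough sketch of how the pattern could be used from Python. The pattern is the one given above, only split across several string literals for readability; the sample text and the names PATTERN and matches are assumptions made for the example.

```python
import re

# The answer's pattern, unchanged, spread over adjacent raw-string literals.
PATTERN = re.compile(
    r'(?s)START(?:(?!START|condition).)*?\b(VALUE)'
    r'(?=(?:(?!START).)*?condition a'
    r'(?:(?!START).)*?condition b'
    r'(?:(?!START).)*?condition c)'
)

# Hypothetical sample: only the first block has all three conditions in order.
sample = (
    "START x VALUE y\ncondition a\nz condition b\ncondition c\n"
    "START x VALUE condition a\n"
    "START x VALUE y condition b condition c\n"
)

# Record where each match starts together with the captured group.
matches = [(m.start(), m.group(1)) for m in PATTERN.finditer(sample)]
print(matches)
```

Only the block starting at offset 0 should be reported here: in the other two blocks the lookahead runs into the next START (or the end of the text) before it has seen all three conditions.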

This allows condition a condition a condition b condition c. To make the conditions exclusive, change condition a(?:(?!START).)*? (and likewise the b and c parts) to condition a(?:(?!START|condition).)*? ...
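
Since the modified pattern is not written out in full, the following is only one reading of that change: the filler after condition a and after condition b is tightened, and everything else is left as it was. The helper build_pattern and the sample strings are hypothetical.

```python
import re

def build_pattern(gap):
    """Assemble the lookaround pattern; `gap` is the filler allowed after condition a and b."""
    before_a = r'(?:(?!START).)*?'  # filler before condition a, left unchanged
    return re.compile(
        r'(?s)START(?:(?!START|condition).)*?\b(VALUE)'
        + r'(?=' + before_a + r'condition a'
        + gap + r'condition b'
        + gap + r'condition c)'
    )

LOOSE = build_pattern(r'(?:(?!START).)*?')             # original filler
STRICT = build_pattern(r'(?:(?!START|condition).)*?')  # assumed "exclusive" filler

in_order = "START x VALUE y condition a z condition b w condition c\n"
b_twice = "START x VALUE condition a condition b condition b condition c\n"
a_twice = "START x VALUE condition a condition a condition b condition c\n"

print(LOOSE.findall(b_twice))    # the loose filler skips over the extra condition b
print(STRICT.findall(b_twice))   # the strict filler cannot cross another "condition"
print(STRICT.findall(in_order))
print(STRICT.findall(a_twice))
```

Under this reading a repeated condition b is rejected, but a repeated condition a still gets through, because the filler in front of condition a only excludes START. Tightening that filler in the same way appears to be needed if a doubled condition a should fail as well.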