Regex that selectively includes the delimiter

Time: 2019-02-15 15:22:23

Tags: python regex python-3.x

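I want to find the string between two regex patterns. The tricky part is that some of the "before" patterns need to be included in the output string.

Here is a simplified version of my code: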

import re
start_pattern = "( StartString1 | StartString2 | StartString3ShouldBeIncluded | StartString4ShouldBeIncluded )"
end_pattern = "( EndString1 | EndString2 )"
joined_pattern = f'{start_pattern}(?P<content>.*?){end_pattern}'

input1 = "...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... "
output = re.search(joined_pattern, input1).group('content')
print(output)  # Prints 'THECONTENT' which is what I want

input2 = "...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ..."
output = re.search(joined_pattern, input2).group('content')
print(output)  # Prints 'THECONTENT' but I want 'StartString3ShouldBeIncluded THECONTENT'

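Is there a way to change this regex to get the desired output?

2 answers:

Answer 0 (score: 1)

You can put the start strings that should be included in their own named group, and concatenate the two named groups after matching. Because the group for the included start strings may not take part in the match, in which case its value is None, you can use Python's or operator to default that value to an empty string before joining it with the content group: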

import re
start_pattern = "( StartString1 | StartString2 |(?P<start> StartString3ShouldBeIncluded | StartString4ShouldBeIncluded ))"
end_pattern = "( EndString1 | EndString2 )"
joined_pattern = f'{start_pattern}(?P<content>.*?){end_pattern}'

input1 = "...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... "
match = re.search(joined_pattern, input1)
output = (match.group('start') or '') + match.group('content')
print(output)  # Prints 'THECONTENT' which is what I want

input2 = "...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ..."
match = re.search(joined_pattern, input2)
output = (match.group('start') or '') + match.group('content')
print(output)  # Prints ' StartString3ShouldBeIncluded THECONTENT' (the leading space comes from the pattern)
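As a minimal sketch of how the approach above could be packaged for reuse, the snippet below wraps the match and the concatenation in a function. The names START, END, PATTERN and extract are hypothetical and not part of the original answer, and the check for a failed search is an added assumption about how missing matches should be handled.

```python
import re

# Same alternatives as in the answer; the spaces inside the patterns are kept as-is.
# Only the start strings that must appear in the output sit in the named group "start".
START = r"(?: StartString1 | StartString2 |(?P<start> StartString3ShouldBeIncluded | StartString4ShouldBeIncluded ))"
END = r"(?: EndString1 | EndString2 )"
PATTERN = re.compile(f"{START}(?P<content>.*?){END}")


def extract(text):
    """Hypothetical helper: return the content, prefixed by the start string if it was captured."""
    match = PATTERN.search(text)
    if match is None:
        # Assumption: callers prefer None over an AttributeError when nothing matches.
        return None
    # group("start") is None when StartString1 or StartString2 matched instead.
    return (match.group("start") or "") + match.group("content")


print(repr(extract("...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... ")))
print(repr(extract("...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ...")))
```

Because the patterns contain literal spaces, the captured start string carries a leading space. Calling .strip() on the result would remove it if that is not wanted.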

Answer 1 (score: 0)

Just move the position of your named group as follows:

import re

start_pattern = "( StartString1 | StartString2 | StartString3ShouldBeIncluded | StartString4ShouldBeIncluded )"
end_pattern = "( EndString1 | EndString2 )"
joined_pattern = f'(?P<content>{start_pattern}.*?){end_pattern}'

input1 = "...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... "
output = re.search(joined_pattern, input1).group('content')
print(output)  # Prints ' StartString1 THECONTENT' (the start string is included here too)

input2 = "...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ..."
output = re.search(joined_pattern, input2).group('content')
print(output)  # Prints ' StartString3ShouldBeIncluded THECONTENT'

This prints:

 StartString1 THECONTENT
 StartString3ShouldBeIncluded THECONTENT
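
Note that the output above keeps StartString1 for the first input as well, whereas the question wanted it dropped. One possible variation, not taken from either answer, keeps a single content group but puts the start strings that should be included inside a lookahead, so they are checked without being consumed and end up at the start of the content. The names consumed_start, kept_start and extract_content are hypothetical.

```python
import re

# Start strings that are matched and discarded.
consumed_start = r" StartString1 | StartString2 "
# Start strings that must appear in the output: tested with a lookahead, not consumed.
kept_start = r" StartString3ShouldBeIncluded | StartString4ShouldBeIncluded "
end_pattern = r"(?: EndString1 | EndString2 )"

joined_pattern = re.compile(
    rf"(?:{consumed_start}|(?={kept_start}))(?P<content>.*?){end_pattern}"
)


def extract_content(text):
    """Hypothetical helper: return the content group, or None if nothing matches."""
    match = joined_pattern.search(text)
    return match.group("content") if match else None


print(repr(extract_content("...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... ")))
print(repr(extract_content("...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ...")))
```

As with the other versions, the spaces in the patterns mean the kept start string is returned with a leading space.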