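I want to find the string between two regular expression patterns. The tricky part is that some parts of the "before" pattern need to be included in the output string.
Here is a simplified version of my code: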
import re
start_pattern = "( StartString1 | StartString2 | StartString3ShouldBeIncluded | StartString4ShouldBeIncluded )"
end_pattern = "( EndString1 | EndString2 )"
joined_pattern = f'{start_pattern}(?P<content>.*?){end_pattern}'
input1 = "...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... "
output = re.search(joined_pattern, input1).group('content')
print(output) # Prints 'THECONTENT' which is what I want
input2 = "...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ..."
output = re.search(joined_pattern, input2).group('content')
print(output) # Prints 'THECONTENT' but I want 'StartString3ShouldBeIncluded THECONTENT'
Is there a way to change this regex to get the desired output?
Answer 0 (score: 1)
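You can put the start strings that should be included in their own named group and concatenate the two named groups after matching. Because the included start strings may not take part in the match, in which case that group is None, you can use the or operator to default its value to an empty string before joining it with the content group: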
import re
start_pattern = "( StartString1 | StartString2 |(?P<start> StartString3ShouldBeIncluded | StartString4ShouldBeIncluded ))"
end_pattern = "( EndString1 | EndString2 )"
joined_pattern = f'{start_pattern}(?P<content>.*?){end_pattern}'
input1 = "...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... "
match = re.search(joined_pattern, input1)
output = (match.group('start') or '') + match.group('content')
print(output) # Prints 'THECONTENT' which is what I want
input2 = "...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ..."
match = re.search(joined_pattern, input2)
output = (match.group('start') or '') + match.group('content')
print(output) # Prints 'StartString3ShouldBeIncluded THECONTENT'
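The same idea can be wrapped in a small helper that also copes with input where nothing matches, since `re.search` returns None in that case. This is only a sketch under assumptions: the function name `extract`, the constant names, the whitespace-free patterns, and the final `strip()` are illustrative choices, not part of the answer above.

```python
import re

# Start strings that only mark the beginning and are dropped from the result.
EXCLUDED_STARTS = r"StartString1|StartString2"
# Start strings that must be kept in the result.
INCLUDED_STARTS = r"StartString3ShouldBeIncluded|StartString4ShouldBeIncluded"
END_STRINGS = r"EndString1|EndString2"

PATTERN = re.compile(
    rf"(?:{EXCLUDED_STARTS}|(?P<start>{INCLUDED_STARTS}))"
    rf"(?P<content>.*?)"
    rf"(?:{END_STRINGS})"
)


def extract(text):
    """Return the extracted text, or None when nothing matches (hypothetical helper)."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # group('start') is None when one of the excluded start strings matched.
    return ((match.group("start") or "") + match.group("content")).strip()


print(extract("junk StartString1 THECONTENT EndString1 junk"))
# Prints 'THECONTENT'
print(extract("junk StartString3ShouldBeIncluded THECONTENT EndString2 junk"))
# Prints 'StartString3ShouldBeIncluded THECONTENT'
print(extract("no markers here"))
# Prints 'None'
```

Compiling the pattern once and returning None on a failed search avoids the AttributeError that calling `.group()` directly on the result of `re.search` would raise for unmatched input.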
Answer 1 (score: 0)
Just move the position of your group name as follows:
import re
start_pattern = "( StartString1 | StartString2 | StartString3ShouldBeIncluded | StartString4ShouldBeIncluded )"
end_pattern = "( EndString1 | EndString2 )"
joined_pattern = f'(?P<content>{start_pattern}.*?){end_pattern}'
input1 = "...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... "
output = re.search(joined_pattern, input1).group('content')
print(output) # Prints 'StartString1 THECONTENT' (the start string is included here too)
input2 = "...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ..."
output = re.search(joined_pattern, input2).group('content')
print(output) # Prints 'StartString3ShouldBeIncluded THECONTENT'
Which prints:
StartString1 THECONTENT
StartString3ShouldBeIncluded THECONTENT
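Because the content group here wraps the whole start pattern, every start string ends up in the result, including StartString1, as the printed output shows. One possible variation, offered as a sketch and not as part of this answer, keeps a single group but consumes the excluded start strings before the group opens and only checks the included ones with a lookahead. The names `excluded` and `included`, the whitespace-free patterns, and the `\s*` padding are assumptions made for illustration.

```python
import re

excluded = r"StartString1|StartString2"
included = r"StartString3ShouldBeIncluded|StartString4ShouldBeIncluded"
end_pattern = r"EndString1|EndString2"

# Excluded start strings are consumed before the group opens.
# Included start strings are only tested with a zero-width lookahead,
# so the content group begins exactly on them and captures them.
joined_pattern = (
    rf"(?:(?:{excluded})\s*|(?={included}))"
    rf"(?P<content>.*?)\s*(?:{end_pattern})"
)

input1 = "...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... "
output1 = re.search(joined_pattern, input1).group("content")
print(output1)  # Prints 'THECONTENT'

input2 = "...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ..."
output2 = re.search(joined_pattern, input2).group("content")
print(output2)  # Prints 'StartString3ShouldBeIncluded THECONTENT'
```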