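I am trying to write a regular expression that matches both line1 and line2 below. Currently it matches only line1. How can I make problem/ optional so that the regex also matches line2?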
import re
line1 = '<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms'
line2 = '<change://51736404> [KIC] Not seeing NACK events from tech when packet ex'
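# Raw strings keep the backslashes in the pattern intact.
match = re.findall(r"[\S]*(?:change:\/\/problem\/)(\d{8,8})", line1)
print(match)  # line1 matches
match = re.findall(r"[\S]*(?:change:\/\/problem\/)(\d{8,8})", line2)
print(match)  # line2 does not match, because problem/ is required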
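Answer 0 (score: 3)
You can do this by applying the ? quantifier, which matches 0 or 1 times, to problem/.
Note that you are greedily matching all non-whitespace characters beforehand. If your lines always follow this pattern inside the brackets, try the following: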
[\S]*change:\/\/(?:problem\/)?\d{8}
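As a minimal sketch of the optional group in action, the pattern can be run against both sample lines from the question. The capture group around \d{8} and the variable names are assumptions added for illustration, and the leading [\S]* is omitted for brevity.

```python
import re

line1 = '<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms'
line2 = '<change://51736404> [KIC] Not seeing NACK events from tech when packet ex'

# (?:problem/)? makes the "problem/" segment optional (0 or 1 occurrences).
# The capture group around \d{8} is an assumption added here so that
# re.findall returns only the ID rather than the whole match.
pattern = re.compile(r"change://(?:problem/)?(\d{8})")

ids = [pattern.findall(line) for line in (line1, line2)]
print(ids)
```

With the group made optional, both lines yield their 8-digit ID.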
Answer 1 (score: 1)
My guess is that this expression might match the desired strings:
<change:\/\/.*?(\d{8})\s*>
Test with re.findall:
import re
regex = r"<change:\/\/.*?(\d{8})\s*>"
test_str = ("<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms\n"
"<change://51736404> [KIC] Not seeing NACK events from tech when packet ex\n"
"<change://problem/problem/problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms")
print(re.findall(regex, test_str))
Test with re.finditer:
import re
regex = r"<change:\/\/.*?(\d{8})\s*>"
test_str = ("<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms\n"
"<change://51736404> [KIC] Not seeing NACK events from tech when packet ex\n"
"<change://problem/problem/problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms")
matches = re.finditer(regex, test_str)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
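The expression is explained in the top-right panel of this demo, in case you wish to explore, simplify, or modify it. In this link, you can watch how it matches some sample inputs step by step, if you like.
jex.im can be used to visualize regular expressions.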
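Answer 2 (score: 0)
Use a simple pattern that matches <change://, then an optional part that matches any text up to and including the first /, and then capture one or more digits: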
match = re.search(r"<change://(?:[^/]*/)?(\d+)", line)
if match:
    print(match.group(1))
Note: if you have strings like <change://more/problems/52547719>, you can use a small variation:
match = re.search(r"<change://[^>]*?(\d+)>", line)
See this regex demo.
See the Python demo:
import re
lines = ['<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms',
         '<change://51736404> [KIC] Not seeing NACK events from tech when packet ex']
for line in lines:
    match = re.search(r"<change://(?:[^/]*/)?(\d+)", line)
    if match:  # Check for a match first, or an exception will be raised
        print(match.group(1))  # .group(1) prints only the Group 1 value
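The small variation from the note above can be checked the same way. This is a sketch only: the third line is a hypothetical input with several path segments before the ID, and the variable names are chosen for illustration.

```python
import re

lines = [
    '<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms',
    '<change://51736404> [KIC] Not seeing NACK events from tech when packet ex',
    # Hypothetical line with more than one path segment before the ID.
    '<change://more/problems/52547719> example description',
]

# [^>]*? lazily consumes any characters other than '>' so that (\d+)>
# captures the run of digits immediately before the closing bracket.
pattern = re.compile(r"<change://[^>]*?(\d+)>")

results = []
for line in lines:
    match = pattern.search(line)
    if match:  # skip lines that do not contain the pattern
        results.append(match.group(1))
print(results)
```

Because the pattern anchors the digits to the closing >, it does not depend on how many / separated segments precede the ID.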
Details
<change:// - literal text
(?:[^/]*/)? - an optional sequence:
    [^/]* - 0 or more characters other than /
    / - a / character
(\d+) - Group 1: one or more digits
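The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. This is an illustrative sketch, and the variable names are assumptions.

```python
import re

line1 = '<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms'
line2 = '<change://51736404> [KIC] Not seeing NACK events from tech when packet ex'

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
pattern = re.compile(r"""
    <change://      # literal text
    (?:             # optional non-capturing group:
        [^/]*       #   0 or more characters other than a slash
        /           #   a single slash
    )?
    (\d+)           # Group 1: one or more digits
""", re.VERBOSE)

ids = []
for line in (line1, line2):
    match = pattern.search(line)
    if match:
        ids.append(match.group(1))
print(ids)
```

For line2 the optional group first tries to consume text up to a slash, finds none, and backtracks to match nothing, so the digits are still captured.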