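Regex match fails for <change://51736404>

Time: 2019-07-12 15:58:27

Tags: python regex

I am trying to match both line1 and line2 below with a single regex. Currently it only matches line1. How can I make problem/ optional so that the regex also matches line2?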

import re

line1 = '<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms'
line2 = '<change://51736404> [KIC] Not seeing NACK events from tech when packet ex'

# The pattern requires the literal "problem/", so only line1 matches.
pattern = r"[\S]*(?:change:\/\/problem\/)(\d{8,8})"
print(re.findall(pattern, line1))  # ['52547719']
print(re.findall(pattern, line2))  # []

3 answers:

Answer 0 (score: 3)

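You can do this by wrapping problem/ in a non-capturing group and adding the quantifier ?, which matches zero or one time:

(?:problem\/)?

Note that you are greedily matching all non-whitespace characters beforehand. If your lines always have this pattern inside angle brackets, try the following: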

[\S]*change:\/\/(?:problem\/)?\d{8}
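As a quick check, the sketch below applies the optional group to both lines from the question. The capturing group around the digits and the helper name extract_ids are additions for illustration, not part of the pattern as posted; with re.findall, a capturing group makes the result contain only the ID rather than the whole match.

```python
import re

# Pattern from this answer, with a capturing group added around the digits
# (an assumption for illustration) so that re.findall returns only the ID.
PATTERN = re.compile(r"[\S]*change:\/\/(?:problem\/)?(\d{8})")


def extract_ids(line):
    """Return every 8-digit change ID found in the line (hypothetical helper)."""
    return PATTERN.findall(line)


line1 = '<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms'
line2 = '<change://51736404> [KIC] Not seeing NACK events from tech when packet ex'

print(extract_ids(line1))  # ['52547719']
print(extract_ids(line2))  # ['51736404']
```

Both lines now yield their ID, because the group (?:problem\/)? may match either once or not at all.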

Answer 1 (score: 1)

My guess is that this expression might match the strings we want:

<change:\/\/.*?(\d{8})\s*>

Testing with re.findall:

import re

regex = r"<change:\/\/.*?(\d{8})\s*>"

test_str = ("<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms\n"
    "<change://51736404> [KIC] Not seeing NACK events from tech when packet ex\n"
    "<change://problem/problem/problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms")

print(re.findall(regex, test_str))

Testing with re.finditer:

import re

regex = r"<change:\/\/.*?(\d{8})\s*>"

test_str = ("<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms\n"
    "<change://51736404> [KIC] Not seeing NACK events from tech when packet ex\n"
    "<change://problem/problem/problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms")

matches = re.finditer(regex, test_str)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

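The expression is explained in the top-right panel of this demo, in case you want to explore, simplify, or modify it. In this link, you can watch how it matches against some sample inputs step by step.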

RegEx Circuit

jex.im visualizes regular expressions.

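Answer 2 (score: 0)

Use a simple pattern that matches <change://, then an optional part that matches any text up to and including the first /, and then captures one or more digits: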

match = re.search(r"<change://(?:[^/]*/)?(\d+)", line)
if match:
    print(match.group(1))

Note: if you have strings like <change://more/problems/52547719>, you can use a small variation:

match = re.search(r"<change://[^>]*?(\d+)>", line)
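To see why the variation is needed, the following sketch runs both patterns against the nested example. The names SINGLE_SEGMENT, ANY_SEGMENTS, and first_id are hypothetical and exist only for this illustration.

```python
import re

# First pattern from this answer: allows at most one "segment/" before the digits.
SINGLE_SEGMENT = re.compile(r"<change://(?:[^/]*/)?(\d+)")
# Variation: lazily skips anything that is not ">" before the digits and
# requires the closing ">" right after them.
ANY_SEGMENTS = re.compile(r"<change://[^>]*?(\d+)>")


def first_id(pattern, line):
    """Return group 1 of the first match, or None (hypothetical helper)."""
    match = pattern.search(line)
    return match.group(1) if match else None


nested = "<change://more/problems/52547719> example text"
plain = "<change://51736404> example text"

print(first_id(SINGLE_SEGMENT, nested))  # None
print(first_id(ANY_SEGMENTS, nested))    # 52547719
print(first_id(ANY_SEGMENTS, plain))     # 51736404
```

The first pattern fails on the nested string because only one path segment may precede the digits, while the variation accepts any number of segments.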

See this regex demo.

See the Python demo:

import re
lines = ['<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms',
         '<change://51736404> [KIC] Not seeing NACK events from tech when packet ex']
for line in lines:
    match = re.search(r"<change://(?:[^/]*/)?(\d+)", line)
    if match:                 # Check if matched or exception will be raised
        print(match.group(1)) # .group(1) only prints Group 1 value

See the regex demo and the regex graph.

Details

  • <change:// - literal text
  • (?:[^/]*/)? - an optional sequence:
    • [^/]* - 0 or more characters other than /
    • / - a / character
  • (\d+) - Group 1: one or more digits
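The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows # comments inside the pattern. This is only a sketch of that option; the name CHANGE_ID is an assumption and does not appear in the answer.

```python
import re

# Same pattern as in this answer, written in verbose mode so that each
# part carries its own explanation.
CHANGE_ID = re.compile(
    r"""
    <change://      # literal text
    (?:[^/]*/)?     # optional sequence: 0 or more non-slash characters, then one slash
    (\d+)           # group 1: one or more digits
    """,
    re.VERBOSE,
)

lines = [
    '<change://problem/52547719> DEM: Increase granularity of the lower size bins in the packet burst size histograms',
    '<change://51736404> [KIC] Not seeing NACK events from tech when packet ex',
]

for line in lines:
    match = CHANGE_ID.search(line)
    if match:  # search returns None when nothing matches
        print(match.group(1))
```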