Question

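I want to match the following input. How can I match a group a fixed number of times without writing the pattern out as a multi-line string? Something like (^(\d+) (.+)$){3}) (but that does not work).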

sample_string = """Breakpoint 12 reached 
         90  good morning
     91  this is cool
     92  this is bananas
     """
pattern_for_continue = re.compile("""Breakpoint \s (\d+) \s reached \s (.+)$
                                 ^(\d+)\s+  (.+)\n
                                 ^(\d+)\s+  (.+)\n
                                 ^(\d+)\s+  (.+)\n
                                  """, re.M|re.VERBOSE)
matchobj = pattern_for_continue.match(sample_string)
print(matchobj.group(0))

Answer 1

There are several problems with your expression and your sample:

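Using VERBOSE makes every literal space in the pattern non-matching, so the spaces around the digits on the first line are ignored too. Replace the spaces with \s or [ ] (the latter matches only a literal space; the former also matches newlines and tabs).
Your input sample has whitespace before the digits on each line, but your example pattern requires the digits to be at the start of the line. Either allow for that whitespace or fix your sample input.
The biggest problem is that a capturing group inside a repeated group (that is, (\d+) inside a larger group with {3} at the end) captures only the last match. You would get 92 and this is bananas, not the two preceding matched lines.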
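The first and third points can be checked in isolation. The following is a minimal sketch, not part of the original answer; the sample text and the variable names are made up for illustration.

```python
import re

text = "90 good morning\n91 this is cool\n92 this is bananas\n"

# A capturing group inside a repeated group keeps only its final iteration.
repeated = re.compile(r"(?:^(\d+)[ ]+(.+)\n){3}", re.MULTILINE)
last_only = repeated.match(text).groups()
print(last_only)  # ('92', 'this is bananas')

# Under re.VERBOSE a bare space in the pattern is dropped, so "a b" means "ab".
bare_space = re.match(r"a b", "a b", re.VERBOSE)
print(bare_space)  # None

# A space inside a character class survives re.VERBOSE.
class_space = re.match(r"a [ ] b", "a b", re.VERBOSE)
print(class_space.group(0))  # a b
```

All three lines of the text are matched by the repeated group, but only the captures from the third iteration remain accessible afterwards.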

To overcome all of that, you have to repeat the pattern for the three lines explicitly. You can use Python to build that repetition:

linepattern =  r'[ ]* (\d+) [ ]+ ([^\n]+)\n'

pattern_for_continue = re.compile(r"""
    Breakpoint [ ]+ (\d+) [ ]+ reached [ ]+ ([^\n]*?)\n
    {}
""".format(linepattern * 3), re.MULTILINE|re.VERBOSE)

For the sample input (with the line text used in the sample string under Answer 2), this returns:

>>> pattern_for_continue.match(sample_string).groups()
('12', '', '90', 'hey this is a great line', '91', 'this is cool too', '92', 'this is bananas')

If you really do not want to allow whitespace before the digits on the 3 extra lines, you can remove the leading [ ]* from linepattern.
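As a rough illustration of that trade-off, the sketch below compares the two line patterns on an indented and a left-aligned sample. The helper build_pattern and the names lenient_line, strict_line, indented and flush_left are hypothetical and only serve this example.

```python
import re

def build_pattern(linepattern, count=3):
    # Hypothetical helper: header line followed by `count` copies of linepattern.
    return re.compile(r"""
        Breakpoint [ ]+ (\d+) [ ]+ reached [ ]+ ([^\n]*?)\n
        {}
    """.format(linepattern * count), re.MULTILINE | re.VERBOSE)

lenient_line = r'[ ]* (\d+) [ ]+ ([^\n]+)\n'   # allows leading spaces
strict_line = r'(\d+) [ ]+ ([^\n]+)\n'         # digits must start the line

indented = ("Breakpoint 12 reached \n"
            "     90  good morning\n"
            "     91  this is cool\n"
            "     92  this is bananas\n")
flush_left = indented.replace("     ", "")     # same text without indentation

lenient_groups = build_pattern(lenient_line).match(indented).groups()
strict_on_indented = build_pattern(strict_line).match(indented)
strict_groups = build_pattern(strict_line).match(flush_left).groups()

print(lenient_groups)
print(strict_on_indented)   # None: leading spaces are no longer allowed
print(strict_groups)
```

The strict variant only matches once the input lines start directly with the digits, so dropping [ ]* is safe only if the real input is never indented.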

Answer 2

Code

You need something more like this:

import re

sample_string = """Breakpoint 12 reached 
90  hey this is a great line
91  this is cool too
92  this is bananas
"""
pattern_for_continue = re.compile(r"""
    Breakpoint\s+(\d+)\s+reached\s+\n
    (\d+)  ([^\n]+?)\n
    (\d+)  ([^\n]+?)\n
    (\d+)  ([^\n]+?)\n
""", re.MULTILINE|re.VERBOSE)
matchobj = pattern_for_continue.match(sample_string)

# Print the seven captured groups, then the whole match.
for i in range(1, 8):
    print(i, matchobj.group(i))
print("Entire match:")
print(matchobj.group(0))

Result

1 12
2 90
3   hey this is a great line
4 91
5   this is cool too
6 92
7   this is bananas
Entire match:
Breakpoint 12 reached 
90  hey this is a great line
91  this is cool too
92  this is bananas

Reasons

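re.VERBOSE ignores whitespace in the pattern, so any whitespace you want to match has to be written explicitly in your regular expression. I partially worked around this by left-justifying your data in the multi-line string. I think that is justified because you probably do not have this indentation in real code; it is likely an artifact of editing inside a multi-line string.
You need to replace $ with \n.
You need non-greedy matches for the line text.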
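The second point is easy to verify on a small input. This is a minimal sketch with made-up sample text, assuming the pattern is meant to span two consecutive lines:

```python
import re

text = "90 good morning\n91 this is cool\n"

# `$` only asserts a line end and `^` a line start; neither consumes the
# newline between them, so `$^` cannot bridge two non-empty lines.
with_anchors = re.match(r"(\d+) (.+)$^(\d+) (.+)$", text, re.MULTILINE)
print(with_anchors)  # None

# Matching the newline itself lets the pattern move on to the next line.
with_newlines = re.match(r"(\d+) (.+)\n(\d+) (.+)\n", text)
print(with_newlines.groups())  # ('90', 'good morning', '91', 'this is cool')
```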

Regex: match exactly three lines
