Question

I have the following text:

ERROR: <C:\Includes\Library1.inc:123> This is the Error
Call Trace:
    <C:\Includes\Library2.inc:456>
    <C:\Includes\Library2.inc:789>
    <C:\Code\Main.ext:12> 
    <Line:1> 
ERROR: <C:\Includes\Library2.inc:2282> Another Error
Call Trace:
    <C:\Code\Main.ext:34>
    <C:\Code\Main.ext:56>
    <C:\Code\Main.ext:78>
    <Line:1> 
ERROR: <C:\Code\Main.ext:90> Error Three

I want to extract the following information:

line, Error = 12, This is the Error
line, Error = 34, Another Error
line, Error = 90, Error Three

This is how far I have got:

import re

theText = 'ERROR: ...'
ERROR_RE = re.compile(r'^ERROR: <(?P<path>.*):(?P<line>[0-9]+)> (?P<error>.*)$')
mainName = r'\Main.ext'  # raw string so the backslash is kept literally
# Go through each line
for fullline in theText.splitlines():
    match = ERROR_RE.match(fullline)
    if match:
        path, line, error = match.group('path'), match.group('line'), match.group('error')
        if path.endswith(mainName):
            callSomething(line, error)
        # else check next line for 'Call Trace:'
        # check next lines for mainName and get the linenumber
        # callSomething(linenumber, error)

What is the pythonic way to loop over the remaining elements inside the loop?

Solution: http://codepad.org/BcYmybin

Answer 1

The direct answer to the question of how to loop over the remaining lines is: change the first line of the loop to

lines = theText.splitlines()
for (linenum, fullline) in enumerate(lines):

Then, after a match, you can look at the remaining lines by examining lines[j] in an inner loop, where j starts at linenum+1 and runs until the next match.
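
The inner-loop idea above could be sketched roughly as follows. This is only an illustration, not the asker's final solution: FRAME_RE and find_main_errors are hypothetical names, and the sketch assumes that every call-trace entry is an indented <path:line> line, as in the sample text from the question.

```python
import re

ERROR_RE = re.compile(r'^ERROR: <(?P<path>.*):(?P<line>[0-9]+)> (?P<error>.*)$')
# FRAME_RE is a hypothetical pattern for one indented call-trace entry.
FRAME_RE = re.compile(r'^\s+<(?P<path>.*):(?P<line>[0-9]+)>\s*$')
MAIN_NAME = r'\Main.ext'


def find_main_errors(text):
    """Return (line, error) pairs, one per ERROR that can be tied to Main.ext."""
    results = []
    lines = text.splitlines()
    for linenum, fullline in enumerate(lines):
        match = ERROR_RE.match(fullline)
        if not match:
            continue
        path, line, error = match.group('path', 'line', 'error')
        if path.endswith(MAIN_NAME):
            # The error was raised in Main.ext itself.
            results.append((int(line), error))
            continue
        # Inner loop: scan lines[j] from linenum + 1 up to the next ERROR line.
        for j in range(linenum + 1, len(lines)):
            if ERROR_RE.match(lines[j]):
                break
            frame = FRAME_RE.match(lines[j])
            if frame and frame.group('path').endswith(MAIN_NAME):
                # Keep only the first Main.ext frame of this trace.
                results.append((int(frame.group('line')), error))
                break
    return results
```

The outer loop still visits the call-trace lines afterwards, but they do not match ERROR_RE, so they are skipped.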

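However, a more flexible way to solve the problem is to first split the text into blocks. There are many ways to do that, but, as a former perl user, my impulse is to use a regular expression.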

# Split into blocks that start with /^ERROR/ and run until either the next
# /^ERROR/ or until the end of the string.
#
# (?m)      - lets '^' and '$' match the beginning/end of each line
# (?s)      - lets '.' match newlines
# ^ERROR    - triggers the beginning of the match
# .*?       - grab characters in a non-greedy way, stopping when the following
#             expression matches
# (?=^ERROR|$(?!\n)) - match until the next /^ERROR/ or the end of string
# $(?!\n)   - match end of string.  Normally '$' suffices but since we turned
#             on multiline mode with '(?m)' we have to use '$(?!\n)' to prevent
#             this from matching end-of-line.
blocks = re.findall(r'(?ms)^ERROR.*?(?=^ERROR|$(?!\n))', theText)
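
Once the text is split into blocks, each block can be handled on its own. The following is a minimal sketch of one way to do that, not part of the original answer: MAIN_FRAME_RE and errors_from_blocks are hypothetical names, and the sketch assumes the first <...\Main.ext:N> entry in a block (including the ERROR line itself) is the one wanted.

```python
import re

BLOCK_RE = re.compile(r'(?ms)^ERROR.*?(?=^ERROR|$(?!\n))')
ERROR_RE = re.compile(r'^ERROR: <(?P<path>.*):(?P<line>[0-9]+)> (?P<error>.*)$')
# MAIN_FRAME_RE is a hypothetical pattern: any <...\Main.ext:N> entry.
MAIN_FRAME_RE = re.compile(r'<[^<>\n]*\\Main\.ext:(?P<line>[0-9]+)>')


def errors_from_blocks(text):
    """Return (line, error) pairs, one per block that mentions Main.ext."""
    results = []
    for block in BLOCK_RE.findall(text):
        # The first line of every block is the ERROR header.
        header = ERROR_RE.match(block.splitlines()[0])
        if not header:
            continue  # malformed header, skip rather than guess
        # Search the whole block, header included, for the first Main.ext entry.
        frame = MAIN_FRAME_RE.search(block)
        if frame:
            results.append((int(frame.group('line')), header.group('error')))
    return results
```

Because the search covers the header line too, an error raised directly in Main.ext needs no special case.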

Answer 2

Replace this:

        # else check next line for 'Call Trace:'
        # check next lines for mainName and get the linenumber
        # callSomething(linenumber, error)

with this:

    match = stackframe_re.match(fullline)
    if match and error: # if error is defined from earlier when you matched ERROR_RE
        path, line = match.group('path'), match.group('line')
        if path.endswith(mainName):
            callSomething(line, error)
            error = None # don't report this error again if you see main again

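Note the indentation. Initialize error = None before the loop starts, and set error = None after the first call to callSomething. In general, the code I suggest should work for well-formed data, but you may want to improve it so that it does not produce misleading results when the data does not match the format you expect.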

You will have to write stackframe_re yourself, but it should be an RE that matches lines such as

    <C:\Includes\Library2.inc:789>

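I do not really understand what you mean by "looping over the remaining elements in the loop". By default, a loop continues on to the remaining elements.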
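
Putting the pieces of this answer together, a single pass over the lines might look like the sketch below. The pattern given for stackframe_re is only an assumption (the answer leaves it for you to write), and report_errors is a hypothetical wrapper added so the loop can be run; callSomething is passed in as a parameter.

```python
import re

ERROR_RE = re.compile(r'^ERROR: <(?P<path>.*):(?P<line>[0-9]+)> (?P<error>.*)$')
# Assumed pattern: an indented <path:line> entry, optional trailing whitespace.
stackframe_re = re.compile(r'^\s+<(?P<path>.*):(?P<line>[0-9]+)>\s*$')
mainName = r'\Main.ext'


def report_errors(theText, callSomething):
    """Call callSomething(line, error) once per error traceable to Main.ext."""
    error = None  # the error still waiting for its Main.ext frame
    for fullline in theText.splitlines():
        match = ERROR_RE.match(fullline)
        if match:
            path, line, error = match.group('path', 'line', 'error')
            if path.endswith(mainName):
                callSomething(line, error)
                error = None  # already reported
            continue
        match = stackframe_re.match(fullline)
        if match and error:
            path, line = match.group('path'), match.group('line')
            if path.endswith(mainName):
                callSomething(line, error)
                error = None  # don't report this error again
```

For example, report_errors(theText, lambda line, error: print(line, error)) would print one line per error. Note that line is passed along as a string here, exactly as the regex captured it.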
