Question

I use Python to process log files. Suppose I have a log file that contains a START line and an END line, like this:

START
one line
two line
...
n line
END

What I want is to store the content between the START and END lines for further processing.

In Python I do the following:

data = []
found_start = found_end = False
with open(file) as name_of_file:
    for line in name_of_file:
        if 'START' in line:  # We found the start delimiter
            print(line)
            found_start = True
            for line in name_of_file:  # We now read until the end delimiter
                if 'END' in line:  # We exit here as we have the info
                    found_end = True
                    break
                # Skip blank lines so no empty entries end up in data
                if not line.isspace():
                    # Drop commas and split on whitespace before storing
                    data.append(line.replace(',', '').strip().split())
if found_start and found_end:
    relevant_data = data

Then I process relevant_data.

This looks far too convoluted for a language as clean as Python, so my question is: is there a more Pythonic way to do this?

Thanks!

Answer 1

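You are right that there is something off about having nested loops over the same iterator. File objects are already iterators, and you can use that to your advantage. For example, to find the first line with START in it: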

line = next(l for l in name_of_file if 'START' in l)

This will raise StopIteration if there is no such line. It also leaves the file pointer at the beginning of the first line you care about.
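As a small sketch of both points, the snippet below uses io.StringIO as a stand-in for a real log file (the sample text is an assumption for illustration) and passes a default to next so that a missing START gives None instead of raising StopIteration.

```python
import io

# io.StringIO stands in for an open log file (illustrative data only).
log = io.StringIO("noise\nSTART\none line\ntwo line\nEND\n")

# With a default, next() returns None instead of raising StopIteration.
start = next((l for l in log if 'START' in l), None)

# The iterator has consumed everything up to and including the START line,
# so the next item is the first line of interest.
first = next(log)

# A file without START: the search falls through to the default.
missing = next((l for l in io.StringIO("no markers here\n") if 'START' in l), None)
```

Whether to use the default or let StopIteration propagate depends on whether a missing START should count as an error in your logs.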

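Getting everything up to the END line, without anything that comes after it, is a bit more complicated because it is difficult to set external state in a generator expression. Instead, you can write a simple generator: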

def interesting_lines(file):
    if not next((line for line in file if 'START' in line), None):
        return
    for line in file:
        if 'END' in line:
            break
        line = line.strip()
        if not line:
            continue
        yield line.replace(',', '').split()

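The generator yields nothing if there is no START, but if there is no END it yields every line up to the end of the file, so it differs slightly from your implementation. You would use the generator to replace your loop entirely: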

with open(name_of_file) as file:
    data = list(interesting_lines(file))

if data:
    ... # process data
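If the stricter behaviour of the original loop is needed (keep the data only when both START and END were seen), one possible variant is sketched below. The name block_or_none and the sample input are hypothetical, and io.StringIO stands in for a real file.

```python
import io

def block_or_none(file):
    """Return the parsed lines between START and END, or None if either is missing."""
    if next((line for line in file if 'START' in line), None) is None:
        return None
    data = []
    for line in file:
        if 'END' in line:
            return data
        line = line.strip()
        if line:
            data.append(line.replace(',', '').split())
    return None  # reached the end of the file without seeing END

complete = block_or_none(io.StringIO("START\na, b\n\nc\nEND\n"))
truncated = block_or_none(io.StringIO("START\na, b\n"))
```

Here complete holds the parsed rows, while truncated is None because the second input never reaches END.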

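Wrapping the generator in list consumes it immediately, so the lines persist even after the file is closed. The generator function can be called repeatedly on the same file because, at the end of each call, the file pointer is just past the END line: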

with open(name_of_file) as file:
    for data in iter(lambda: list(interesting_lines(file)), []):
        ...  # Process another data set.

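This relatively little-known form of iter turns any callable that takes no arguments into an iterator. Iteration ends when the callable returns the sentinel value, in this case an empty list.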
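A minimal sketch of this pattern on a file with two blocks follows. The generator is repeated so the snippet runs on its own, and io.StringIO with made-up contents stands in for the log file.

```python
import io

def interesting_lines(file):
    # Same generator as above, repeated so this snippet is self-contained.
    if not next((line for line in file if 'START' in line), None):
        return
    for line in file:
        if 'END' in line:
            break
        line = line.strip()
        if not line:
            continue
        yield line.replace(',', '').split()

log = io.StringIO("START\na, b\nEND\njunk\nSTART\nc\nEND\n")

# Each call to the lambda parses one START/END block; [] ends the loop.
blocks = list(iter(lambda: list(interesting_lines(log)), []))

# Caveat: a block with no content also produces [], which stops iteration early.
early_stop = io.StringIO("START\nEND\nSTART\nx\nEND\n")
blocks_after_empty = list(iter(lambda: list(interesting_lines(early_stop)), []))
```

Because the sentinel is the empty list, an empty START/END pair is indistinguishable from the end of the file, so the second input yields no blocks at all. If empty blocks can occur in your logs, a different stopping condition would be needed.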

Answer 2

To do this, you can use iter(callable, sentinel), discussed in this post, which keeps reading until the sentinel value is reached, in your case 'END' (after applying .strip()).

with open(filename) as file:
    start_token = next(l for l in file if l.strip()=='START') # Used to read until the start token
    result = [line.replace(',', '').split() for line in iter(lambda x=file: next(x).strip(), 'END') if line]
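To see what this collects, here is the same idea run against an in-memory stand-in for the file (io.StringIO with made-up contents, both assumptions for illustration).

```python
import io

file = io.StringIO("noise\nSTART\na, b\n\nc\nEND\nafter\n")

# Skip ahead to the line that is exactly START once stripped.
next(l for l in file if l.strip() == 'START')

# Read stripped lines until one equals 'END'; blank lines are filtered out.
result = [line.replace(',', '').split()
          for line in iter(lambda: next(file).strip(), 'END') if line]

# The END line itself has been consumed, so reading resumes right after it.
following = next(file)
```

Note that this version compares the whole stripped line to the markers, whereas the question's code tests whether the marker appears anywhere in the line.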

Answer 3

This is a job for the regular expression module re, for example:

import re
lines = """ not this line
START
this line
this line too
END
not this one
"""
search_obj = re.search(r'START(.*)END', lines, re.S)
search_obj.group(1)
# '\nthis line\nthis line too\n'

re.S is needed so that . also matches newlines, which lets the pattern span multiple lines.
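One thing to watch with this pattern is that .* is greedy, so with several START/END pairs it runs from the first START to the last END. A sketch of the difference, using a non-greedy .*? with re.findall (the sample text is made up for illustration):

```python
import re

text = "skip\nSTART\na\nEND\nskip\nSTART\nb\nEND\n"

# Greedy: one match that swallows everything up to the last END.
greedy = re.search(r'START(.*)END', text, re.S).group(1)

# Non-greedy: each match stops at the nearest END, giving one entry per block.
lazy = re.findall(r'START(.*?)END', text, re.S)
```

This approach needs the whole file contents in memory as a single string, which may matter for large logs.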

Pythonic way to process a file between two previously known strings
