Question

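Is there a way to read lines from a file while processing those lines at the same time? That is, reading and processing would run separately from each other: whenever data has been read, it is handed over for processing, so that reading keeps going no matter how fast the processing is.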

Answer 1

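That depends on what you mean by "at the same time". Let's assume you don't necessarily want to go down the rabbit hole of multiple threads, green threads, or event-based code, and that you just want to cleanly separate reading the lines, filtering/processing those lines, and consuming them in your actual business logic.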

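That is easy to achieve with iterators and generators (the latter being a special kind of iterable). The file object returned by the open() call is itself usable as an iterator, which makes this even easier.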
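As a small illustration of that last point, here is a minimal sketch showing that a file object acts as its own iterator and hands out one line per request. The temporary file and the variable names are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

with open(path, 'r') as f:
    # A file object is its own iterator: iter(f) returns f itself.
    is_own_iterator = iter(f) is f

    # next() hands out exactly one line.
    first_line = next(f)

    # A later loop continues where next() left off.
    remaining = [line for line in f]

os.remove(path)
```

Because the file object keeps track of its position, any consumer further down a chain simply picks up the next unread line.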

Consider this simple chain of generator expressions (which, of course, are also iterables) that pre-filter the lines being read:

f = open('file-with-myriads-of-lines.txt', 'r')

# strip away trailing whitespace (including the newline)
lines_stripped = (line.rstrip() for line in f)

# remove trailing "#" comments (note: ignores potential quoting)
lines_without_comment = (line.partition('#')[0] for line in lines_stripped)

# remove remaining surrounding whitespace
lines_cleaned = (line.strip() for line in lines_without_comment)

# filter out (now) empty lines
lines_with_content = (line for line in lines_cleaned if line)

for line in lines_with_content:
    # your business logic goes here
    print("Line: {}".format(line))

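While you could combine some of these filtering/mangling steps into a single generator expression or put them inside the for loop, keeping them apart cleanly separates the tasks, and you can easily mix and match the chain by reordering, removing, or adding generators.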

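This also reads and processes each line only on demand, whenever one is consumed in the business logic for loop (which could also be tucked away in a separate function somewhere else). It does not read all lines up front, and it does not create intermediate lists holding all the intermediate results. This is in contrast to list comprehensions, which are written with square brackets instead of parentheses.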
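The on-demand behaviour can be made visible with a small sketch. Here read_lines is a hypothetical stand-in for the file (not part of the code above) that records every line it hands out, so the log shows how reads and uses interleave.

```python
def read_lines(log):
    """Hypothetical stand-in for a file: yields lines and records each read."""
    for line in ["a  \n", "# comment only\n", "b # trailing\n", "c\n"]:
        log.append("read " + line.strip())
        yield line


log = []

# Same kind of chain as above, condensed into three stages.
stripped = (line.rstrip() for line in read_lines(log))
no_comments = (line.partition('#')[0].strip() for line in stripped)
with_content = (line for line in no_comments if line)

# Building the chain has not read a single line yet.
reads_before = len(log)

for line in with_content:
    # Stand-in for the business logic.
    log.append("use " + line)
```

After the loop, the log alternates between "read" and "use" entries instead of listing all reads first. With square brackets in place of the parentheses, each stage would run to completion and build a full list before the next stage started.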

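Of course, you can also give each processing unit a name in the form of a function, to improve readability, encapsulation, and maintainability: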

def strip_trailing_whitespace(iterable):
    return (line.rstrip() for line in iterable)

def remove_trailing_comments(iterable):
    return (line.partition('#')[0] for line in iterable)

# ...


def preprocess_lines(iterable):

    iterable = strip_trailing_whitespace(iterable)
    iterable = remove_trailing_comments(iterable)
    # ...

    return iterable


def business_logic(iterable):
    for line in iterable:
        # your business logic here
        print("Line: {}".format(line))


def main():
    with open('file-with-myriads-of-lines.txt', 'r') as f:
        iterable = preprocess_lines(f)
        business_logic(iterable)


if __name__ == '__main__':
    main()

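If the preprocessing of each line is more complicated than what fits into a generator expression, you can simply expand it into a custom generator function using the yield statement or expression: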

def remove_trailing_comments(iterable):
    """Remove #-comments that are outside of double-quoted parts."""

    for line in iterable:
        pos = -1
        while True:
            pos = line.find('#', pos + 1)
            if pos < 0:
                break    # use whole line

            if line[:pos].count('"') % 2 == 0:
                # strip starting from first "#" that's not inside quotes
                line = line[:pos]
                break

        yield line

Everything else remains the same.
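To check how the quote-aware version behaves, here is a short sketch that feeds it a few made-up sample lines (the sample data is an assumption for illustration). Whitespace is stripped after the comments are removed in this demo, which is just one possible ordering of the chain.

```python
def remove_trailing_comments(iterable):
    """Remove #-comments that are outside of double-quoted parts."""
    for line in iterable:
        pos = -1
        while True:
            pos = line.find('#', pos + 1)
            if pos < 0:
                break    # no (further) "#": use whole line

            if line[:pos].count('"') % 2 == 0:
                # even number of quotes before "#": it is outside quotes
                line = line[:pos]
                break

        yield line


def strip_trailing_whitespace(iterable):
    return (line.rstrip() for line in iterable)


# Made-up sample input; any iterable of strings works, not only files.
sample = [
    'name = "value"  # plain comment\n',
    'tag = "#not-a-comment"  # real comment\n',
    'no comment here\n',
]

cleaned = list(strip_trailing_whitespace(remove_trailing_comments(sample)))
```

The "#" inside the double quotes survives, while the comment after the closing quote is cut off. Since the functions accept any iterable of strings, they can be tried out on a plain list like this without touching a file.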
