Question

This code reads a large file line by line, processes each line, and ends the process once no new entries have appeared for a while:

import time

file = open('logFile.txt', 'r')
count = 0

while True:
    where = file.tell()
    line = file.readline()
    if not line:
        # No new line yet: give up after 10 empty reads, otherwise wait and retry.
        count = count + 1
        if count >= 10:
            break
        time.sleep(1)
        file.seek(where)
    else:
        pass  # process line

In my experience, reading line by line takes a long time, so I tried to improve this code to read a large number of lines at once:

import time
from itertools import islice

N = 100000
count = 0
with open('logFile.txt', 'r') as file:
    while True:
        where = file.tell()
        next_n_lines = list(islice(file, N)).__iter__()
        if not next_n_lines:
            count = count + 1
            if count >= 10:
                break
            time.sleep(1)
            file.seek(where)
        for line in next_n_lines:
            pass  # process line

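It works fine except for the ending: even when there are no more lines in the file, it never ends the process (it never breaks out of the while loop). Any suggestions?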

Answer 1

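The original code already reads the file in large chunks; it just hands the data back one line at a time. All you have added is a redundant generator that uses the file object's line-reading machinery to pull N lines at a time.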
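As for why the loop never breaks: `if not next_n_lines` tests an iterator object, and an iterator is truthy even when it has nothing left to yield. A small demonstration, where `lines` is just a stand-in for a file that has hit end-of-file:

```python
from itertools import islice

lines = iter([])  # stand-in for a file object that has reached end-of-file

as_iterator = list(islice(lines, 3)).__iter__()  # what the question's code tests
as_list = list(islice(lines, 3))                 # the same data kept as a list

print(bool(as_iterator))  # True: an iterator object is truthy even when empty
print(bool(as_list))      # False: an empty list is falsy
```

So dropping the `.__iter__()` call and testing the list itself should let the `break` branch run.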

With few exceptions, the best way to iterate over the lines of a file is this:

with open('filename.txt') as f:
    for line in f:
        ...

If you need to iterate over a preset number of lines at a time, try the following:

from itertools import islice, chain

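def to_chunks(iterable, chunksize):
    it = iter(iterable)
    while True:
        try:
            first = next(it)
        except StopIteration:
            # No items left: end the generator explicitly. Since Python 3.7
            # (PEP 479) a StopIteration escaping a generator body is turned
            # into a RuntimeError instead of ending it quietly.
            return
        rest = islice(it, chunksize - 1)
        yield chain((first,), rest)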


with open('filename.txt') as f:
    for chunk in to_chunks(f, 10):
        for line in chunk:
            ...
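
For the question's actual use case (read in batches, then give up after several empty reads in a row), something along these lines may work. It is only a sketch: the name `follow_in_chunks` and the parameters `chunk_size`, `max_idle` and `delay` are made up for illustration, and it assumes the writer only ever appends whole lines. Unlike the question's code, it resets the idle counter whenever new lines arrive.

```python
import time
from itertools import islice


def follow_in_chunks(path, chunk_size=100000, max_idle=10, delay=1.0):
    """Yield lists of up to chunk_size lines from path.

    Stops after max_idle consecutive reads that return no lines.
    All names and defaults here are illustrative assumptions.
    """
    idle = 0
    with open(path, 'r') as f:
        while True:
            # iter(f.readline, '') stops as soon as readline() returns '',
            # which is what it returns at end-of-file.
            chunk = list(islice(iter(f.readline, ''), chunk_size))
            if chunk:
                idle = 0  # new data arrived, so start counting again
                yield chunk
                continue
            idle += 1
            if idle >= max_idle:
                break
            time.sleep(delay)  # wait for the writer to append more lines
```

You would loop over it with `for chunk in follow_in_chunks('logFile.txt'):` and process each line of `chunk` inside. Because each batch is a real list, an empty batch is falsy, and calling `readline()` directly means no `tell()`/`seek()` bookkeeping is needed.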

Python - read in chunks and break if there are no lines
