This code reads a large file line by line, processes each line, and ends the process once no new entries arrive:
import time

file = open('logFile.txt', 'r')
count = 0
while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        # No new data yet: give up after 10 empty reads, otherwise wait and retry.
        count = count + 1
        if count >= 10:
            break
        time.sleep(1)
        file.seek(where)
    else:
        pass  # process line
In my experience, reading line by line takes a long time, so I tried to improve this code to read a large number of lines at a time:
from itertools import islice
N = 100000
with open('logFile.txt', 'r') as file:
    while True:
        where = file.tell()
        next_n_lines = list(islice(file, N)).__iter__()
        if not next_n_lines:
            count = count + 1
            if count >= 10:
                break
            time.sleep(1)
            file.seek(where)
        for line in next_n_lines:
            pass  # process line
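It works fine except for the ending: even when there are no more lines in the file, it never ends the process (it never breaks out of the while loop). Any suggestions?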
Answer 0 (score: 3)
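The original code already reads the file in large chunks; it just returns the data one line at a time. All you have added is a redundant generator that uses the file object's line-reading functionality to take N lines at a time.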
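As for why the loop never exits: the check `if not next_n_lines` in your second snippet tests an iterator object, and an iterator is truthy even when it has nothing left to yield. A minimal sketch of the difference, using an empty iterator as a stand-in for a file with no more lines (the variable names here are illustrative only):

```python
from itertools import islice

# Stand-in for a file object that has nothing left to read.
lines = iter([])

# What the question does: wrap the (empty) list in an iterator.
batch_iter = list(islice(lines, 3)).__iter__()
# What the emptiness check needs: the plain list.
batch_list = list(islice(lines, 3))

# A list iterator defines neither __bool__ nor __len__, so it is always truthy.
iterator_is_truthy = bool(batch_iter)
# An empty list is falsy, so `if not batch_list` fires as intended.
list_is_truthy = bool(batch_list)

print(iterator_is_truthy, list_is_truthy)
```

Dropping the `.__iter__()` call, so that the check runs against the list itself, should be enough to let the `count` branch execute.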
With few exceptions, the best way to iterate over the lines in a file is as follows.
with open('filename.txt') as f:
    for line in f:
        ...
If you need to iterate over a preset number of lines at a time, try the following:
from itertools import islice, chain

def to_chunks(iterable, chunksize):
    it = iter(iterable)
    while True:
        try:
            first = next(it)
        except StopIteration:
            # No items left. Return explicitly: since Python 3.7 (PEP 479),
            # a StopIteration escaping a generator becomes a RuntimeError.
            return
        rest = islice(it, chunksize - 1)
        yield chain((first,), rest)

with open('filename.txt') as f:
    for chunk in to_chunks(f, 10):
        for line in chunk:
            ...
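If you also want to keep the "wait for new entries, then give up" behaviour from the question, one possible way to combine it with batching is sketched below. This is only a sketch under assumptions: the function name `follow_in_batches` and its parameters are made up for illustration, it uses `readline()` so that no `tell()`/`seek()` bookkeeping is needed, and unlike the question's code it resets the idle counter whenever new lines show up.

```python
import time

def follow_in_batches(path, batch_size=100000, max_idle_polls=10, poll_interval=1.0):
    """Yield lists of up to batch_size lines from path.

    Stops after max_idle_polls consecutive polls that find no new data.
    (Hypothetical helper; names and defaults are illustrative.)
    """
    idle_polls = 0
    with open(path, 'r') as f:
        while True:
            batch = []
            while len(batch) < batch_size:
                line = f.readline()
                if not line:          # '' means nothing more to read right now
                    break
                batch.append(line)
            if batch:
                idle_polls = 0        # new data arrived, so start counting again
                yield batch
            else:
                idle_polls += 1
                if idle_polls >= max_idle_polls:
                    return
                time.sleep(poll_interval)
```

A caller would then loop with `for batch in follow_in_batches('logFile.txt'):` and process each line of `batch`. Note that if the writer has not yet finished a line, `readline()` can return it without its trailing newline, so a real log follower may need to buffer such partial lines.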