How to read a file (or stdin) line by line in Python without waiting for the whole file to be read

Time: 2011-10-17 09:07:47

Tags: python filter

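We have several huge files on disk (larger than RAM). I want to read them line by line in Python and print the results to the terminal. I have gone through [1] and [2], but I am looking for an approach that does not wait until the whole file has been read into memory.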

I would use either of these two commands:

cat fileName | python myScript1.py
python myScript2.py fileName

[1] How do you read from stdin in Python?
[2] How do I write a unix filter in python?

3 answers:

Answer 0: (score: 8)

This is the standard behavior of file objects in Python: iterating over one yields a single line at a time.

import sys

with open("myfile.txt", "r") as myfile:
    for line in myfile:
        pass  # do something with the current line

for line in sys.stdin:
    pass  # do something with the current line
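
Both invocations from the question can be served by a single script. The following is a minimal sketch, assuming the standard-library fileinput module is acceptable; the function name iter_input_lines is hypothetical and not part of the original answer.

```python
import fileinput


def iter_input_lines(paths):
    """Yield lines lazily from the given files, or from stdin if paths is empty."""
    # fileinput.input() hands back one line at a time; when the list of
    # paths is empty it falls back to reading sys.stdin.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            yield line
```

Calling it as `for line in iter_input_lines(sys.argv[1:])` (after `import sys`) would cover both `cat fileName | python myScript1.py` and `python myScript2.py fileName`.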

Answer 1: (score: 4)

Iterate over the file:

with open('huge.file') as hf:
  for line in hf:
    if 'important' in line:
      print(line, end='')  # the line already ends with a newline

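This needs O(1) memory with respect to the file size: only the current line (plus a small read buffer) is held in memory at any time.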

To read from standard input, just iterate over sys.stdin instead of hf:

import sys
for line in sys.stdin:
  if 'important' in line:
    print(line, end='')  # the line already ends with a newline
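
The same filter can be written once and reused for files and stdin alike. This is a minimal sketch rather than part of the original answer; the function name matching_lines is hypothetical, and the in-memory stream only stands in for sys.stdin or an open file.

```python
import io


def matching_lines(stream, needle):
    """Lazily yield the lines of stream that contain needle."""
    # File objects and sys.stdin both iterate line by line, so nothing
    # beyond the current line is kept around.
    for line in stream:
        if needle in line:
            yield line


# Demonstration on an in-memory stream; a real script would pass
# sys.stdin or an open file object instead.
sample = io.StringIO("not this\nimportant: yes\nimportant too\n")
for line in matching_lines(sample, "important"):
    print(line, end="")
```

Because matching_lines is a generator, it stays lazy: each line is read, tested, and handed on before the next one is requested.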

Answer 2: (score: -1)

if __name__ == '__main__':
    while True:
        try:
            a = input()  # Python 3; on Python 2 use raw_input()
        except EOFError:
            break
        print(a)

This reads from stdin until EOF. To read a file instead, as in the second command, you can use Tim's method:

with open("myfile.txt", "r") as myfile:
    for line in myfile:
        # do something with the current line
        print(line, end="")  # the line already ends with a newline
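
One caveat when the input is a live pipe: on Python 2, `for line in sys.stdin` goes through an internal read-ahead buffer, so lines may not be processed as soon as they arrive. A commonly used workaround is to call readline() repeatedly. The sketch below assumes a text stream whose readline() returns an empty string at EOF; the function name read_lines_promptly is hypothetical. On Python 3, plain iteration generally does not show this delay.

```python
import io


def read_lines_promptly(stream):
    """Yield lines one at a time using readline(), stopping at EOF."""
    # iter(callable, sentinel) keeps calling stream.readline() until it
    # returns the sentinel "", which is what readline() returns at EOF.
    for line in iter(stream.readline, ""):
        yield line


# Demonstration on an in-memory stream; a real script would pass sys.stdin.
demo = io.StringIO("first\nsecond\n")
for line in read_lines_promptly(demo):
    print(line, end="")
```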