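We have several huge files on disk (larger than RAM). I want to read them line by line in Python and print the results to the terminal. I have gone through [1] and [2], but I am looking for methods that do not wait until the entire file has been read into memory.
I would use either of these two commands:
cat fileName | python myScript1.py
python myScript2.py fileName
[1] How do you read from stdin in Python?
[2] How do I write a unix filter in python?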
Answer 0 (score: 8)
This is the standard behavior of file objects in Python:
with open("myfile.txt", "r") as myfile:
    for line in myfile:
        # do something with the current line
        pass
or
import sys
for line in sys.stdin:
    # do something with the current line
    pass
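Both invocations from the question can be served by one script. The following is a minimal sketch under that assumption, not code from either linked question; `iter_lines` and `main` are hypothetical names. It reads the files named on the command line, or standard input when none are given, and yields one line at a time.

```python
import sys


def iter_lines(paths):
    """Yield lines lazily from each path, or from stdin if paths is empty."""
    if not paths:
        # No file names given: behave like a Unix filter and read stdin.
        for line in sys.stdin:
            yield line
        return
    for path in paths:
        with open(path, "r") as handle:
            # Iterating the file object reads one line at a time.
            for line in handle:
                yield line


def main(argv):
    """Copy every input line to stdout as soon as it is read."""
    for line in iter_lines(argv[1:]):
        # The line already ends with a newline, so write it unchanged.
        sys.stdout.write(line)
```

Calling `main(sys.argv)` at the bottom of the script would let the same file handle both `cat fileName | python myScript1.py` and `python myScript2.py fileName`.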
Answer 1 (score: 4)
Iterate over the file:
with open('huge.file') as hf:
    for line in hf:
        if 'important' in line:
            # the line already ends with a newline, so do not add another
            print(line, end='')
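This needs O(1) memory with respect to the file size: only the current line (plus a small read buffer) is held in memory at any time.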
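To make the constant-memory point concrete, here is a small sketch that aggregates over the lines with a generator expression, so no list of lines is ever built. `count_matches` is a hypothetical helper name, and the demo uses an in-memory stream only so that the block runs on its own.

```python
import io


def count_matches(lines, needle):
    """Count lines containing needle, consuming the input one line at a time."""
    # The generator expression is evaluated lazily; sum() keeps only a counter.
    return sum(1 for line in lines if needle in line)


# Demo on an in-memory stream; an open file object or sys.stdin works the same way.
sample = io.StringIO("important: a\nnoise\nimportant: b\n")
print(count_matches(sample, "important"))  # prints 2
```

By contrast, calling `read()` or `readlines()` on the file object would load the whole file into memory first.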
To read from standard input, just iterate over sys.stdin instead of hf:
import sys
for line in sys.stdin:
    if 'important' in line:
        # the line already ends with a newline, so do not add another
        print(line, end='')
Answer 2 (score: -1)
# Python 2 (raw_input and the print statement)
if __name__ == '__main__':
    while True:
        try:
            a = raw_input()
        except EOFError:
            break
        print a
This reads from stdin until EOF. To read a file the second way, you can use Tim's approach, i.e.
with open("myfile.txt", "r") as myfile:
    for line in myfile:
        # do something with the current line
        print line,  # trailing comma: the line already ends with a newline
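The loop at the top of this answer relies on `raw_input` and the `print` statement, which exist only in Python 2. A rough Python 3 equivalent might look like the sketch below; `read_until_eof` and `main` are hypothetical names, and the built-in `input()` takes the place of `raw_input()`.

```python
def read_until_eof():
    """Yield lines from stdin, without trailing newlines, until EOF."""
    while True:
        try:
            line = input()  # raises EOFError once stdin is exhausted
        except EOFError:
            break
        yield line


def main():
    """Echo every line of stdin back to stdout."""
    for line in read_until_eof():
        print(line)
```

Because `input()` strips the trailing newline, `print(line)` restores exactly one per line. Iterating over `sys.stdin` directly, as in the other answers, is usually simpler.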