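I have a huge data file (~2 GB) that needs to be split into odd and even lines, which are processed separately and written to two files. I don't want to read the whole file into RAM, so I thought a generator would be a suitable choice. In short, I want to do something like this: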
lines = (l.strip() for l in open(inputfn))
oddlines = somefunction(getodds(lines))
evenlines = somefunction(getevens(lines))
outodds.write(oddlines)
outevens.write(evenlines)
Is this possible? Indexing obviously does not work:
In [75]: lines[::2]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/kaiyin/Phased/build37/chr22/segments/segment_1/<ipython-input-75-97be680d00e3> in <module>()
----> 1 lines[::2]
TypeError: 'generator' object is not subscriptable
Answer 0 (score: 2)
def oddlines(fileobj):
    return (line for index, line in enumerate(fileobj) if index % 2)

def evenlines(fileobj):
    return (line for index, line in enumerate(fileobj) if not index % 2)
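Note that this requires scanning the file twice, because the two generators are not designed to run in parallel: each one needs its own pass over the file, so reopen it or call seek(0) before the second pass. In exchange, the code is much simpler. (Also note that the "odd" lines here are the ones at index 1, 3, 5 and so on, which means that, because of zero-based indexing, the first line of the file counts as an "even" line.)
As Ashwini says, you can also use itertools.islice to do this.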
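If the zero-based numbering is confusing, enumerate accepts a start argument, so the first line can be numbered 1. The following is only a minimal sketch of that variation: the function names numbered_odd_lines and numbered_even_lines are hypothetical, and io.StringIO stands in for the real file.

```python
import io

def numbered_odd_lines(fileobj):
    # start=1 numbers the first line as 1, so "odd" matches human counting
    return (line for number, line in enumerate(fileobj, start=1) if number % 2)

def numbered_even_lines(fileobj):
    return (line for number, line in enumerate(fileobj, start=1) if number % 2 == 0)

sample = "a\nb\nc\nd\ne\n"
# Each generator needs its own pass, so a fresh file-like object is used for each
odd = list(numbered_odd_lines(io.StringIO(sample)))
even = list(numbered_even_lines(io.StringIO(sample)))
```

With this numbering, odd holds the 1st, 3rd and 5th lines and even holds the 2nd and 4th.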
Answer 1 (score: 2)
Use itertools.islice to slice an iterator:
from itertools import islice

with open('filename') as f1, open('evens.txt', 'w') as f2:
    for line in islice(f1, 0, None, 2):
        f2.write(line)

with open('filename') as f1, open('odds.txt', 'w') as f2:
    for line in islice(f1, 1, None, 2):
        f2.write(line)
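To check which lines each islice call selects, and to show where per-line processing could go, here is a small sketch on in-memory data. It assumes the processing is a per-line function: process is a hypothetical stand-in for the question's somefunction, and io.StringIO objects replace the real input and output files.

```python
import io
from itertools import islice

def process(line):
    # Hypothetical stand-in for the question's somefunction
    return line.strip().upper()

source_text = "l0\nl1\nl2\nl3\nl4\n"

# First pass: indices 0, 2, 4, ...
evens = io.StringIO()
for line in islice(io.StringIO(source_text), 0, None, 2):
    evens.write(process(line) + "\n")

# Second pass over a fresh file-like object: indices 1, 3, 5, ...
odds = io.StringIO()
for line in islice(io.StringIO(source_text), 1, None, 2):
    odds.write(process(line) + "\n")
```

islice is lazy, so only one line is held in memory at a time, but the input is still read twice.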
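Answer 2 (score: 0)
If you only want to read the file once, write a generator that wraps the file object and yields, for each line, a flag indicating whether the line is even or odd together with the actual line read from the file.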
def oddeven(f, even=True):
    for line in f:
        yield even, line
        even = not even
Usage:
with open("infile.txt") as infile, \
     open("odd.txt", "w") as oddfile, \
     open("even.txt", "w") as evenfile:
    for even, line in oddeven(infile):
        if even:
            evenfile.write(line)
        else:
            oddfile.write(line)
This can be simplified further by storing the output file objects in an indexable container (False and True index as 0 and 1):
with open("infile.txt") as infile, \
     open("odd.txt", "w") as oddfile, \
     open("even.txt", "w") as evenfile:
    outfiles = (oddfile, evenfile)
    for even, line in oddeven(infile):
        outfiles[even].write(line)
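The same single-pass idea can also be written with enumerate, using the line index modulo 2 to pick the output file. This is a sketch rather than a definitive implementation: split_single_pass and its process parameter are hypothetical names, and it assumes the processing can be applied one line at a time.

```python
import io

def split_single_pass(infile, evenfile, oddfile, process=lambda line: line):
    # Index 0 of the tuple receives even-indexed lines, index 1 odd-indexed ones
    outfiles = (evenfile, oddfile)
    for index, line in enumerate(infile):
        outfiles[index % 2].write(process(line))

# In-memory stand-ins for the real files
infile = io.StringIO("l0\nl1\nl2\nl3\nl4\n")
evenfile, oddfile = io.StringIO(), io.StringIO()
split_single_pass(infile, evenfile, oddfile, process=str.upper)
```

The input is read once and only the current line is kept in memory, which suits a file of this size.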