Separating odd and even lines in a generator with Python

Time: 2013-08-04 20:40:26

Tags: python data-manipulation

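I have a huge data file (~2 GB) that I need to split into odd and even lines, process separately, and write out to two files. I don't want to read the whole file into RAM, so I figured a generator would be a suitable choice. In short, I'd like to do something like this: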

lines = (l.strip() for l in open(inputfn))
oddlines = somefunction(getodds(lines))
evenlines = somefunction(getevens(lines))
outodds.write(oddlines)
outevens.write(evenlines)

Is this possible? Obviously, indexing doesn't work:

In [75]: lines[::2]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/kaiyin/Phased/build37/chr22/segments/segment_1/<ipython-input-75-97be680d00e3> in <module>()
----> 1 lines[::2]

TypeError: 'generator' object is not subscriptable

3 answers:

Answer 0 (score: 2)

def oddlines(fileobj):
    # Lines at odd zero-based indices: 1, 3, 5, ...
    return (line for index, line in enumerate(fileobj) if index % 2)

def evenlines(fileobj):
    # Lines at even zero-based indices: 0, 2, 4, ...
    return (line for index, line in enumerate(fileobj) if not index % 2)

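Note that this requires scanning through the file twice, since these generators aren't designed to run in parallel. It does, however, make for much simpler code. (Also note that the "odd" lines here are the ones at indices 1, 3, 5, ..., which means that, because of zero-based indexing, the first line counts as an "even" line.)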
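Because each of these generators consumes the file object it is given, using both of them means making two passes, for example by rewinding with `seek(0)` (or reopening the file) before the second one. A minimal sketch of that, assuming `io.StringIO` as a stand-in for the real input and output files so it runs on its own; the sample data is hypothetical.

```python
import io

def oddlines(fileobj):
    # Lines at odd zero-based indices: 1, 3, 5, ...
    return (line for index, line in enumerate(fileobj) if index % 2)

def evenlines(fileobj):
    # Lines at even zero-based indices: 0, 2, 4, ...
    return (line for index, line in enumerate(fileobj) if not index % 2)

# Stand-ins for the big input file and the two output files (assumption).
src = io.StringIO("a\nb\nc\nd\ne\n")
even_out, odd_out = io.StringIO(), io.StringIO()

even_out.writelines(evenlines(src))  # first pass consumes src entirely
src.seek(0)                          # without this, the second pass yields nothing
odd_out.writelines(oddlines(src))    # second pass

print(repr(even_out.getvalue()))  # 'a\nc\ne\n'
print(repr(odd_out.getvalue()))   # 'b\nd\n'
```

Only one line is held in memory at a time; the cost is reading the input twice.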

As Ashwini said, you can also use itertools.islice to do this.

Answer 1 (score: 2)

Use itertools.islice to slice the iterator:

from itertools import islice
with open('filename') as f1, open('evens.txt', 'w') as f2:
    for line in islice(f1, 0, None, 2):
        f2.write(line)

with open('filename') as f1, open('odds.txt', 'w') as f2:
    for line in islice(f1, 1, None, 2):
        f2.write(line)
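Since `islice` already yields exactly the selected lines, the inner loops can be replaced with `writelines`, which accepts any iterable of strings. A minimal sketch under the assumption that the input is supplied as a callable returning a fresh file object (because each pass consumes it); `split_with_islice` is a hypothetical helper name, and `io.StringIO` stands in for real files.

```python
import io
from itertools import islice

def split_with_islice(open_input, even_out, odd_out):
    """Copy even-indexed lines to even_out and odd-indexed lines to odd_out.

    open_input is a zero-argument callable returning a fresh file object,
    since each islice pass consumes the file it reads from.
    """
    with open_input() as f:
        even_out.writelines(islice(f, 0, None, 2))  # lines 0, 2, 4, ...
    with open_input() as f:
        odd_out.writelines(islice(f, 1, None, 2))   # lines 1, 3, 5, ...

data = "line0\nline1\nline2\nline3\n"
evens, odds = io.StringIO(), io.StringIO()
split_with_islice(lambda: io.StringIO(data), evens, odds)
print(repr(evens.getvalue()))  # 'line0\nline2\n'
print(repr(odds.getvalue()))   # 'line1\nline3\n'
```

With real files, `open_input` would be something like `lambda: open('filename')`.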

Answer 2 (score: 0)

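If you want to read the file only once, write a generator that wraps the file and yields a flag indicating whether the line is even or odd, together with the actual line read from the file.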

def oddeven(f, even=True):
    for line in f:
        yield even, line
        even = not even
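The same alternating flag can be produced without keeping state by hand, by zipping the file with `itertools.cycle`. This is a swapped-in variant, not part of the original answer, and it relies on Python 3, where `zip` is lazy; a list of strings stands in for the file here.

```python
from itertools import cycle

def oddeven(f, even=True):
    # Pair each line with an alternating flag: the first line gets `even`,
    # the next gets `not even`, and so on. zip stops when f is exhausted.
    return zip(cycle((even, not even)), f)

lines = ["first\n", "second\n", "third\n"]
for is_even, line in oddeven(lines):
    print(is_even, line, end="")
# True first
# False second
# True third
```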

Usage:

with open("infile.txt") as infile, \
     open("odd.txt", "w") as oddfile, \
     open("even.txt", "w") as evenfile:
    for even, line in oddeven(infile):
        if even:
            evenfile.write(line)
        else:
            oddfile.write(line)

This can be simplified further by storing the output file objects in an indexable container:

with open("infile.txt") as infile, \
     open("odd.txt", "w") as oddfile, \
     open("even.txt", "w") as evenfile:
    # bools index as 0/1: outfiles[False] is oddfile, outfiles[True] is evenfile
    outfiles = (oddfile, evenfile)
    for even, line in oddeven(infile):
        outfiles[even].write(line)
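To tie this back to the question, which strips each line and runs it through a processing step before writing, the same single-pass routing can apply a per-line function. A minimal sketch, assuming a hypothetical `process` callable in place of the question's `somefunction` (defaulting to `str.upper` purely for illustration) and `io.StringIO` in place of real files; here the tuple is indexed by `index % 2` rather than by a bool flag.

```python
import io

def split_process(infile, evenfile, oddfile, process=str.upper):
    """Read infile once, routing processed lines by zero-based index parity.

    `process` is a hypothetical per-line transform applied to the stripped
    line; str.upper is only an illustrative default.
    """
    outfiles = (evenfile, oddfile)  # index % 2 == 0 -> even, 1 -> odd
    for index, line in enumerate(infile):
        outfiles[index % 2].write(process(line.strip()) + "\n")

src = io.StringIO("a\nb\nc\n")
even_out, odd_out = io.StringIO(), io.StringIO()
split_process(src, even_out, odd_out)
print(repr(even_out.getvalue()))  # 'A\nC\n'
print(repr(odd_out.getvalue()))   # 'B\n'
```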