Question

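I've been working on this file I/O and have made some progress by reading through this site, and I'm wondering what other ways there are to optimize it. I'm parsing a 10 GB / 30 million line test file and writing selected fields to an output file, which results in a cleaned file of approximately 1.4 GB. Initially this process took 40 minutes to run, and I have reduced that to about 30 minutes. Does anyone have other ideas for reducing this further in Python? In the long run I plan to write this in C++; I just need to learn the language first. Thanks in advance.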

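# Python 3 version. fdir, wsize and fsplit were not shown in the original
# post; the values below are placeholder assumptions.
fdir = ""            # directory prefix for the input/output files (assumed)
wsize = 5000000      # kept lines to batch before each write (assumed)
fsplit = 12000000    # input lines per output file (assumed)
wbun = []            # batch of output lines waiting to be written

with open(fdir+"input.txt",'rb',50*(1024*1024)) as r:
    w=open(fdir+"output0.txt",'wb',50*(1024*1024))
    for i,l in enumerate(r):
        # lines are bytes in 'rb' mode, so compare against a bytes literal
        if l[42:44]==b'25':
            # takes fixed width line into csv line of only a few cols
            wbun.append(b','.join([
                l[7:15],
                l[26:35],
                l[44:52],
                l[53:57],
                format(int(l[76:89])/100.0,'.02f').encode('ascii'),
                l[89:90],
                format(int(l[90:103])/100.0,'.02f').encode('ascii'),
                l[193:201],
                l[271:278]+b'\n'
            ]))
        # write about every 5MM lines
        if len(wbun)==wsize:
            w.writelines(wbun)
            wbun=[]
            print("i_count:",i)
        # splits about every 4GB
        if (i+1)%fsplit==0:
            w.close()
            w=open(fdir+"output%d.txt"%(i//fsplit+1),'wb',50*(1024*1024))
    # write whatever is left in the batch, then close the last file
    w.writelines(wbun)
    w.close()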

Answer 1

Try running it under PyPy (https://pypy.org). It should run without any changes to your code, and it will probably be faster.
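
One way to check whether PyPy actually helps is to time the same per-line work under both interpreters. The following is only a minimal sketch under assumptions: the field offsets are copied from the question, but `convert_line`, `run`, `make_sample_line` and the synthetic record layout are hypothetical names and data introduced for illustration, and it targets Python 3. `platform.python_implementation()` reports which interpreter is running.

```python
import io
import platform
import time


def convert_line(line):
    """Return a CSV record for a type-'25' fixed-width line, else None.

    Field offsets are taken from the question; the function itself is a
    hypothetical helper for this sketch.
    """
    if line[42:44] != b"25":
        return None
    return b",".join([
        line[7:15],
        line[26:35],
        line[44:52],
        line[53:57],
        format(int(line[76:89]) / 100.0, ".02f").encode("ascii"),
        line[89:90],
        format(int(line[90:103]) / 100.0, ".02f").encode("ascii"),
        line[193:201],
        line[271:278] + b"\n",
    ])


def run(src, dst):
    """Stream lines from src to dst; return the number of records written."""
    written = 0
    for line in src:
        record = convert_line(line)
        if record is not None:
            dst.write(record)
            written += 1
    return written


def make_sample_line():
    """Build one synthetic 280-byte record (assumed layout, not real data)."""
    line = bytearray(b"x" * 280)
    line[42:44] = b"25"
    line[76:89] = b"0000000012345"   # 13 digits, scaled by 1/100 on output
    line[90:103] = b"0000000000050"  # 13 digits, scaled by 1/100 on output
    return bytes(line) + b"\n"


if __name__ == "__main__":
    # In-memory data keeps the comparison about parsing cost, not disk speed.
    data = make_sample_line() * 100000
    start = time.perf_counter()
    count = run(io.BytesIO(data), io.BytesIO())
    elapsed = time.perf_counter() - start
    print(platform.python_implementation(), count, "records in %.3f s" % elapsed)
```

Saving this under any file name and running it once with CPython and once with PyPy's Python 3 interpreter gives a rough comparison of the parsing cost alone. Real timings on the 10 GB file will also depend on disk throughput, which this sketch deliberately leaves out.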

Also, C++ may be overkill, especially if you don't know it yet. Consider learning Go or D instead.

Python text file read/write optimization
