Multithreaded reading of a CSV file in Python

Time: 2014-02-04 13:59:23

Tags: python multithreading csv

How can I parse a large (about 5 GB) CSV file in Python?

My current data file parser takes a long time to run:

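import csv
from collections import Counter, defaultdict

def parse_data(file, op_file_test):
    # For each label, count how often each value occurs in column 1 and column 2.
    col1, col2 = defaultdict(Counter), defaultdict(Counter)
    with open(file, newline='') as inf:
        # Strip NUL characters before the csv module sees each line.
        incsv = csv.reader(x.replace('\0', '') for x in inf)
        for row in incsv:
            try:
                c1, c2, label = row
            except ValueError:
                # Skip rows that do not have exactly three fields.
                continue
            col1[label][c1] += 1
            col2[label][c2] += 1

    # Append one line per label: the counts of '0' and '1' in each column.
    with open(op_file_test, 'a') as map_file:
        for lbl in sorted(col1):
            map_file.write("%s, %s, %s, %s, %s\n" % (
                lbl, col1[lbl]['0'], col1[lbl]['1'],
                col2[lbl]['0'], col2[lbl]['1']))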

What is the most efficient way to do this?
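
One possible direction, offered as a sketch rather than a benchmarked answer: the counting loop is CPU-bound Python code, so in standard CPython threads are unlikely to speed it up much because of the global interpreter lock, and separate processes are a common alternative. The Python 3 example below splits the file into line-aligned byte ranges, counts each range in a worker from `multiprocessing.Pool`, and sums the per-range `Counter`s. The helper names (`chunk_offsets`, `iter_lines`, `count_chunk`, `merge_counts`, `parse_parallel`), the default of four workers, and the UTF-8 decoding are assumptions made for illustration. It also assumes that no quoted field contains a newline, since otherwise a byte range could start in the middle of a record.

```python
import csv
import os
from collections import Counter, defaultdict
from multiprocessing import Pool


def chunk_offsets(path, n_chunks):
    """Return (start, end) byte ranges that each begin at the start of a line."""
    size = os.path.getsize(path)
    offsets = [0]
    with open(path, 'rb') as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)
            f.readline()  # skip ahead to the start of the next line
            pos = f.tell()
            if offsets[-1] < pos < size:
                offsets.append(pos)
    offsets.append(size)
    return list(zip(offsets[:-1], offsets[1:]))


def iter_lines(f, start, end):
    """Yield decoded lines from the byte range [start, end), NUL bytes removed."""
    f.seek(start)
    pos = start
    while pos < end:
        line = f.readline()
        if not line:
            break
        pos += len(line)
        # Assumption: the data is UTF-8 compatible; undecodable bytes are replaced.
        yield line.replace(b'\0', b'').decode('utf-8', errors='replace')


def count_chunk(task):
    """Worker: count column values per label within one byte range."""
    path, start, end = task
    col1, col2 = defaultdict(Counter), defaultdict(Counter)
    with open(path, 'rb') as f:
        for row in csv.reader(iter_lines(f, start, end)):
            if len(row) != 3:
                continue  # skip malformed rows
            c1, c2, label = row
            col1[label][c1] += 1
            col2[label][c2] += 1
    # Return plain dicts so the result pickles cleanly between processes.
    return dict(col1), dict(col2)


def merge_counts(parts):
    """Sum the per-chunk results into one pair of label -> Counter mappings."""
    col1, col2 = defaultdict(Counter), defaultdict(Counter)
    for part1, part2 in parts:
        for label, counts in part1.items():
            col1[label].update(counts)
        for label, counts in part2.items():
            col2[label].update(counts)
    return col1, col2


def parse_parallel(path, n_workers=4):
    """Count the chunks in separate processes and merge the results."""
    tasks = [(path, start, end) for start, end in chunk_offsets(path, n_workers)]
    with Pool(n_workers) as pool:
        return merge_counts(pool.imap_unordered(count_chunk, tasks))
```

As with any `multiprocessing` program, `parse_parallel(...)` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. The two mappings it returns can then be written out per label in the same way as in `parse_data`. Whether this is faster in practice depends on disk throughput and core count, so it is worth timing against the single-process version on a sample of the data.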

0 answers:

No answers