
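This is my code for reading a large file named interact.csv (over 15 GiB). It runs some checks on each row and, based on those checks, splits the interactions file into two separate files: test.csv and train.csv.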

On my machine it takes more than two days to finish. Is there any way to use some kind of parallelism to speed the code up?

target_items: a list containing some item IDs

Current program:

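target_set = set(target_items)  # set lookup is O(1); scanning a list is O(n) per row

with open(interactions) as interactionFile, open("train.csv", "w") as train, open("test.csv", "w") as test:
    header = next(interactionFile)
    # Lines read from a file keep their trailing newline, so write them unchanged.
    train.write(header)
    test.write(header)
    for i, row in enumerate(interactionFile):
        # Route each row by the item ID in its second tab-separated column.
        l = row.split('\t')
        if l[1] in target_set:
            test.write(row)
        else:
            train.write(row)
        if i % 1000000 == 0:
            print(i)  # occasional progress report; printing every row is slow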
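
For reference, the per-row split described above could be wrapped in a self-contained function along these lines. This is only a sketch under assumptions: the function name `split_interactions` and its parameters are hypothetical, rows are assumed to be tab-separated with the item ID in the second column, and `target_items` is assumed to hold hashable IDs so it can be turned into a set. Whether the list lookup is the main cost on the real file is not something the post establishes, so the speed-up may vary.

```python
def split_interactions(src_path, train_path, test_path, target_items):
    """Write rows whose second tab-separated field is a target item to
    test_path and every other row to train_path.

    Returns a (train_rows, test_rows) tuple of row counts.
    """
    # Assumption: item IDs are hashable, so membership tests can use a set.
    target_set = set(target_items)
    n_train = n_test = 0
    with open(src_path) as src, \
            open(train_path, "w") as train, \
            open(test_path, "w") as test:
        header = next(src)  # the line already ends with a newline
        train.write(header)
        test.write(header)
        for row in src:
            # Strip the newline before splitting so the last field compares cleanly.
            fields = row.rstrip("\n").split("\t")
            if fields[1] in target_set:
                test.write(row)
                n_test += 1
            else:
                train.write(row)
                n_train += 1
    return n_train, n_test
```

It would be called as `split_interactions(interactions, "train.csv", "test.csv", target_items)`. The file is still read sequentially in a single process; the sketch only removes per-row overhead and does not add parallelism.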

Reading and writing files faster in Python

0 answers: