Question

Hi everyone, and sorry for my poor English.

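I am trying to compare two large text files in Python. Each is about 18 GB and contains combinations of values from different databases, generated with itertools.combinations, one combination per line.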

file1 example:

('1', '7')
('1', '3')
('1', '4')

file2 example:

('1', '4')
('1', '5')
('1', '7')

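I need to:

1) Print, or write to a text file, the combinations that are not shared between these files.

2) Do this without loading the entire txt files into RAM.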

This actually works:

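import itertools

with open('c:/python27/file1.txt', 'r') as f:
    while True:
        # Read file1 one line at a time.
        next_n_lines = list(itertools.islice(f, 1))
        if not next_n_lines:
            break
        # Reopen file2 for every line so the search starts from the top.
        with open('C:/python27/file2.txt', 'r') as f2:
            for line in next_n_lines:
                if line not in f2:
                    print(line)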

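This is the only code that actually works (I tried many other snippets copied from Stack Overflow), but it is very slow: I estimate it would take more than a year to finish, and life is too short to wait that long.

If I try this instead: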

with open('c:/python27/file1.txt', 'r') as f:
    with open('C:/python27/file2.txt', 'r') as f2:
        for line in f:
            if line not in f2:
                print(line)

it just prints the entire contents of file1.
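A likely explanation, sketched with in-memory stand-ins for the two files (the sample data and the `reported` list are illustrative, and the snippet assumes Python 3): a membership test on a file object reads forward from the current position and never rewinds, so once file2 has been read to the end, every later test fails and the line is reported.

```python
import io

# In-memory stand-ins for the two files (illustrative sample data).
file1 = io.StringIO("('1', '7')\n('1', '3')\n('1', '4')\n")
file2 = io.StringIO("('1', '4')\n('1', '5')\n('1', '7')\n")

reported = []
for line in file1:
    # `line in file2` scans file2 forward from wherever the previous
    # scan stopped. After the end is reached, the test is always False.
    if line not in file2:
        reported.append(line.strip())

# ('1', '4') is reported even though it does appear in file2.
print(reported)
```

Rewinding with `file2.seek(0)` before each test would give correct results, but it rescans file2 for every line of file1, which is the same quadratic cost that makes the version that reopens file2 so slow.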

Do you have any suggestions for making this faster, or for reaching my goal in some other way?
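One possible direction, offered only as a sketch and not measured on 18 GB inputs: sort each file on disk in chunks that fit in memory, then walk the two sorted files side by side. The function names (`sort_to_chunks`, `external_sort`, `unshared_lines`) and the `lines_per_chunk` parameter are hypothetical, the code assumes Python 3, and it assumes no line is repeated within a single file.

```python
import heapq
import itertools
import os
import tempfile


def sort_to_chunks(path, lines_per_chunk):
    """Split `path` into sorted temporary files of at most `lines_per_chunk` lines."""
    chunk_paths = []
    with open(path) as src:
        while True:
            # Normalise line endings so a last line without "\n" still compares equal.
            chunk = sorted(
                line.rstrip("\r\n") + "\n"
                for line in itertools.islice(src, lines_per_chunk)
            )
            if not chunk:
                break
            fd, chunk_path = tempfile.mkstemp(suffix=".chunk")
            with os.fdopen(fd, "w") as out:
                out.writelines(chunk)
            chunk_paths.append(chunk_path)
    return chunk_paths


def external_sort(path, sorted_path, lines_per_chunk=1000000):
    """Write a sorted copy of `path` to `sorted_path`, holding one chunk in RAM at a time."""
    chunk_paths = sort_to_chunks(path, lines_per_chunk)
    files = [open(p) for p in chunk_paths]
    try:
        with open(sorted_path, "w") as out:
            # heapq.merge lazily merges the already-sorted chunk files.
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
        for p in chunk_paths:
            os.remove(p)


def unshared_lines(sorted1, sorted2):
    """Yield lines present in exactly one of two sorted, duplicate-free files."""
    with open(sorted1) as f1, open(sorted2) as f2:
        a, b = next(f1, None), next(f2, None)
        while a is not None and b is not None:
            if a == b:
                a, b = next(f1, None), next(f2, None)
            elif a < b:
                yield a
                a = next(f1, None)
            else:
                yield b
                b = next(f2, None)
        # One file is exhausted; everything left in the other is unshared.
        while a is not None:
            yield a
            a = next(f1, None)
        while b is not None:
            yield b
            b = next(f2, None)
```

In use, each input would be sorted once with `external_sort`, and the lines yielded by `unshared_lines` written to an output file. Each file is then read a small, fixed number of times instead of once per line of the other file. The chunk size is a trade-off: larger chunks need more RAM, while smaller chunks create more temporary files, all of which are opened at once during the merge and may run into the operating system's limit on open files.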

Thanks

PS: I'm not a programmer; I have never coded anything before.

Most efficient way to compare two large text files in Python?

0 answers: