Comparing two files section by section in Python

Time: 2014-06-21 07:20:39

Tags: python

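I have two input files:

Input 1:

good sentence
two runways
three runways
right runway
one pathway

four pathway
zero pathway

Input 2:

good sentence
two runways
three runways
right runway
zero pathway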

one pathway
four pathway

I used the following code:

def diff(a, b):
    y = []
    for x in a:
        if x not in b:
            y.append(x)
        else:
            b.remove(x)
    return y

with open('output_ref.txt', 'r') as file1:
    with open('output_ref1.txt', 'r') as file2:
        same = diff(list(file1), list(file2))
        print(same)
        print("\n")

if '\n' in same:
    same.remove('\n')

with open('some_output_file.txt', 'w') as FO:
    for line in same:
        FO.write(line)

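The expected output is:

one pathway

zero pathway

But I get an empty output. The problem is that I don't know how to store each file's contents in a list section by section, compare the sections, and then write the result back out. Can anyone help me with this?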

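2 answers:

Answer 0: (score: 0)

If you only need the lines that appear in one file but not in the other, regardless of order, sets are a good fit. Something like this: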

with open("file1", "r") as f1, open("file2", "r") as f2:
    content1 = set(f1)
    content2 = set(f2)
# lines present in file1 but not in file2 (order is not preserved)
diff_items = content1.difference(content2)

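Update: but is the question about a difference in the same sense as the diff utility, i.e. one where order matters (as it appears to in the example)?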
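On the ordering point, one possible middle ground is to keep a set only for membership tests and walk the first file in its original order. The sketch below is an illustration rather than a definitive fix: the helper name `ordered_difference` and the in-memory sample lists are assumptions, and the lists assume that the last two entries of input 2 are separate lines.

```python
def ordered_difference(lines_a, lines_b):
    """Return the lines of lines_a that do not occur in lines_b, in original order."""
    seen_in_b = set(lines_b)  # the set is used only for fast membership tests
    return [line for line in lines_a if line not in seen_in_b]


# Sample data from the question, already stripped of newlines (assumed layout).
file1_lines = ["good sentence", "two runways", "three runways", "right runway",
               "one pathway", "", "four pathway", "zero pathway"]
file2_lines = ["good sentence", "two runways", "three runways", "right runway",
               "zero pathway", "", "one pathway", "four pathway"]

# Whole-file comparison: every line of file 1 occurs somewhere in file 2.
print(ordered_difference(file1_lines, file2_lines))          # []
# Comparing only the first five lines of each file.
print(ordered_difference(file1_lines[:5], file2_lines[:5]))  # ['one pathway']
```

The first call shows why a whole-file comparison comes back empty for this data: the differences only show up once each section is compared with its counterpart.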

Answer 1: (score: 0)

Using sets:

with open('output_ref.txt', 'r') as file1:
    with open('output_ref1.txt', 'r') as file2:
        f1 = [x.strip() for x in file1] # get all lines and strip whitespace
        f2 = [x.strip() for x in file2]
        five_f1 = f1[0:5] # first five lines
        two_f1 = f1[5:] # rest of lines
        five_f2 = f2[0:5]
        two_f2 = f2[5:]
        s1 = set(five_f1)  # make sets to compare
        s2 = set(two_f1)
        s1 = s1.difference(five_f2) # in a but not b
        s2 = s2.difference(two_f2) 
        same = s1.union(s2)  


with open('some_output_file.txt', 'w') as FO:
    for line in same:
        FO.write(line+"\n") # add new line to write each word on separate line

Using your own approach, without sets:

with open('output_ref.txt', 'r') as file1:
    with open('output_ref1.txt', 'r') as file2:
        f1 = [x.strip() for x in file1]
        f2 = [x.strip() for x in file2]
        five_f1 = f1[0:5]
        two_f1 = f1[5:]
        five_f2 = f2[0:5]
        two_f2 = f2[5:]
        same = diff(five_f1,five_f2) + diff(two_f1,two_f2)
        print(same)
['one pathway', 'zero pathway']
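
The slices above hard-code the section boundary at line 5. If the sections are always separated by a blank line, as they appear to be in the sample files, the boundary could be detected instead. This is a rough sketch under that assumption; `split_sections` and `diff_by_section` are hypothetical helper names, not part of the question's code.

```python
from itertools import groupby


def split_sections(lines):
    """Split lines into sections, using blank lines as separators."""
    stripped = [line.strip() for line in lines]
    return [list(group)
            for is_blank, group in groupby(stripped, key=lambda s: s == "")
            if not is_blank]


def diff_by_section(lines_a, lines_b):
    """Collect lines of each section of lines_a missing from the matching section of lines_b."""
    result = []
    for sec_a, sec_b in zip(split_sections(lines_a), split_sections(lines_b)):
        remaining = list(sec_b)  # work on a copy so the input is not modified
        for line in sec_a:
            if line in remaining:
                remaining.remove(line)  # each line of b can cancel only one line of a
            else:
                result.append(line)
    return result


# Sample data in the shape that iterating over an open file would produce (assumed layout).
file1_lines = ["good sentence\n", "two runways\n", "three runways\n", "right runway\n",
               "one pathway\n", "\n", "four pathway\n", "zero pathway\n"]
file2_lines = ["good sentence\n", "two runways\n", "three runways\n", "right runway\n",
               "zero pathway\n", "\n", "one pathway\n", "four pathway\n"]

print(diff_by_section(file1_lines, file2_lines))  # ['one pathway', 'zero pathway']
```

Two caveats: `zip` stops at the shorter list, so extra trailing sections in either file are ignored, and unlike the original `diff` this version copies each section before removing items, so the caller's lists are left unchanged.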