Python: compare two files and get the differences

Time: 2015-02-16 15:34:30

Tags: python

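I have two files: one is user input (f1) and the other is a database (f2). I want to check whether each string from f1 is in the database (f2), and print the ones that are not in f2. There is a problem with my code; it does not work correctly. This is f1: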

rbs003491
rbs003499
rbs003531
rbs003539
rbs111111

This is f2:

AHPTUR13,rbs003411 
AHPTUR13,rbs003419 
AHPTUR13,rbs003451 
AHPTUR13,rbs003459 
AHPTUR13,rbs003469 
AHPTUR13,rbs003471 
AHPTUR13,rbs003479 
AHPTUR13,rbs003491 
AHPTUR13,rbs003499 
AHPTUR13,rbs003531 
AHPTUR13,rbs003539 
AHPTUR13,rbs003541 
AHPTUR13,rbs003549 
AHPTUR13,rbs003581 

In this case it should return rbs111111, because that string is not in f2. The code is:

with open(c,'r') as f1:
    s1 = set(x.strip() for x in f1)
    print s1
    with open("/tmp/ARNE/blt",'r') as f2:
        for line in f2:
            if line not in s1:
                print line

3 answers:

Answer 0 (score: 1)

If you only care about the second part of each line (the rbs003411 in AHPTUR13,rbs003411):

with open(user_input_path) as f1, open('/tmp/ARNE/blt') as f2:
    not_found = set(f1.read().split())
    for line in f2:
        _, found = line.strip().split(',')
        not_found.discard(found)  # remove found word
    print not_found
    # for x in not_found:
    #     print x
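
For readers on Python 3, the same discard-based idea can be sketched as below. The function name ids_not_in_db and the in-memory io.StringIO stand-ins for f1 and f2 are assumptions made for illustration and are not part of the original answer. The sketch also swaps split(',') for rpartition(','), so a blank line or a line with extra commas does not raise during unpacking.

```python
import io


def ids_not_in_db(user_file, db_file):
    """Return the IDs from user_file that never appear in db_file."""
    # Start by assuming every user ID is missing.
    not_found = set(user_file.read().split())
    for line in db_file:
        # Keep only the text after the last comma, e.g. "rbs003411".
        _, _, db_id = line.strip().rpartition(",")
        # discard() is a no-op when db_id is not in the set.
        not_found.discard(db_id)
    return not_found


# Hypothetical in-memory stand-ins for the two files in the question.
f1 = io.StringIO("rbs003491\nrbs003499\nrbs003531\nrbs003539\nrbs111111\n")
f2 = io.StringIO(
    "AHPTUR13,rbs003491 \n"
    "AHPTUR13,rbs003499 \n"
    "AHPTUR13,rbs003531 \n"
    "AHPTUR13,rbs003539 \n"
    "AHPTUR13,rbs003541 \n"
)

result = ids_not_in_db(f1, f2)
print(sorted(result))
```

Because the result is a set, its order is arbitrary; sort it, as above, if stable output matters.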

Answer 1 (score: 0)

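The line variable in the for loop will contain something like "AHPTUR13,rbs003411", but you are only interested in the second part. You should do something like this: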

for line in f2:
    line = line.strip().split(",")[1]
    if line not in s1:
        print line
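
Note that the loop above reports database entries that are absent from the user input, which is the opposite direction from what the question asks for. A minimal Python 3 sketch of the two directions, using a shortened, made-up subset of the sample data (the variable names are illustrative assumptions):

```python
# Shortened stand-ins for the contents of f1 and f2.
user_ids = {"rbs003491", "rbs003499", "rbs111111"}
db_lines = [
    "AHPTUR13,rbs003491 \n",
    "AHPTUR13,rbs003499 \n",
    "AHPTUR13,rbs003541 \n",
]

# Extract the second comma-separated field from each database line.
db_ids = {line.strip().split(",")[1] for line in db_lines}

# Direction used by the loop above: in the database but not in the user input.
in_db_only = db_ids - user_ids

# Direction the question asks for: in the user input but not in the database.
in_user_only = user_ids - db_ids

print(in_db_only)
print(in_user_only)
```

Set difference is not symmetric, so it is worth checking which side the membership test is applied to.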

Answer 2 (score: 0)

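You need to check the last part of each line rather than the whole line. You can split each line of f2 on "," and take the last part (x.strip().split(',')[-1]). Also, if you want to check whether the strings from f1 are in the database (f2), your logic is wrong: you need to build your set from f2: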

with open(c,'r') as f1, open("/tmp/ARNE/blt",'r') as f2:
    s1 = set(x.strip().split(',')[-1] for x in f2)
    print s1
    for line in f1:
        line = line.strip()  # drop the trailing newline before comparing and printing
        if line not in s1:
            print line
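
A Python 3 sketch of the same approach, wrapped in a function that preserves the order of the user input. The name find_missing and the temporary files standing in for f1 and f2 are assumptions for illustration only; substitute the real paths as needed.

```python
import os
import tempfile


def find_missing(user_path, db_path):
    """Return the IDs from user_path that do not appear in db_path, in input order."""
    with open(db_path) as db_file:
        # Build the lookup set from the database: text after the last comma.
        db_ids = {line.strip().split(",")[-1] for line in db_file if line.strip()}
    with open(user_path) as user_file:
        user_ids = [line.strip() for line in user_file if line.strip()]
    return [uid for uid in user_ids if uid not in db_ids]


# Demo with throwaway files standing in for f1 and f2.
with tempfile.TemporaryDirectory() as tmp:
    f1_path = os.path.join(tmp, "f1.txt")
    f2_path = os.path.join(tmp, "f2.txt")
    with open(f1_path, "w") as f:
        f.write("rbs003491\nrbs003499\nrbs111111\n")
    with open(f2_path, "w") as f:
        f.write("AHPTUR13,rbs003491 \nAHPTUR13,rbs003499 \nAHPTUR13,rbs003541 \n")
    missing = find_missing(f1_path, f2_path)

print(missing)
```

Building the set from the database keeps each lookup cheap, and iterating over the user input as a list means the missing IDs come back in the order the user supplied them.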