Finding reciprocal (two-way) ID matches

Date: 2017-01-25 15:43:00

Tags: python

My input file:

ID1 ID2 value
ID3 ID6 value  
ID2 ID1 value
ID4 ID5 value
ID6 ID5 value
ID5 ID4 value
ID7 ID2 value

Desired output, file1.txt:

ID1 ID2 value   ID2 ID1 value
ID4 ID5 value   ID5 ID4 value

file2.txt:

ID3 ID6 value   
ID6 ID5 value
ID7 ID2 value

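I am trying to find reciprocal best hits. If ID1 has a hit to ID2 and ID2 also has a hit to ID1, the pair should be printed to file1; otherwise the row should be printed to file2. What I tried was to make a copy of the input file and build a dictionary from each file. But this gives output without the values (10 columns). How can I modify it?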

fileA = open("input.txt",'r')
fileB = open("input_copy.txt",'r')
output = open("out.txt",'w')

dictA = dict()
for line1 in fileA:
    new_list=line1.rstrip('\n').split('\t')
    query=new_list[0]
    subject=new_list[1]
    dictA[query] = subject
dictB = dict()
for line1 in fileB:
    new_list=line1.rstrip('\n').split('\t')
    query=new_list[0]
    subject=new_list[1]
    dictB[query] = subject
SharedPairs ={}
NotSharedPairs ={}
for id1 in dictA.keys():
    value1=dictA[id1]
    if value1 in dictB.keys():
        if id1 == dictB[value1]:
            SharedPairs[value1] = id1
        else:
            NotSharedPairs[value1] = id1
for key in SharedPairs.keys():
    line = key +'\t' + SharedPairs[key]+'\n'
    output.write(line)
for key in NotSharedPairs.keys():
    line = key +'\t' + NotSharedPairs[key]+'\n'
    output2.write(line)

2 answers:

Answer 0 (score: 1)

You can solve this easily with sets:

#!/usr/bin/env python

# ordered pairs (ID1, ID2)
oset = set()
# reversed pairs (ID2, ID1)
rset = set()

with open('input.txt') as f:
    for line in f:
        first, second, val = line.strip().split()
        if first < second:
            oset.add((first, second, val,))
        else:
            # note that this reverses second and first for matching purposes
            rset.add((second, first, val,))

print("common: %s" % str(oset & rset))
print("diff: %s" % str(oset ^ rset))

Output:

common: set([('ID4', 'ID5', 'value'), ('ID1', 'ID2', 'value')])
diff: set([('ID3', 'ID6', 'value'), ('ID5', 'ID6', 'value'), ('ID2', 'ID7', 'value')])

It does not handle (ID1, ID1) pairs, but you could add those to a third set and treat them however you decide.

Answer 1 (score: 1)
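The third-set idea could be sketched roughly as follows, in Python 3. The function name `classify_pairs`, the `self_pairs` set, and the `ID8 ID8` row are made up for illustration and are not part of the original answer. As in the code above, a pair only counts as common when the value is identical in both directions, and rows in the diff set are stored in sorted ID order rather than in their original orientation.

```python
def classify_pairs(lines):
    """Split 'ID ID value' rows into (common, diff, self_pairs) sets."""
    oset, rset, self_pairs = set(), set(), set()
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        first, second, val = fields
        if first == second:
            # hypothetical third set for (ID1, ID1)-style rows
            self_pairs.add((first, second, val))
        elif first < second:
            oset.add((first, second, val))
        else:
            # stored reversed so it can match its counterpart in oset
            rset.add((second, first, val))
    return oset & rset, oset ^ rset, self_pairs


sample = [
    "ID1 ID2 value", "ID3 ID6 value", "ID2 ID1 value", "ID4 ID5 value",
    "ID6 ID5 value", "ID5 ID4 value", "ID7 ID2 value",
    "ID8 ID8 value",  # made-up self-pair, not in the original input
]
common, diff, self_pairs = classify_pairs(sample)
print("common:", sorted(common))
print("diff:", sorted(diff))
print("self:", sorted(self_pairs))
```

Keeping self-pairs out of `rset` means they no longer show up in the diff, so what to do with them becomes an explicit decision.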

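import csv

# Read each tab-separated row as an (x, y, val) tuple.
id_list = []
with open('data.tsv') as f:
    for item in csv.reader(f, delimiter='\t'):
        (x, y, val) = item
        id_list.append((x, y, val))

# file1: rows whose reversed pair (with the same value) is also present.
file1 = [item for item in id_list if (item[1], item[0], item[2]) in id_list]
# file2: rows with no reversed counterpart.
file2 = [item for item in id_list if (item[1], item[0], item[2]) not in id_list]
print(file1)
print(file2)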
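For larger inputs, the list membership test above scans the whole list for every row. A minimal Python 3 sketch of a dictionary-based variant is shown below; it also joins each reciprocal pair onto one row, as in the desired file1.txt. The name `split_reciprocal` is hypothetical, and the sketch assumes each (query, subject) combination appears at most once. Unlike the code above, it pairs rows even when the two directions carry different values, which is an assumption about what the question wants.

```python
def split_reciprocal(rows):
    """rows: iterable of (query, subject, value) tuples.

    Returns (paired, unpaired): paired holds 6-field records
    (q, s, v, s, q, v_reverse); unpaired holds the remaining rows.
    """
    rows = list(rows)
    lookup = {(q, s): v for q, s, v in rows}  # O(1) reverse lookups
    paired, unpaired, seen = [], [], set()
    for q, s, v in rows:
        if (q, s) in seen:
            continue  # already emitted as the second half of a pair
        if q != s and (s, q) in lookup:
            paired.append((q, s, v, s, q, lookup[(s, q)]))
            seen.add((s, q))
        else:
            unpaired.append((q, s, v))
    return paired, unpaired


rows = [
    ("ID1", "ID2", "value"), ("ID3", "ID6", "value"),
    ("ID2", "ID1", "value"), ("ID4", "ID5", "value"),
    ("ID6", "ID5", "value"), ("ID5", "ID4", "value"),
    ("ID7", "ID2", "value"),
]
paired, unpaired = split_reciprocal(rows)
for record in paired:
    print("\t".join(record))    # rows intended for file1.txt
for record in unpaired:
    print("\t".join(record))    # rows intended for file2.txt
```

To produce the two files, the `print` calls could be replaced with writes to file handles opened for file1.txt and file2.txt.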