Question

I have two large tab-delimited files:

File 1:
Bmag0905    chr7B   401656584   401656568
Bmag0905    chr7A   459876086   459876070
Bmag0904    chr2B   472060312   472060296
Bmag0904    chr2A   373596126   373596110
Bmag0904    chr7B   401656584   401656568

File 2:
Bmag0905    chr7B   172039378   172039358
Bmag0905    chr4B   186310411   186310431
Bmag0904    chr4B   532339252   532339232
Bmag0904    chr2B   708832397   708832377
Bmag0904    chr3A   673781330   673781350

I want to get the entries that are common to both file 1 and file 2, so my output would look like this:

Bmag0905    chr7B   401656584   401656568   Bmag0905    chr7B   172039378   172039358
Bmag0904    chr2B   472060312   472060296   Bmag0904    chr2B   708832397   708832377
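To make the matching rule concrete, here is a hedged sketch that joins the sample rows above in memory, once on column 0 alone and once on columns 0 and 1. It assumes the rule is "same first two columns", which is inferred from the expected output rather than stated in the question; `join_on`, `FILE1` and `FILE2` are hypothetical names.

```python
from collections import defaultdict

# Sample rows copied from the question: (query, chromosome, start, end).
FILE1 = [
    ("Bmag0905", "chr7B", "401656584", "401656568"),
    ("Bmag0905", "chr7A", "459876086", "459876070"),
    ("Bmag0904", "chr2B", "472060312", "472060296"),
    ("Bmag0904", "chr2A", "373596126", "373596110"),
    ("Bmag0904", "chr7B", "401656584", "401656568"),
]
FILE2 = [
    ("Bmag0905", "chr7B", "172039378", "172039358"),
    ("Bmag0905", "chr4B", "186310411", "186310431"),
    ("Bmag0904", "chr4B", "532339252", "532339232"),
    ("Bmag0904", "chr2B", "708832397", "708832377"),
    ("Bmag0904", "chr3A", "673781330", "673781350"),
]

def join_on(rows_a, rows_b, ncols):
    """Pair each row of rows_a with every row of rows_b whose first ncols fields match."""
    index = defaultdict(list)
    for row in rows_b:
        index[row[:ncols]].append(row)
    # .get() avoids inserting empty entries into the defaultdict for missing keys.
    return [a + b for a in rows_a for b in index.get(a[:ncols], [])]

by_query = join_on(FILE1, FILE2, 1)  # key: column 0 only
by_pair = join_on(FILE1, FILE2, 2)   # key: columns 0 and 1
print(len(by_query), len(by_pair))
for row in by_pair:
    print("\t".join(row))
```

Matching on column 0 alone pairs every Bmag0905 row with every Bmag0905 row in the other file (2 × 2) and likewise for Bmag0904 (3 × 3), giving 13 pairs; only the two-column key reproduces the two expected output lines.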

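So this is how I built the dictionaries. Now the question is: how do I find the common elements shown above and print the required lines to a new file? Also, should I group by element[0]?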

with open("input1.txt") as fileA, open("input2.txt") as fileB, open("shared", "w") as output:
    dictA = {}
    for line1 in fileA:
        new_list = line1.rstrip('\n').split('\t')
        query = new_list[0]
        subject = new_list[1]
        dictA.setdefault(query, []).append(subject)
    dictB = {}
    for line1 in fileB:
        new_list = line1.rstrip('\n').split('\t')
        query = new_list[0]
        subject = new_list[1]
        dictB.setdefault(query, []).append(subject)
    Shared = {}
    for id1, value1 in dictA.items():
        if id1 in dictB:  # membership test on the dict itself; .keys() is not needed
            pass  # (?) how do I find the shared elements and write the required lines to output?
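One way to finish this dictionary-based approach, offered as a sketch under assumptions: the key is the (query, subject) pair, i.e. columns 0 and 1, rather than element[0] alone, and the dictionary stores the whole row instead of only the subject so the matching lines can be printed. Only file B needs to be indexed; file A can then be streamed line by line. `index_by_pair`, `find_shared` and `write_shared` are hypothetical helper names.

```python
from collections import defaultdict

def index_by_pair(lines):
    """Group tab-separated lines by their (query, subject) fields, i.e. columns 0 and 1."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        groups[(fields[0], fields[1])].append(fields)
    return groups

def find_shared(lines_a, lines_b):
    """Join each file-A row with every file-B row that has the same (query, subject) key."""
    dict_b = index_by_pair(lines_b)
    shared = []
    for line in lines_a:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        for match in dict_b.get((fields[0], fields[1]), []):
            shared.append("\t".join(fields + match))
    return shared

def write_shared(path_a="input1.txt", path_b="input2.txt", out_path="shared"):
    """Write the joined lines to out_path (file names default to those in the question)."""
    with open(path_a) as file_a, open(path_b) as file_b, open(out_path, "w") as output:
        for line in find_shared(file_a, file_b):
            output.write(line + "\n")
```

Calling `write_shared()` would read `input1.txt` and `input2.txt` and write the joined lines to `shared`. Keeping `find_shared` separate from the file handling makes it easy to check against the sample rows without touching the disk.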

Answer 1

A solution based on set and csv that uses the pair of values in the first two columns as the key. From your sample input/output, I inferred that commonality is determined by the first two columns:
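The answer's code is not reproduced on this page. What follows is a hedged sketch of the approach the sentence describes (a `set` of (column 0, column 1) keys plus the standard `csv` module), assuming strictly tab-separated input and tab-joined output lines like the expected output in the question; `read_rows`, `shared_rows` and `write_shared` are hypothetical names.

```python
import csv

def read_rows(path):
    """Read a tab-separated file into a list of field lists, skipping empty lines."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]

def shared_rows(rows_a, rows_b):
    """Join rows whose (column 0, column 1) pair occurs in both inputs."""
    # Set intersection keeps only the key pairs present in both files.
    common = {tuple(r[:2]) for r in rows_a} & {tuple(r[:2]) for r in rows_b}
    # Index only the file-B rows whose key is shared, so each A row is matched by lookup.
    b_by_key = {}
    for row in rows_b:
        key = tuple(row[:2])
        if key in common:
            b_by_key.setdefault(key, []).append(row)
    return [a + b for a in rows_a if tuple(a[:2]) in common for b in b_by_key[tuple(a[:2])]]

def write_shared(path_a, path_b, out_path):
    """Write one tab-separated output line per matching (A row, B row) pair."""
    rows = shared_rows(read_rows(path_a), read_rows(path_b))
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)
```

Usage would be `write_shared("input1.txt", "input2.txt", "shared")`, with the file names taken from the question. The set intersection discards non-shared keys up front, so only the file-B rows that can actually match are indexed for the lookup.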
