Question

I have generated two dictionaries:

dict1 = {'Ex1': ['Spata1', 'D', 'E'], 'Ex2': ['Fgg', 'Wfdc2', 'F', 'G']}

dict2={'lnc3': ['Spata1', 'Fgg', 'D'], 'lnc2': ['Fgg', 'E'], 'lnc1': ['Spata1', 'Wfdc2', 'F', 'G']}

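For each key of dict1, I want to count how many of its values overlap with the values of each key in dict2, and report the following in an output file:

the key from dict1
the length of that key's value list in dict1
the key from dict2
the length of that key's value list in dict2
the number of overlapping values between each pair of keys from dict1 and dict2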

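For example:

Ex1 from dict1 has 3 values: Spata1, D and E. The values of Ex1 overlap with 2 values in lnc3 of dict2 (Spata1 and D), 1 value in lnc2 (E) and 1 value in lnc1 (Spata1). The final output should look like this: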

keydict1    length_value_dict1  keydict2    length_value_dict2  Number_of_overlap
Ex1 3   lnc3    3   2
Ex1 3   lnc2    2   1
Ex1 3   lnc1    4   1
Ex2 4   lnc3    3   1
Ex2 4   lnc2    2   1
Ex2 4   lnc1    4   3

Here is my code:

output = open("Output.txt", "w")
output.write('keydict1\tlength_value_dict1\tkeydict2\tlength_value_dict2\tNumber_of_overlap\n') 
for key, value in dict1.items():
    len1=len(dict1[key]) #gives length of the key
    for vals in value: #to iterate over each of the values corresponding to key
        for key2, value2 in dict2.items(): #iterates over keys and values of second dictionary
            len2=len(dict2[key2])
            counter = 0 #sets counter to 0
            for vals2 in value2:
                if vals == vals2: #checks values if equal to each other
                    counter = counter + 1 #if it is equal, it adds 1 to the counter, then it is supposed to reset it when it gets to next key2
            newline= key,str(len1),key2,str(len2),str(counter) #For some reason, i cant output the file in the command below except if the integers are converted to strings. Not sure if there is a better trick
            output.write('\t'.join(newline)+"\n")

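The script runs without errors. However, the output is not what I expected. The counter does not accumulate across the loop; instead, every individual comparison is written on its own line.

I cannot figure out where the error is. Here is the output of the script above: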

keydict1    length_value_dict1  keydict2    length_value_dict2  Number_of_overlap
Ex2 4   lnc3    3   1
Ex2 4   lnc2    2   1
Ex2 4   lnc1    4   0
Ex2 4   lnc3    3   0
Ex2 4   lnc2    2   0
Ex2 4   lnc1    4   1
Ex2 4   lnc3    3   0
Ex2 4   lnc2    2   0
Ex2 4   lnc1    4   1
Ex2 4   lnc3    3   0
Ex2 4   lnc2    2   0
Ex2 4   lnc1    4   1
Ex1 3   lnc3    3   1
Ex1 3   lnc2    2   0
Ex1 3   lnc1    4   1
Ex1 3   lnc3    3   1
Ex1 3   lnc2    2   0
Ex1 3   lnc1    4   0
Ex1 3   lnc3    3   0
Ex1 3   lnc2    2   1
Ex1 3   lnc1    4   0

Answer 1

Your algorithm should look like this.

for k1, v1 in dict1.items():
    for k2, v2 in dict2.items():
        # now find the number of items that appear in both v1 and v2

But as you have probably noticed by now, your algorithm does this instead.

for k1, v1 in dict1.items():
    for v in v1:
        for k2, v2 in dict2.items():

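In effect, you are counting the number of times a single item v of v1 appears in v2, which should be either 0 or 1. Because of the for v in v1 loop, you check the pair of keys k1 and k2 once per item of v1, and write one line each time, instead of checking the pair once.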
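To make the difference concrete, here is a minimal sketch of the loops from the question rearranged so that the counter is reset once per (k1, k2) pair and exactly one row is produced per pair. The function name count_overlaps and the list-of-tuples return value are illustrative choices, not part of the original script, and the sketch assumes the lists contain no duplicate values.

```python
def count_overlaps(dict1, dict2):
    """Return one (k1, len(v1), k2, len(v2), overlap) row per key pair."""
    rows = []
    for k1, v1 in dict1.items():
        for k2, v2 in dict2.items():
            counter = 0  # reset once per (k1, k2) pair, not once per value
            for item in v1:
                # Assumes no duplicates in v2, so membership adds 0 or 1.
                if item in v2:
                    counter += 1
            rows.append((k1, len(v1), k2, len(v2), counter))
    return rows


dict1 = {'Ex1': ['Spata1', 'D', 'E'], 'Ex2': ['Fgg', 'Wfdc2', 'F', 'G']}
dict2 = {'lnc3': ['Spata1', 'Fgg', 'D'], 'lnc2': ['Fgg', 'E'],
         'lnc1': ['Spata1', 'Wfdc2', 'F', 'G']}

for row in count_overlaps(dict1, dict2):
    # str() is applied to every field so the integers can be joined.
    print('\t'.join(str(field) for field in row))
```

The only structural change from the question's code is that the loop over dict2 now sits outside the loop over the values of v1.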

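Now let's go back to the original algorithm. What we want to find is the number of elements in the intersection of the two lists v1 and v2. Because intersection is a set concept, we simply do len(set(v1).intersection(v2)). Here is a simple code snippet that does all of this, without any special formatting.

dict1 = {'Ex1': ['Spata1', 'D', 'E'], 'Ex2': ['Fgg', 'Wfdc2', 'F', 'G']}
dict2 = {'lnc3': ['Spata1', 'Fgg', 'D'], 'lnc2': ['Fgg', 'E'], 'lnc1': ['Spata1', 'Wfdc2', 'F', 'G']}
for k1, v1 in dict1.items():
    for k2, v2 in dict2.items():
        # one row per pair of keys: key, length, key, length, overlap
        print('%3s %5d %10s %5d %5d' % (k1, len(v1), k2, len(v2), len(set(v1).intersection(v2))))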

Ex2     4       lnc3     3     1
Ex2     4       lnc2     2     1
Ex2     4       lnc1     4     3
Ex1     3       lnc3     3     2
Ex1     3       lnc2     2     1
Ex1     3       lnc1     4     1

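Note that dictionaries do not order their keys the way you might expect. Before Python 3.7 the iteration order is arbitrary; from Python 3.7 onward it follows insertion order. If you really need a particular order, there are ways to get it, such as iterating over sorted keys.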
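Putting the pieces together for the file output the question asks for, the following is a sketch rather than a definitive implementation. It iterates over sorted keys as one way to get a stable row order, and uses the standard csv module, which converts integers to strings on its own. The function name write_overlap_table is hypothetical; the file name Output.txt and the header come from the question.

```python
import csv


def write_overlap_table(dict1, dict2, path):
    """Write one tab-separated row per (dict1 key, dict2 key) pair."""
    header = ['keydict1', 'length_value_dict1', 'keydict2',
              'length_value_dict2', 'Number_of_overlap']
    with open(path, 'w', newline='') as handle:
        writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
        writer.writerow(header)
        for k1 in sorted(dict1):  # sorted keys give a stable row order
            values1 = set(dict1[k1])
            for k2 in sorted(dict2):
                overlap = len(values1.intersection(dict2[k2]))
                # csv.writer stringifies the integers for us
                writer.writerow([k1, len(dict1[k1]), k2,
                                 len(dict2[k2]), overlap])


dict1 = {'Ex1': ['Spata1', 'D', 'E'], 'Ex2': ['Fgg', 'Wfdc2', 'F', 'G']}
dict2 = {'lnc3': ['Spata1', 'Fgg', 'D'], 'lnc2': ['Fgg', 'E'],
         'lnc1': ['Spata1', 'Wfdc2', 'F', 'G']}

write_overlap_table(dict1, dict2, 'Output.txt')
```

Sorting ascending puts lnc1 before lnc3, which differs from the row order shown in the question; pass reverse=True to sorted() if the descending order matters.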

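If your lists contain duplicate values, using a set intersection will skew the count, because sets ignore duplicate elements. The traditional way to find the overlap is then to build a dictionary of counts over the elements of one list, say v2, then for each item in v1 look up how many times it occurs in v2, and sum the totals. In code:

from collections import Counter
v2_counts = Counter(v2)
overlap = sum(v2_counts.get(v, 0) for v in v1)

The method get(key, default_value) tries to fetch the dictionary value stored under key, and returns default_value if the key does not exist.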
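As a small worked example of how duplicates change the result, the sketch below compares the set-based count with the Counter-based count on two made-up lists (the list contents are illustrative, not taken from the question). It also shows a third option that the answer does not mention, a multiset intersection using the & operator on Counter objects, which counts each shared value as many times as it occurs in both lists.

```python
from collections import Counter

# Illustrative lists with duplicates (not from the question).
v1 = ['Fgg', 'E', 'E']
v2 = ['Fgg', 'Fgg', 'E', 'E', 'D']

# Set intersection: each distinct shared value counts once.
set_overlap = len(set(v1).intersection(v2))

# Counter lookup: for every item of v1, add its number of occurrences in v2.
v2_counts = Counter(v2)
count_overlap = sum(v2_counts.get(v, 0) for v in v1)

# Multiset intersection (an alternative): per value, the minimum of both counts.
multiset_overlap = sum((Counter(v1) & Counter(v2)).values())

print(set_overlap, count_overlap, multiset_overlap)  # prints: 2 6 3
```

Which of the three numbers is the right "overlap" depends on how repeated values should be treated in the data.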
