Question

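I am trying to compare two dictionaries to check the accuracy of a large dataset.

I want to know whether two points that belong to the same key in dictionary 1 also belong to the same key in dictionary 2.

I have a working approach that uses a double for loop with an "if point in both dictionaries" check, but I am looking for a faster way to compare the two dictionaries.

In dict_1 each point_id has only one key, whereas in dict_2 a single point_id can have several keys.

Both dictionaries look like this: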

{key1 : [list of point id], key2 : [list of point id], etc}

dict_1 = {'key1': [1, 2, 3, 4, 5, 6], 'key2': [7, 8, 9, 10, 11, 12]}
dict_2 = {'key3': [1, 2, 4, 6, 8, 11, 12], 'key4': [2, 5, 7, 9, 10, 11, 12]}

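def accuracy_from_dict_to_dict(dict_1, dict_2):
    total, truth = 0, 0
    for key_dict_1 in dict_1:
        point_of_key = dict_1.get(key_dict_1)
        i = 0
        while i < len(point_of_key):  # for each point of the key_dict_1 list
            j = i + 1
            while j < len(point_of_key):  # for each later point of the same list
                point_i = point_of_key[i]
                point_j = point_of_key[j]
                for key_dict_2 in dict_2:
                    # test membership in the list of points, not in the key itself
                    points_2 = dict_2[key_dict_2]
                    if point_i in points_2 and point_j in points_2:
                        truth += 1
                    total += 1
                j += 1  # advance once per pair, outside the loop over dict_2
            i += 1
    return truth, total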

The problem is not the code itself but the computation time. Unless the dataset is small enough, it takes a very long time to run.

Answer 1

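It looks like you are just checking combinations of 2 items from the two dictionaries. You can do this more efficiently with the itertools module from the standard library: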

from itertools import combinations, chain

dict_1 = {'key1' : [1,2,3,4,5,6], 'key2' : [7,8,9,10,11,12]}  
dict_2 = {'key3' : [1,2,4,6,8,11,12], 'key4' :[2,5,7,9,10,11,12]}

c_dict1 = set(chain.from_iterable(combinations(v, 2) for v in dict_1.values()))
c_dict2 = set(chain.from_iterable(combinations(v, 2) for v in dict_2.values()))
total = len(c_dict1) + len(c_dict2)
similarity = len(c_dict1 & c_dict2) / total
print(total, similarity)

This prints:

69 0.2753623188405797
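
If what you actually want is the share of dict_1 pairs that also share a key in dict_2 (rather than a score over the pairs of both dictionaries), one possible variation is to invert dict_2 into a point-to-keys mapping first. This is only a sketch under that assumption: the names pair_accuracy and keys_of_point are made up for illustration, and dividing by the number of dict_1 pairs is my reading of the question, not something the code above does.

```python
from collections import defaultdict
from itertools import combinations


def pair_accuracy(dict_1, dict_2):
    """Fraction of same-key pairs in dict_1 that also share a key in dict_2."""
    # Invert dict_2: point id -> set of dict_2 keys whose list contains it.
    keys_of_point = defaultdict(set)
    for key, points in dict_2.items():
        for point in points:
            keys_of_point[point].add(key)

    no_keys = frozenset()
    total = truth = 0
    for points in dict_1.values():
        for point_i, point_j in combinations(points, 2):
            total += 1
            keys_i = keys_of_point.get(point_i, no_keys)
            keys_j = keys_of_point.get(point_j, no_keys)
            # The pair agrees if both points have at least one dict_2 key in common.
            if not keys_i.isdisjoint(keys_j):
                truth += 1
    # Assumption: report 0.0 when dict_1 contains no pairs at all.
    return truth / total if total else 0.0


dict_1 = {'key1': [1, 2, 3, 4, 5, 6], 'key2': [7, 8, 9, 10, 11, 12]}
dict_2 = {'key3': [1, 2, 4, 6, 8, 11, 12], 'key4': [2, 5, 7, 9, 10, 11, 12]}
accuracy = pair_accuracy(dict_1, dict_2)
print(accuracy)
```

For the example data, 19 of the 30 pairs in dict_1 share a key in dict_2, so this gives roughly 0.633. Because each pair is checked through set lookups, the result does not depend on the order of the point ids inside each list, and the pairs of dict_2 never have to be materialised.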

If 2 values belong to the same key in one dictionary, do they belong to the same key in the other dictionary?
