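I am trying to compare two dictionaries in order to check the accuracy of a large dataset.
I want to know whether two points that belong to the same key in dictionary 1 also belong to the same key in dictionary 2.
I can do this with nested loops and an `if point in ...` check against both dictionaries, but I am looking for a faster way to compare the two dictionaries.
dict_1 has only one key per point_id, whereas dict_2 can have several keys for one point_id.
Both dictionaries look like this: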
{key1 : [list of point id], key2 : [list of point id], etc}
dict_1 = {'key1' : [1,2,3,4,5,6], 'key2' : [7,8,9,10,11,12]}
dict_2 = {'key3' : [1,2,4,6,8,11,12], 'key4' :[2,5,7,9,10,11,12]}
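To make the structure concrete, here is a small sketch that inverts each dictionary into a point-to-keys mapping. The helper name `keys_per_point` is made up for illustration, and the keys are written as strings on the assumption that they are hashable labels.

```python
from collections import defaultdict

dict_1 = {"key1": [1, 2, 3, 4, 5, 6], "key2": [7, 8, 9, 10, 11, 12]}
dict_2 = {"key3": [1, 2, 4, 6, 8, 11, 12], "key4": [2, 5, 7, 9, 10, 11, 12]}


def keys_per_point(d):
    """Invert {key: [point ids]} into {point id: set of keys} (hypothetical helper)."""
    index = defaultdict(set)
    for key, points in d.items():
        for point in points:
            index[point].add(key)
    return dict(index)


index_1 = keys_per_point(dict_1)
index_2 = keys_per_point(dict_2)

# In dict_1 every point belongs to exactly one key.
print(all(len(keys) == 1 for keys in index_1.values()))  # prints: True

# In dict_2 some points belong to more than one key.
print(sorted(p for p, keys in index_2.items() if len(keys) > 1))  # prints: [2, 11, 12]
```

With the sample data, point 3 does not appear in dict_2 at all, so any pair containing it can never match.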
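def accuracy_from_dict_to_dict(dict_1, dict_2):
    total, truth = 0, 0
    for key_dict_1 in dict_1:
        point_of_key = dict_1.get(key_dict_1)
        i = 0
        while i < len(point_of_key):  # for each point of the key_dict_1 list
            j = i + 1
            while j < len(point_of_key):  # for each later point in the same list
                point_i = point_of_key[i]
                point_j = point_of_key[j]
                for key_dict_2 in dict_2:
                    # test membership in the point list, not in the key itself
                    points_2 = dict_2[key_dict_2]
                    if point_i in points_2 and point_j in points_2:
                        truth += 1  # counted once per dict_2 key holding both points
                total += 1  # one per pair of points sharing a dict_1 key
                j += 1
            i += 1
    return truth, total  # assumed: the original snippet ends without a return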
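The problem is not the code itself but the computation time: unless the dataset is small, it takes a very long time to run.
Answer 0 (score: 0)
It looks like you are just checking combinations of 2 items from the two dictionaries. You can do better with the itertools module from the standard library: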
from itertools import combinations, chain
dict_1 = {'key1' : [1,2,3,4,5,6], 'key2' : [7,8,9,10,11,12]}
dict_2 = {'key3' : [1,2,4,6,8,11,12], 'key4' :[2,5,7,9,10,11,12]}
c_dict1 = set(chain.from_iterable(combinations(v, 2) for v in dict_1.values()))
c_dict2 = set(chain.from_iterable(combinations(v, 2) for v in dict_2.values()))
total = len(c_dict1) + len(c_dict2)
similarity = len(c_dict1 & c_dict2) / total
print(total, similarity)
This prints:
69 0.2753623188405797
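Note that the snippet above divides by the combined number of pairs from both dictionaries, whereas the question counts only pairs that share a key in dict_1. A minimal sketch closer to the question's definition is given below. It assumes a pair should be counted once if the two points share at least one dict_2 key, and the function name `pair_accuracy` is made up for illustration. It still enumerates every pair inside each dict_1 group, but it replaces the scan over all dict_2 lists with a set lookup per point.

```python
from collections import defaultdict
from itertools import combinations


def pair_accuracy(dict_1, dict_2):
    """Return (truth, total) for pairs of points that share a dict_1 key.

    truth counts the pairs that also share at least one dict_2 key.
    """
    # Inverted index: point id -> set of dict_2 keys containing it.
    point_to_keys = defaultdict(set)
    for key, points in dict_2.items():
        for point in points:
            point_to_keys[point].add(key)

    empty = frozenset()
    truth = total = 0
    for points in dict_1.values():
        for a, b in combinations(points, 2):
            total += 1
            keys_a = point_to_keys.get(a, empty)
            keys_b = point_to_keys.get(b, empty)
            # The pair matches if the two key sets overlap.
            if not keys_a.isdisjoint(keys_b):
                truth += 1
    return truth, total


dict_1 = {"key1": [1, 2, 3, 4, 5, 6], "key2": [7, 8, 9, 10, 11, 12]}
dict_2 = {"key3": [1, 2, 4, 6, 8, 11, 12], "key4": [2, 5, 7, 9, 10, 11, 12]}

truth, total = pair_accuracy(dict_1, dict_2)
print(truth, total)  # prints: 19 30
```

On the sample data this gives 19 matching pairs out of 30, which is the same intersection size the set-based snippet finds, only with a different denominator.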