I have a dictionary that contains many lists, for example:
list_dic = {
    'q1': [1, 2, 3, 4, 5],
    'q2': [2, 3, 5],
    'q3': [2, 5],
}
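For each list, I want the number of items it has in common with every other list. For example, q1 and q2 have 3 items in common (2, 3, 5), so the result should be: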
q1={q2:3, q3:2}
q2={q1:3,q3:2}
q3={q1:2, q2:2}
My code for this task is:
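result = {}
for name, source_list in list_dic.items():
    result[name] = {}  # create the inner dict first, otherwise result[name] raises KeyError
    for target_name, target_list in list_dic.items():
        if target_name == name:
            continue  # don't compare a list with itself
        count = 0
        for item in source_list:
            if item in target_list:  # linear scan of target_list for every item
                count += 1
        result[name][target_name] = count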
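But this algorithm is very inefficient (each `item in target_list` check scans the whole list), and I'd like to know a better algorithm for this task.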
Answer 0 (score: 3)
I think this should do it:
import itertools
import collections

q1 = 'q1'
q2 = 'q2'
q3 = 'q3'
dic_list = {
    q1: [1, 2, 3, 4, 5],
    q2: [2, 3, 5],
    q3: [2, 5],
}

# Sets are much more efficient for this sort of thing. Create a dict
# of the same structure as the old one, only with `set` as values
# instead of `list`.
dic_set = {k: set(v) for k, v in dic_list.items()}

new_dic = collections.defaultdict(dict)
for k1, k2 in itertools.combinations(dic_set, 2):
    # To get the count, we just need to know the size of the intersection
    # of the 2 sets.
    value = len(dic_set[k1] & dic_set[k2])
    new_dic[k1][k2] = value
    new_dic[k2][k1] = value

print(new_dic)
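The same combinations-based approach can be wrapped in a reusable function that accepts any dict of lists and returns a plain dict instead of a defaultdict. This is a minimal sketch; the function name `common_counts` is hypothetical and not part of the original answer.

```python
import collections
import itertools


def common_counts(list_dic):
    """Hypothetical helper: for every pair of keys, count the shared items."""
    # Convert each list to a set once, so each intersection is cheap.
    set_dic = {k: set(v) for k, v in list_dic.items()}
    result = collections.defaultdict(dict)
    # combinations() yields each unordered pair of keys exactly once...
    for k1, k2 in itertools.combinations(set_dic, 2):
        value = len(set_dic[k1] & set_dic[k2])
        # ...and the count is symmetric, so store it in both directions.
        result[k1][k2] = value
        result[k2][k1] = value
    return dict(result)


counts = common_counts({'q1': [1, 2, 3, 4, 5], 'q2': [2, 3, 5], 'q3': [2, 5]})
print(counts['q1'])  # {'q2': 3, 'q3': 2}
```

Converting with `dict(result)` at the end only changes the outer container; the inner values are already plain dicts.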
Following up on the comments, combinations is slightly faster than permutations:
import itertools
import collections
import timeit

q1 = 'q1'
q2 = 'q2'
q3 = 'q3'
dic_list = {
    q1: [1, 2, 3, 4, 5],
    q2: [2, 3, 5],
    q3: [2, 5],
}
dic_set = {k: set(v) for k, v in dic_list.items()}

def combo_solution():
    new_dic = collections.defaultdict(dict)
    for k1, k2 in itertools.combinations(dic_set, 2):
        # One intersection per unordered pair, stored in both directions.
        value = len(dic_set[k1] & dic_set[k2])
        new_dic[k1][k2] = value
        new_dic[k2][k1] = value
    return new_dic

def perm_solution():
    new_dic = collections.defaultdict(dict)
    for k1, k2 in itertools.permutations(dic_set, 2):
        # One intersection per ordered pair: twice as many as needed.
        new_dic[k1][k2] = len(dic_set[k1] & dic_set[k2])
    return new_dic

print(timeit.timeit('combo_solution()', 'from __main__ import combo_solution', number=100000))
print(timeit.timeit('perm_solution()', 'from __main__ import perm_solution', number=100000))
Results:
0.58366894722 #combinations
0.832300901413 #permutations
This is because set.intersection is an O(min(N, M)) operation. That is cheap, but it adds up if you do it twice as many times as you need to.
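To make the "twice as many times" point concrete: for k keys, permutations(keys, 2) yields k(k-1) ordered pairs, while combinations(keys, 2) yields k(k-1)/2 unordered pairs. Because the intersection size does not depend on the order of the operands, the extra ordered pairs only repeat work. A small illustration, using the three keys from the question:

```python
import itertools

keys = ['q1', 'q2', 'q3']

# permutations() yields ordered pairs: both (q1, q2) and (q2, q1) appear.
perm_pairs = list(itertools.permutations(keys, 2))
# combinations() yields each unordered pair exactly once.
comb_pairs = list(itertools.combinations(keys, 2))

print(len(perm_pairs), len(comb_pairs))  # 6 3

# Set intersection is symmetric, so the reversed pair gives the same size.
a, b = {1, 2, 3, 4, 5}, {2, 3, 5}
print(len(a & b) == len(b & a))  # True
```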
Answer 1 (score: 3)
from collections import defaultdict
from itertools import permutations

# Create a defaultdict, so you don't have to handle the KeyError case.
result = defaultdict(dict)
list_dic = {
    'q1': [1, 2, 3, 4, 5],
    'q2': [2, 3, 5],
    'q3': [2, 5],
}
# Convert each value list to a set.
set_dict = {k: set(v) for k, v in list_dic.items()}
# For a two-way mapping you need permutations, i.e. both (q1, q2) and (q2, q1).
for k1, k2 in permutations(set_dict.keys(), 2):
    # `&` is set intersection; len() gives the number of common elements.
    result[k1][k2] = len(set_dict[k1] & set_dict[k2])
result
defaultdict(<type 'dict'>, {'q1': {'q3': 2, 'q2': 3}, 'q3': {'q1': 2, 'q2': 2}, 'q2': {'q1': 3, 'q3': 2}})
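The same permutation logic can also be written as a nested dict comprehension, which builds a plain dict directly and needs neither defaultdict nor itertools. This is a sketch of an equivalent formulation, not the answerer's code; like permutations, it still computes each intersection twice.

```python
list_dic = {
    'q1': [1, 2, 3, 4, 5],
    'q2': [2, 3, 5],
    'q3': [2, 5],
}

# Convert each list to a set once.
set_dict = {k: set(v) for k, v in list_dic.items()}

# Outer key k1, inner key k2 for every other key: the same ordered pairs
# that permutations(set_dict, 2) would produce.
result = {
    k1: {k2: len(s1 & s2) for k2, s2 in set_dict.items() if k2 != k1}
    for k1, s1 in set_dict.items()
}

print(result)
# {'q1': {'q2': 3, 'q3': 2}, 'q2': {'q1': 3, 'q3': 2}, 'q3': {'q1': 2, 'q2': 2}}
```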
Answer 2 (score: 0)
If you don't care whether the lists can contain duplicate numbers, you can use the set() type to do this:
>>> s1 = set([1,2,3,4,5])
>>> s2 = set([3,4,5,6,7,8])
>>> s1 & s2
{3, 4, 5}
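If duplicates should count, sets are not enough, because they collapse repeated values. One alternative, not taken from the answers above, is collections.Counter: its `&` operator is a multiset intersection that keeps the minimum count of each element, so each occurrence is matched at most once. A minimal sketch with two made-up lists:

```python
from collections import Counter

l1 = [2, 2, 5]
l2 = [2, 2, 3]

# Sets drop duplicates, so the shared value 2 is counted only once.
print(len(set(l1) & set(l2)))  # 1

# Counter & Counter keeps min(count in l1, count in l2) for each element.
common = Counter(l1) & Counter(l2)
print(common)  # Counter({2: 2})
print(sum(common.values()))  # 2
```

Note that this matching semantics differs slightly from the question's original loop, which counts every occurrence in the source list even if the target list contains the value only once.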