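I have two dictionaries that map IDs to values. For simplicity, let's say these are the dictionaries:

    d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
    d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

As the names suggest, the dictionaries are not symmetric.
I want to build a dictionary of keys from d_source and d_target whose values match. The resulting dictionary should have the d_source keys as its own keys and the matching d_target keys as its values (in list, tuple or set format).
For the example above, the expected return value would be:

    {'a': ('1', 'A'),
     'b': ('B',),
     'c': ('C',),
     '3': ('C',)}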
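There are two similar questions, but their solutions cannot easily be applied to my problem.
Some characteristics of the data:
- Duplicate values within the same dictionary (both d_source and d_target) are not very likely.
- Matches are expected for roughly 50% of the d_source items.
What is the best (performance-wise) solution to this problem? Modeling the data into other data types for better performance is perfectly fine, even with third-party libraries (I'm thinking of numpy).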
Answer 0 (score: 2)
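All of the other answers run in O(n^2), which is not great, so I thought I would answer this myself.
This uses 2(source_len) + 2(dict_count)(dict_len) memory and runs in O(2n), that is, linear time, which I believe is the best you can achieve here.
Here you go: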

    from collections import defaultdict

    d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
    d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

    def merge_dicts(source_dict, *rest):
        # Flip every dict in rest: value -> list of keys that map to it.
        flipped_rest = defaultdict(list)
        for d in rest:
            # popitem() removes items as it reads them, so d ends up empty.
            while d:
                k, v = d.popitem()
                flipped_rest[v].append(k)
        return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}

    new_dict = merge_dicts(d_source, d_target)

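By the way, I use tuples so that the result entries are not linked to one another (source keys with the same value would otherwise share a single list object).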
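To illustrate the point about tuples, here is a small sketch. The helper names `flip`, `merge_sharing_lists` and `merge_with_tuples` are made up for this example and are not part of the answer. If the lists from the flipped mapping were returned directly, source keys with equal values would share one list object:

```python
from collections import defaultdict

def flip(target):
    """Invert a dict: value -> list of keys that carry it."""
    flipped = defaultdict(list)
    for k, v in target.items():
        flipped[v].append(k)
    return flipped

def merge_sharing_lists(source, target):
    # Hypothetical variant: hands the very same list object to every
    # source key that has the same value.
    flipped = flip(target)
    return {k: flipped.get(v, []) for k, v in source.items()}

def merge_with_tuples(source, target):
    # Same idea as the answer: every entry is an immutable tuple.
    flipped = flip(target)
    return {k: tuple(flipped.get(v, ())) for k, v in source.items()}

d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

shared = merge_sharing_lists(d_source, d_target)
shared['c'].append('X')       # mutating one entry...
print(shared['3'])            # ...also changes the other: ['C', 'X']

safe = merge_with_tuples(d_source, d_target)
print(safe['c'], safe['3'])   # ('C',) ('C',)
```

With tuples, a caller cannot accidentally change the entry of one key by editing another.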
Since you have now added the data specifications, here is a solution that matches them more closely:

    from collections import defaultdict

    d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
    d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

    def second_merge_dicts(source_dict, *rest):
        """Optimized for ~50% source match due to if statement addition.
        Also uses less memory.
        """
        # Only values that occur in the source can ever produce a match.
        unique_values = set(source_dict.values())
        flipped_rest = defaultdict(list)
        for d in rest:
            # popitem() removes items as it reads them, so d ends up empty.
            while d:
                k, v = d.popitem()
                if v in unique_values:
                    flipped_rest[v].append(k)
        return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}

    new_dict = second_merge_dicts(d_source, d_target)

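One thing to be aware of: the functions above empty the dictionaries passed in `rest`, because `popitem()` removes each item as it is read. If the target dictionaries are still needed afterwards, a non-destructive variant could be sketched as follows. The name `match_keys` is an assumption made for this example, and this version gives up the memory saving in exchange for leaving the inputs intact:

```python
from collections import defaultdict

def match_keys(source_dict, *rest):
    """Map each source key to a tuple of keys in `rest` with the same value.

    Unlike the popitem() versions, the input dicts are left untouched.
    """
    wanted = set(source_dict.values())
    flipped = defaultdict(list)
    for d in rest:
        for k, v in d.items():
            if v in wanted:
                flipped[v].append(k)
    return {k: tuple(flipped.get(v, ())) for k, v in source_dict.items()}

d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

result = match_keys(d_source, d_target)
print(result)
print(len(d_target))  # the target dict still holds all 4 items
```

Because this iterates with `items()` instead of `popitem()`, matching keys appear in insertion order rather than in reverse.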
Answer 1 (score: 1)

    from collections import defaultdict
    from pprint import pprint

    d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
    d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

    d_result = defaultdict(list)
    # Compare every source key with every target key.
    for a in d_source:
        for b in d_target:
            if d_source[a] == d_target[b]:
                d_result[a].append(b)
    pprint(d_result)

Output:
{'3': ['C'],
'a': ['A', '1'],
'b': ['B'],
'c': ['C']}
Timing results:

    from collections import defaultdict
    from copy import deepcopy
    from random import randint
    from timeit import timeit

    def Craig_match(source, target):
        result = defaultdict(list)
        # Set comprehension used only for its side effect, as in the answer being timed.
        {result[a].append(b) for a in source for b in target if source[a] == target[b]}
        return result

    def Bharel_match(source_dict, *rest):
        flipped_rest = defaultdict(list)
        for d in rest:
            while d:
                k, v = d.popitem()
                flipped_rest[v].append(k)
        return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}

    def modified_Bharel_match(source_dict, *rest):
        """Optimized for ~50% source match due to if statement addition.
        Also uses less memory.
        """
        unique_values = set(source_dict.values())
        flipped_rest = defaultdict(list)
        for d in rest:
            while d:
                k, v = d.popitem()
                if v in unique_values:
                    flipped_rest[v].append(k)
        return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}

    # generate source, target such that:
    # a) ~10% duplicate values in source and target
    # b) 2000 unique source keys, 20000 unique target keys
    # c) a little less than 50% matches source value to target value
    # d) numeric keys and values
    source = {}
    for k in range(2000):
        source[k] = randint(0, 1800)
    target = {}
    for k in range(20000):
        if k < 1000:
            target[k] = randint(0, 2000)
        else:
            target[k] = randint(2000, 19000)

    best_time = {}
    approaches = ('Craig', 'Bharel', 'modified_Bharel')
    for a in approaches:
        best_time[a] = None

    # Best of three runs per approach.
    for _ in range(3):
        for approach in approaches:
            # Fresh copies for every run: the Bharel variants empty the target dict.
            test_source = deepcopy(source)
            test_target = deepcopy(target)
            statement = 'd=' + approach + '_match(test_source,test_target)'
            setup = 'from __main__ import test_source, test_target, ' + approach + '_match'
            t = timeit(stmt=statement, setup=setup, number=1)
            if not best_time[approach] or (t < best_time[approach]):
                best_time[approach] = t

    for approach in approaches:
        print(approach, ':', '%0.5f' % best_time[approach])

Output:
Craig : 7.29259
Bharel : 0.01587
modified_Bharel : 0.00682
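The gap between the approaches follows from how much work each one does on this data set. As a rough back-of-the-envelope sketch (the counts are derived only from the sizes used in the benchmark, 2000 source keys and 20000 target keys, and they ignore constant factors):

```python
n_source = 2000
n_target = 20000

# Nested loops compare every source key with every target key.
nested_comparisons = n_source * n_target

# The flipped-dict approach touches each target item once to build the
# reverse mapping and each source item once to look it up.
hashed_operations = n_source + n_target

print(nested_comparisons)                       # 40000000
print(hashed_operations)                        # 22000
print(nested_comparisons // hashed_operations)  # 1818
```

The measured ratio need not equal this figure, since a single comparison and a single dictionary insertion do not cost the same, but it explains why the difference is several orders of magnitude rather than a small factor.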
Answer 2 (score: 1)
Here is another solution. There are many ways to do this:
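
    # d1 and d2 are the two dictionaries to compare.
    for key1 in d1:
        for key2 in d2:
            if d1[key1] == d2[key2]:
                pass  # stuff: handle the matching pair (key1, key2) here
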
Note that you can use any names in place of key1 and key2.
Answer 3 (score: 1)
This may be "cheating" in some ways, but if you are looking for keys with matching values regardless of case, you could do this:
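
    aa = {'a': 1, 'b': 2, 'c': 3}
    bb = {'A': 1, 'B': 2, 'd': 3}
    # Lowercase the keys of bb so they can be compared with the keys of aa.
    bbl = {k.lower(): v for k, v in bb.items()}
    # In Python 3, items() views support set intersection directly.
    result = {k: k.upper() for k, v in aa.items() & bbl.items()}
    print(result)
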
Output:
{'a': 'A', 'b': 'B'}
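The bbl statement changes the bb keys to lowercase (this could be done on either aa or bb).
*I only tested this on my phone, so I just wanted to throw the idea out there... Also, you have changed your question drastically since I started writing my answer, so you get what you get.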
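The snippet above assumes that every matching key in bb is exactly the uppercase form of the key in aa. A small sketch that instead remembers the original spelling of each bb key could look like this; the names `original_key` and `lowered` are introduced only for this example, and it assumes no two bb keys collide once lowercased:

```python
aa = {'a': 1, 'b': 2, 'c': 3}
bb = {'A': 1, 'B': 2, 'd': 3}

# Remember which original key each lowercased key came from.
original_key = {k.lower(): k for k in bb}
lowered = {k.lower(): v for k, v in bb.items()}

# Intersect (key, value) pairs, then map back to the original bb key.
result = {k: original_key[k] for k, v in aa.items() & lowered.items()}
print(result == {'a': 'A', 'b': 'B'})  # True
```

This way a mixed-case key in bb is reported as it was actually written rather than as `k.upper()`.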
Answer 4 (score: 0)
Which solution is best is up to you. Here is a solution:

    def dicts_to_tuples(*dicts):
        # Group the keys of all dicts by the value they map to.
        result = {}
        for d in dicts:
            for k, v in d.items():
                result.setdefault(v, []).append(k)
        # Keep only the values that are shared by more than one key.
        return [tuple(v) for v in result.values() if len(v) > 1]

    d1 = {'a': 1, 'b': 2, 'c': 3}
    d2 = {'A': 1, 'B': 2}
    print(dicts_to_tuples(d1, d2))

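As a usage note, here is what this grouping approach returns for the dictionaries from the question. The function is repeated so that the snippet runs on its own, and the expected output assumes Python 3.7 or later, where dictionaries keep insertion order:

```python
def dicts_to_tuples(*dicts):
    # Group the keys of all dicts by the value they map to.
    result = {}
    for d in dicts:
        for k, v in d.items():
            result.setdefault(v, []).append(k)
    return [tuple(v) for v in result.values() if len(v) > 1]

d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

groups = dicts_to_tuples(d_source, d_target)
print(groups)  # [('a', 'A', '1'), ('b', 'B'), ('c', '3', 'C')]
```

Note that the source keys 'c' and '3' end up in the same tuple, so this groups all keys by value and does not distinguish source keys from target keys the way the format requested in the question does.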