Question

I have two dictionaries that map IDs to values. For simplicity, let's say these are the dictionaries:

d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

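As the names imply, the dictionaries are not symmetric. I would like to get a dictionary of keys from d_source and d_target whose values match. The resulting dictionary would have the d_source keys as its own keys and the matching d_target keys as the values (as a list, tuple, or set).

The expected return value for the example above would be the following dictionary: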

{'a': ('1', 'A'),
 'b': ('B',),
 'c': ('C',),
 '3': ('C',)}

There are two similar questions, but those solutions can't easily be applied to my problem.

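Some characteristics of the data:

The source is usually smaller than the target: roughly a few thousand source items (at most) and many more targets.
Duplicate values within the same dictionary (both d_source and d_target) are not very likely.
Matches are expected (as a rough estimate) for no more than 50% of the d_source items.
All keys are integers.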

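What is the best (performance-wise) solution to this problem? Remodeling the data into other data types for better performance is totally fine, even if it means using third-party libraries (I'm thinking of numpy).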

Answer 1

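All the other answers have O(n^2) efficiency, which isn't very good, so I thought I'd answer myself.

I use 2(source_len) + 2(dict_count)(dict_len) memory and have O(2n) efficiency (that is, linear time), which I believe is the best you can get here.

Here you go: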

from collections import defaultdict

d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

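def merge_dicts(source_dict, *rest):
    # Flip the remaining dicts: value -> list of keys that carry it.
    flipped_rest = defaultdict(list)
    for d in rest:
        # popitem() consumes d, so the dicts in rest are empty afterwards.
        while d:
            k, v = d.popitem()
            flipped_rest[v].append(k)
    # Look up each source value in the flipped mapping.
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}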

new_dict = merge_dicts(d_source, d_target)

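By the way, I'm using tuples so that the resulting lists are not linked together (source keys with the same value would otherwise share one list).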
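To illustrate the point about tuples, here is a minimal sketch of what happens when the lists from the flipped dictionary are returned directly. The names `flipped`, `shared`, and `separate` are made up for this example, and the lookup is built with `.items()` rather than `popitem()` only to keep the demo short.

```python
from collections import defaultdict

d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

# Build the value -> [target keys] lookup.
flipped = defaultdict(list)
for k, v in d_target.items():
    flipped[v].append(k)

# With tuple(): every source key gets its own immutable copy.
separate = {k: tuple(flipped.get(v, ())) for k, v in d_source.items()}

# Without tuple(): 'c' and '3' both have value 3, so they share one list.
shared = {k: flipped.get(v, []) for k, v in d_source.items()}
shared['c'].append('X')   # mutating the entry for 'c'...

print(shared['3'])        # ...also changes the entry for '3': ['C', 'X']
print(separate['3'])      # ('C',)
```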

Since you have added specifications for the data, here is a solution that matches them more closely:

d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

def second_merge_dicts(source_dict, *rest):
    """Optimized for ~50% source match due to if statement addition.

    Also uses less memory.
    """
    unique_values = set(source_dict.values())
    flipped_rest = defaultdict(list)
    for d in rest:
        while d:
            k, v = d.popitem()
            if v in unique_values:
                flipped_rest[v].append(k)
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}

new_dict = second_merge_dicts(d_source, d_target)
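One thing to keep in mind: both functions above call `popitem()`, so the dictionaries passed as `rest` are empty afterwards. If the targets are still needed later, a variant that only reads them could look like the sketch below. The name `merge_dicts_keep` is made up for this example and is not part of the solution above; it gives up the memory saving in exchange for leaving the inputs intact.

```python
from collections import defaultdict

def merge_dicts_keep(source_dict, *rest):
    """Like second_merge_dicts, but iterates over the dicts instead of popping."""
    wanted = set(source_dict.values())
    flipped_rest = defaultdict(list)
    for d in rest:
        for k, v in d.items():
            if v in wanted:  # skip target values that no source key has
                flipped_rest[v].append(k)
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}

d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

result = merge_dicts_keep(d_source, d_target)
print(result)
print(len(d_target))  # 4, d_target was not emptied
```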

Answer 2

from collections import defaultdict
from pprint import pprint

d_source  = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

d_result = defaultdict(list)
{d_result[a].append(b) for a in d_source for b in d_target if d_source[a] == d_target[b]}

pprint(d_result)

Output:

{'3': ['C'],
 'a': ['A', '1'],
 'b': ['B'],
 'c': ['C']}

Timing results:

from collections import defaultdict
from copy import deepcopy
from random import randint
from timeit import timeit


def Craig_match(source, target):
    result = defaultdict(list)
    {result[a].append(b) for a in source for b in target if source[a] == target[b]}
    return result

def Bharel_match(source_dict, *rest):
    flipped_rest = defaultdict(list)
    for d in rest:
        while d:
            k, v = d.popitem()
            flipped_rest[v].append(k)
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}

def modified_Bharel_match(source_dict, *rest):
    """Optimized for ~50% source match due to if statement addition.

    Also uses less memory.
    """
    unique_values = set(source_dict.values())
    flipped_rest = defaultdict(list)
    for d in rest:
        while d:
            k, v = d.popitem()
            if v in unique_values:
                flipped_rest[v].append(k)
    return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}

# generate source, target such that:
# a) ~10% duplicate values in source and target
# b) 2000 unique source keys, 20000 unique target keys
# c) a little less than 50% matches source value to target value
# d) numeric keys and values
source = {}
for k in range(2000):
    source[k] = randint(0, 1800)
target = {}
for k in range(20000):
    if k < 1000:
        target[k] = randint(0, 2000)
    else:
        target[k] = randint(2000, 19000)

best_time = {}
approaches = ('Craig', 'Bharel', 'modified_Bharel')
for a in approaches:
    best_time[a] = None

for _ in range(3):
    for approach in approaches:
        test_source = deepcopy(source)
        test_target = deepcopy(target)

        statement = 'd=' + approach + '_match(test_source,test_target)'
        setup = 'from __main__ import test_source, test_target, ' + approach + '_match'
        t = timeit(stmt=statement, setup=setup, number=1)
        if not best_time[approach] or (t < best_time[approach]):
            best_time[approach] = t

for approach in approaches:
    print(approach, ':', '%0.5f' % best_time[approach])

Output:

Craig : 7.29259
Bharel : 0.01587
modified_Bharel : 0.00682
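The gap points in the same direction as a rough operation count. As a back-of-the-envelope sketch (it counts only the dominant loop iterations, which is a simplification and not a measurement), the nested comprehension compares every source key with every target key, while the flipped-dictionary approach makes one pass over each dictionary:

```python
n_source = 2000    # source keys in the benchmark above
n_target = 20000   # target keys in the benchmark above

nested_steps = n_source * n_target    # one comparison per (source, target) pair
flipped_steps = n_source + n_target   # one pass over target, one over source

print(nested_steps)                   # 40000000
print(flipped_steps)                  # 22000
print(nested_steps // flipped_steps)  # 1818
```

The measured ratio is smaller than the count ratio, presumably because each step of the flipped approach does more work (popping, hashing, appending) than a single comparison.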

Answer 3

Here is another solution. There are many ways to do this:

for key1 in d1:
    for key2 in d2:
        if d1[key1] == d2[key2]:
            pass  # do stuff with the matching pair (key1, key2) here

Note that you can use any names you like in place of key1 and key2.
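For illustration, here is one way the "stuff" placeholder could be filled in to produce the dictionary the question asks for. The function name `match_keys` is made up for this sketch. Like the loop above, it compares every pair of keys, so it performs len(d1) * len(d2) comparisons.

```python
def match_keys(d1, d2):
    """Map each key of d1 to a tuple of the d2 keys that share its value."""
    result = {}
    for key1 in d1:
        matches = []
        for key2 in d2:
            if d1[key1] == d2[key2]:
                matches.append(key2)
        if matches:  # leave out d1 keys that have no match in d2
            result[key1] = tuple(matches)
    return result

d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}

print(match_keys(d_source, d_target))
# {'a': ('A', '1'), 'b': ('B',), 'c': ('C',), '3': ('C',)}
```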

Answer 4

This may be "cheating" in some respects, but if you are looking for matching values of keys regardless of case, then you could do something like this:

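aa = {'a': 1, 'b': 2, 'c': 3}
bb = {'A': 1, 'B': 2, 'd': 3}

# Lower-case the keys of bb so they can be compared with the keys of aa.
bbl = {k.lower(): v for k, v in bb.items()}

# Intersect the (key, value) pairs; dict item views support set operations.
result = {k: k.upper() for k, v in aa.items() & bbl.items()}
print(result)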

Output:

{'a': 'A', 'b': 'B'}

The bbl statement converts the keys of bb to lowercase (this could be done to either aa or bb).

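* I only tested this on my phone, so I'm just throwing the idea out there... Also, you have drastically changed your question since I started writing this answer, so you get what you get.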

Answer 5

It is up to you to decide what the best solution is. Here is a solution:

def dicts_to_tuples(*dicts):
    result = {}
    for d in dicts:
        for k,v in d.items():
            result.setdefault(v, []).append(k)
    return [tuple(v) for v in result.values() if len(v) > 1]

d1 = {'a': 1, 'b': 2, 'c':3}
d2 = {'A': 1, 'B': 2}
print(dicts_to_tuples(d1, d2))

How to find dict keys with matching values in two dicts?
