Merge dictionaries preserving old keys and new values

Time: 2016-08-26 21:06:18

Tags: python dictionary

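I'm writing a Python script that parses RSS feeds. I want to maintain a dictionary of entries from a feed that is updated periodically. Entries that no longer exist in the feed should be removed, new entries should get a default value, and the values of previously seen entries should remain unchanged.

This is best explained by example, I think: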

>>> old = {
...     'a': 1,
...     'b': 2,
...     'c': 3
... }
>>> new = {
...     'c': 'x',
...     'd': 'y',
...     'e': 'z'
... }
>>> out = some_function(old, new)
>>> out
{'c': 3, 'd': 'y', 'e': 'z'}

Here is my current attempt:

def merge_preserving_old_values_and_new_keys(old, new):
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out

This works, but it seems to me there may be a better or more clever way.

EDIT: If you'd like to test your function:

def my_merge(old, new):
    pass

old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}

out = my_merge(old, new)
assert out == {'c': 3, 'd': 'y', 'e': 'z'}

EDIT 2: Defining Martijn Pieters's answer as set_merge, bravosierra99's as loop_merge, and my first attempt as orig_merge, I get the following timing results:

>>> import timeit
>>> setup="""
... old = {'a': 1, 'b': 2, 'c': 3}
... new = {'c': 'x', 'd': 'y', 'e': 'z'}
... from __main__ import set_merge, loop_merge, orig_merge
... """
>>> timeit.timeit('set_merge(old, new)', setup=setup)
3.4415210600000137
>>> timeit.timeit('loop_merge(old, new)', setup=setup)
1.161155690000669
>>> timeit.timeit('orig_merge(old, new)', setup=setup)
1.1776735319999716

I find this surprising, since I didn't expect the dictionary view approach to be that much slower.
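
One thing that may be worth checking: in Python 3 the `&` operator on two key views returns a brand-new `set` rather than another view, so every call allocates an extra object before the update even starts. The sketch below only shows that behaviour; it does not prove where the time actually goes. The variable name `common` is purely illustrative.

```python
old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}

# Intersecting two key views builds a new set object on each call.
common = old.keys() & new.keys()

print(type(common).__name__)
print(common)
```

With dictionaries this small, fixed per-call costs like this could plausibly dominate the measurement, but that is an assumption rather than something measured here.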

4 answers:

Answer 0 (score: 4)

Dictionaries have dictionary view objects that act as sets. Use these to get the intersection between old and new:

def merge_preserving_old_values_and_new_keys(old, new):
    result = new.copy()
    result.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
    return result

The above uses the Python 2 syntax; if you are using Python 3, use old.keys() & new.keys() for the same result:

def merge_preserving_old_values_and_new_keys(old, new):
    # Python 3 version
    result = new.copy()
    result.update((k, old[k]) for k in old.keys() & new.keys())
    return result

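The above takes all key-value pairs in new as the starting point, then adds the values from old for any key that appears in both.

Demo:

>>> merge_preserving_old_values_and_new_keys(old, new)
{'c': 3, 'e': 'z', 'd': 'y'}

Note that this function, like your version, produces a new dictionary (although the key and value objects are shared; it is a shallow copy).

If you don't need the new dictionary for anything else, you could also update it in place:

def merge_preserving_old_values_and_new_keys(old, new):
    # Python 2 syntax; on Python 3 use old.keys() & new.keys()
    new.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
    return new

You could also use a one-liner dict comprehension to build the new dictionary:

def merge_preserving_old_values_and_new_keys(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}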
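
To make the difference between the copying and in-place variants concrete, here is a minimal Python 3 sketch. The names `merge_copy` and `merge_in_place` are hypothetical and used only to keep the two variants apart.

```python
def merge_copy(old, new):
    # Hypothetical name: the copying variant, Python 3 syntax.
    result = new.copy()
    result.update((k, old[k]) for k in old.keys() & new.keys())
    return result


def merge_in_place(old, new):
    # Hypothetical name: the in-place variant; mutates and returns `new`.
    new.update((k, old[k]) for k in old.keys() & new.keys())
    return new


old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}

copied = merge_copy(old, new)
assert copied == {'c': 3, 'd': 'y', 'e': 'z'}
assert new['c'] == 'x'   # `new` is untouched by the copying variant

updated = merge_in_place(old, new)
assert updated is new    # the very same dictionary object comes back
assert new['c'] == 3     # `new` itself now carries the old value
```

The in-place variant is only safe when nothing else still needs the freshly parsed values in new.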

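Answer 1 (score: 2)

This should be more efficient, because you no longer iterate over all of old.items(). It is also clearer what you are trying to do, since you are not overwriting some of the values.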

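def merge_preserving_old_values_and_new_keys(old, new):
    out = {}
    for k, v in new.items():
        # `k in old` tests membership directly, without building a key list
        if k in old:
            out[k] = old[k]
        else:
            out[k] = v
    return out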
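
The same single pass over new can also be written with `dict.get`, which returns the stored value when the key is present and the supplied default otherwise. This is only a sketch of an equivalent spelling; the name `get_merge` is hypothetical, and it has not been timed against the other versions in this thread.

```python
def get_merge(old, new):
    # Hypothetical name. old.get(k, v) yields old[k] if k is in old, else v.
    return {k: old.get(k, v) for k, v in new.items()}


old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}

out = get_merge(old, new)
assert out == {'c': 3, 'd': 'y', 'e': 'z'}
```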

Answer 2 (score: 0)

old = {
    'a': 1,
    'b': 2,
    'c': 3
}
new = {
    'c': 'x',
    'd': 'y',
    'e': 'z'
}

def merge_preserving_old_values_and_new_keys(o, n):
    out = {}
    for k in n:
        if k in o:
            out[k] = o[k]
        else:
            out[k] = n[k]
    return out

print(merge_preserving_old_values_and_new_keys(old, new))

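Answer 3 (score: 0)

I'm not 100% sure of the best way to add this information to the discussion: feel free to edit or redistribute it if necessary.

Here are timing results for all of the methods discussed here.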

from timeit import timeit

def loop_merge(old, new):
    out = {}
    for k, v in new.items():
        if k in old:
            out[k] = old[k]
        else:
        out[k] = v
    return out

def set_merge(old, new):
    out = new.copy()
    out.update((k, old[k]) for k in old.keys() & new.keys())
    return out

def comp_merge(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}

def orig_merge(old, new):
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out


old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = {'c': 3, 'd': 'y', 'e': 'z'}

assert loop_merge(old, new) == out
assert set_merge(old, new) == out
assert comp_merge(old, new) == out
assert orig_merge(old, new) == out

setup = """
from __main__ import old, new, loop_merge, set_merge, comp_merge, orig_merge
"""

for a in ['loop', 'set', 'comp', 'orig']:
    time = timeit('{}_merge(old, new)'.format(a), setup=setup)
    print('{}: {}'.format(a, time))

size = 10**4
large_old = {i: 'old' for i in range(size)}
large_new = {i: 'new' for i in range(size//2, size)}

setup = """
from __main__ import large_old, large_new, loop_merge, set_merge, comp_merge, orig_merge
"""

for a in ['loop', 'set', 'comp', 'orig']:
    time = timeit('{}_merge(large_old, large_new)'.format(a), setup=setup)
    print('{}: {}'.format(a, time))

The winner is the improved loop method!

$ python3 merge.py
loop: 0.7791572390015062  # small dictionaries
set: 3.1920828100010112
comp: 1.1180207730030816
orig: 1.1681104259987478
loop: 927.2149353210007  # large dictionaries
set: 1696.8342713210004
comp: 902.039078668
orig: 1373.0389542560006

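I'm disappointed, because the dictionary view / set operation method is much cooler.

For larger dictionaries (10^4 items), the dictionary comprehension method pulls ahead of the improved loop method and is far ahead of the original method. The set operation method is still the slowest.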
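
The timings above are totals for timeit's default of 1,000,000 calls, which is why the large-dictionary runs take many minutes. For a quicker check, a smaller `number` can be passed explicitly and the result divided to get a per-call figure. The sketch below does this for the loop method only; the values number=100 and repeat=3 are arbitrary choices, not ones used for the results above.

```python
from timeit import repeat


def loop_merge(old, new):
    out = {}
    for k, v in new.items():
        out[k] = old[k] if k in old else v
    return out


size = 10**4
large_old = {i: 'old' for i in range(size)}
large_new = {i: 'new' for i in range(size // 2, size)}

calls = 100  # calls per timing run (arbitrary)
# Three independent runs; the minimum is usually the least noisy figure.
times = repeat(lambda: loop_merge(large_old, large_new),
               number=calls, repeat=3)
best = min(times)
print('loop: {:.6f} s per call'.format(best / calls))
```

The other merge functions can be timed the same way by swapping the callable.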