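I am writing a Python script that parses RSS feeds. I want to maintain a dictionary of entries from a feed that is updated periodically. Entries that no longer exist in the feed should be removed, new entries should get a default value, and the values of previously seen entries should remain unchanged.
This is probably best explained by example: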
>>> old = {
...     'a': 1,
...     'b': 2,
...     'c': 3
... }
>>> new = {
...     'c': 'x',
...     'd': 'y',
...     'e': 'z'
... }
>>> out = some_function(old, new)
>>> out
{'c': 3, 'd': 'y', 'e': 'z'}
Here is my current attempt:
def merge_preserving_old_values_and_new_keys(old, new):
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out
This works, but it seems to me there might be a better or cleverer way.
Edit: if you want to test your function:
def my_merge(old, new):
    pass
old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = my_merge(old, new)
assert out == {'c': 3, 'd': 'y', 'e': 'z'}
Edit 2:
Defining Martijn Pieters's answer as set_merge, bravosierra99's as loop_merge, and my first attempt as orig_merge, I get the following timing results:
>>> import timeit
>>> setup="""
... old = {'a': 1, 'b': 2, 'c': 3}
... new = {'c': 'x', 'd': 'y', 'e': 'z'}
... from __main__ import set_merge, loop_merge, orig_merge
... """
>>> timeit.timeit('set_merge(old, new)', setup=setup)
3.4415210600000137
>>> timeit.timeit('loop_merge(old, new)', setup=setup)
1.161155690000669
>>> timeit.timeit('orig_merge(old, new)', setup=setup)
1.1776735319999716
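I find this surprising, as I didn't expect the dictionary view approach to be so much slower.
Answer 0 (score: 4)
Dictionaries have dictionary view objects that act as sets. Use these to get the intersection of old and new: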
def merge_preserving_old_values_and_new_keys(old, new):
    result = new.copy()
    result.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
    return result
The above uses Python 2 syntax; if you are using Python 3, use old.keys() & new.keys() to get the same result:
def merge_preserving_old_values_and_new_keys(old, new):
    # Python 3 version
    result = new.copy()
    result.update((k, old[k]) for k in old.keys() & new.keys())
    return result
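The above takes all key-value pairs from new as a starting point, then overwrites the value of every key that is also in old with the value from old.
Demo:
>>> merge_preserving_old_values_and_new_keys(old, new)
{'c': 3, 'e': 'z', 'd': 'y'}
Note that this function, like your version, produces a new dictionary (although the key and value objects are shared; it is a shallow copy).
If you don't need the new dictionary for anything else, you can also update it in place:
def merge_preserving_old_values_and_new_keys(old, new):
    new.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
    return new
You can also use a one-line dictionary comprehension to build the new dictionary:
def merge_preserving_old_values_and_new_keys(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}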
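The difference between the copying and the in-place variants can be checked directly. The following is a minimal Python 3 sketch; the names merge_copy and merge_in_place are made up for this illustration and do not appear in the answer.

```python
def merge_copy(old, new):
    # Start from a shallow copy so the caller's `new` is left untouched.
    result = new.copy()
    result.update((k, old[k]) for k in old.keys() & new.keys())
    return result


def merge_in_place(old, new):
    # Mutates and returns `new` itself; no second dictionary is built.
    new.update((k, old[k]) for k in old.keys() & new.keys())
    return new


old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}

shared_keys = old.keys() & new.keys()   # key views support set operations
copied = merge_copy(old, new)
new_after_copy = dict(new)              # snapshot: `new` is still unchanged here

updated = merge_in_place(old, new)      # from here on `new` itself holds the merge
print(shared_keys, copied, new_after_copy, updated is new)
```

The in-place variant saves one dictionary allocation, at the cost of changing an argument the caller may still be holding on to.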
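Answer 1 (score: 2)
This should be more efficient, because you no longer iterate over the whole of old.items(). It is also clearer about what you are trying to do, since you are not overwriting some of the values.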
def loop_merge(old, new):
    out = {}
    for k, v in new.items():
        # Test membership on the dict itself; calling old.keys() is not needed
        if k in old:
            out[k] = old[k]
        else:
            out[k] = v
    return out
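For comparison, the same "old value if present, otherwise new value" rule can be written with dict.get. This is an alternative sketch rather than the code from this answer, and the function name get_merge is invented here; whether it is faster than the explicit loop is not something this thread measures.

```python
def get_merge(old, new):
    # For each key in `new`, prefer the value already stored in `old`;
    # fall back to the new value when the key has not been seen before.
    return {k: old.get(k, v) for k, v in new.items()}


old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = get_merge(old, new)
print(out)
```

Keys that exist only in old ('a' and 'b' here) never enter the result, because the comprehension iterates over new alone.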
Answer 2 (score: 0)
old = {
    'a': 1,
    'b': 2,
    'c': 3
}
new = {
    'c': 'x',
    'd': 'y',
    'e': 'z'
}
def merge_preserving_old_values_and_new_keys(o, n):
    out = {}
    for k in n:
        if k in o:
            out[k] = o[k]
        else:
            out[k] = n[k]
    return out
# print() with a single argument works on both Python 2 and Python 3
print(merge_preserving_old_values_and_new_keys(old, new))
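Answer 3 (score: 0)
I'm not 100% sure of the best way to add this information to the discussion: feel free to edit/redistribute it if necessary.
Here are timing results for all of the approaches discussed here.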
from timeit import timeit
def loop_merge(old, new):
    out = {}
    for k, v in new.items():
        if k in old:
            out[k] = old[k]
        else:
            out[k] = v
    return out
def set_merge(old, new):
    out = new.copy()
    out.update((k, old[k]) for k in old.keys() & new.keys())
    return out
def comp_merge(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}
def orig_merge(old, new):
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out
old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = {'c': 3, 'd': 'y', 'e': 'z'}
assert loop_merge(old, new) == out
assert set_merge(old, new) == out
assert comp_merge(old, new) == out
assert orig_merge(old, new) == out
setup = """
from __main__ import old, new, loop_merge, set_merge, comp_merge, orig_merge
"""
for a in ['loop', 'set', 'comp', 'orig']:
    time = timeit('{}_merge(old, new)'.format(a), setup=setup)
    print('{}: {}'.format(a, time))
size = 10**4
large_old = {i: 'old' for i in range(size)}
large_new = {i: 'new' for i in range(size//2, size)}
setup = """
from __main__ import large_old, large_new, loop_merge, set_merge, comp_merge, orig_merge
"""
for a in ['loop', 'set', 'comp', 'orig']:
    time = timeit('{}_merge(large_old, large_new)'.format(a), setup=setup)
    print('{}: {}'.format(a, time))
The winner is the improved loop approach!
$ python3 merge.py
loop: 0.7791572390015062 # small dictionaries
set: 3.1920828100010112
comp: 1.1180207730030816
orig: 1.1681104259987478
loop: 927.2149353210007 # large dictionaries
set: 1696.8342713210004
comp: 902.039078668
orig: 1373.0389542560006
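I'm disappointed, because the dictionary view/set operation approach is much cooler.
For larger dictionaries (10^4 items), the dictionary comprehension approach edges ahead of the improved loop approach, and both are well ahead of the original approach. The set operation approach is still the slowest.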
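Because timeit defaults to one million calls, the large-dictionary run above takes many minutes per function. A quicker way to reproduce the comparison is to pass an explicit, smaller number and take the best of several repeats. The sketch below assumes Python 3; the number and repeats values and the best_time helper are arbitrary choices made for illustration, and absolute timings will differ from machine to machine.

```python
from timeit import repeat


def loop_merge(old, new):
    out = {}
    for k, v in new.items():
        out[k] = old[k] if k in old else v
    return out


def set_merge(old, new):
    out = new.copy()
    out.update((k, old[k]) for k in old.keys() & new.keys())
    return out


def comp_merge(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}


def best_time(func, old, new, number=200, repeats=3):
    # Hypothetical helper: best-of-`repeats` total time for `number` calls.
    return min(repeat(lambda: func(old, new), number=number, repeat=repeats))


size = 10**4
large_old = {i: 'old' for i in range(size)}
large_new = {i: 'new' for i in range(size // 2, size)}

timings = {f.__name__: best_time(f, large_old, large_new)
           for f in (loop_merge, set_merge, comp_merge)}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print('{}: {:.4f}s'.format(name, seconds))
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes; it does not change which approach does the most work per call.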