I have two lists of dictionaries, and I need to merge their records whenever USA and GOOG have the same values.
list1 = [{'USA': 'Eastern',
          'GOOG': '2019',
          'Up': {'Upfront': 45},
          'Right': {'Upfront': 12}},
         {'USA': 'Western',
          'GOOG': '2019',
          'Up': {'Upfront': 10},
          'Right': {'Upfront': 15}}]
list2 = [{'USA': 'Western',
          'GOOG': '2019',
          'Down': {'Downback': 35},
          'Right': {'Downback': 25}},
         {'USA': 'Eastern',
          'GOOG': '2018',
          'Down': {'Downback': 15},
          'Right': {'Downback': 55}}]
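Since USA and GOOG have the same values in the second element of list1 and the first element of list2, those two records should be merged. The expected result is: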
Result = [{'USA': 'Eastern',
           'GOOG': '2019',
           'Up': {'Upfront': 45},
           'Right': {'Upfront': 12}},
          {'USA': 'Western',
           'GOOG': '2019',
           'Up': {'Upfront': 10},
           'Down': {'Downback': 35},
           'Right': {'Upfront': 15, 'Downback': 25}},
          {'USA': 'Eastern',
           'GOOG': '2018',
           'Down': {'Downback': 15},
           'Right': {'Downback': 55}}]
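How can we write generic code for this? I tried using defaultdict, but I don't know how to join an arbitrary number of the remaining dictionaries.
My attempt: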
from collections import defaultdict

dd = defaultdict(list)  # collects every value seen for each key
dics = list1 + list2
for dic in dics:
    for key, val in dic.items():
        dd[key].append(val)
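Answer 0 (score: 2)
You need to perform two algorithmic tasks: find the records that have the same values for USA and GOOG, and then join them in such a way that if the same key exists in both records, their values are merged.
The naive approach to the first task is a for loop that iterates over the values of list1 and, for each value, iterates over all the values of list2. Two separate loops won't cut it; you need two nested for loops: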
for element in list1:
    for other_element in list2:
        if ...:
            ...
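While this approach works and is fine for small lists (say, fewer than 1000 records), it takes time and resources proportional to the product of the list sizes, that is, to the square of the size when both lists are about the same length. For lists of about 1000 items each, we are talking about 1 million iterations. If the lists have 1,000,000 items each, the computation would take 10^12 comparisons, which is simply not practical on today's computers.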
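To make the nested-loop idea concrete, here is a minimal sketch, assuming the records look like the ones in the question. The helper names `same_key` and `find_matches_naive` are made up for illustration and are not part of the original answer.

```python
def same_key(a, b, fields=("USA", "GOOG")):
    """Return True when both records agree on every field in `fields`."""
    return all(a[f] == b[f] for f in fields)


def find_matches_naive(list1, list2):
    """Return (i, j) index pairs of matching records.

    Performs len(list1) * len(list2) comparisons.
    """
    pairs = []
    for i, element in enumerate(list1):
        for j, other_element in enumerate(list2):
            if same_key(element, other_element):
                pairs.append((i, j))
    return pairs


# Trimmed-down versions of the question's records (only the key fields)
list1 = [{"USA": "Eastern", "GOOG": "2019"}, {"USA": "Western", "GOOG": "2019"}]
list2 = [{"USA": "Western", "GOOG": "2019"}, {"USA": "Eastern", "GOOG": "2018"}]

print(find_matches_naive(list1, list2))  # [(1, 0)]
```

With two records per list this is four comparisons; the count grows with the product of the two lengths.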
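So a better solution is to rebuild one of the lists so that the comparison key acts as a hash: copy the list into a dictionary whose keys are the values to be compared, and then iterate only once over the second list. Since dictionary lookups take constant time on average, the number of comparisons is proportional to the size of the lists.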
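The indexing step on its own could look roughly like this. It is only a sketch: `build_index` and `find_matches_indexed` are hypothetical names, and it assumes the key values are hashable (they are strings in the question) and that each (USA, GOOG) pair appears at most once per list.

```python
def build_index(records, fields=("USA", "GOOG")):
    """Map each (USA, GOOG) tuple to its record.

    If a key occurs more than once, the later record overwrites the earlier one.
    """
    return {tuple(r[f] for f in fields): r for r in records}


def find_matches_indexed(list1, list2, fields=("USA", "GOOG")):
    """Return (record_from_list1, record_from_list2) pairs with equal keys."""
    index = build_index(list1, fields)
    matches = []
    for record in list2:  # a single pass over list2
        key = tuple(record[f] for f in fields)
        if key in index:  # dictionary lookup, constant time on average
            matches.append((index[key], record))
    return matches


list1 = [{"USA": "Eastern", "GOOG": "2019"}, {"USA": "Western", "GOOG": "2019"}]
list2 = [{"USA": "Western", "GOOG": "2019"}, {"USA": "Eastern", "GOOG": "2018"}]

matches = find_matches_indexed(list1, list2)
print(len(matches))  # 1
```

The work is now one pass to build the index plus one pass over the second list, instead of one comparison per pair of records.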
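The second part of the task is to copy a record into the result list and update the keys on that copy so that all duplicated keys are merged. To avoid problems when copying the first record, it is safer to use Python's copy.deepcopy, which ensures that the sub-dictionaries are distinct objects from those in the original records and stay isolated from them.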
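A small illustration of why a deep copy matters here, using a made-up record shaped like the ones in the question: a shallow copy shares the nested dictionaries with the original, so merging into the copy would silently change the input.

```python
from copy import copy, deepcopy

record = {"USA": "Western", "Right": {"Upfront": 15}}
shallow = copy(record)
# The inner dict is shared, so this also changes record["Right"]
shallow["Right"]["Downback"] = 25

original = {"USA": "Western", "Right": {"Upfront": 15}}
deep = deepcopy(original)
# The inner dict was duplicated, so original["Right"] is left untouched
deep["Right"]["Downback"] = 25

print(record["Right"])    # {'Upfront': 15, 'Downback': 25}
print(original["Right"])  # {'Upfront': 15}
```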
from copy import deepcopy

def merge_lists(list1, list2):
    # create dictionary from list1:
    dict1 = {(record["GOOG"], record["USA"]): record for record in list1}

    # compare elements in list2 to those on list1:
    result = {}
    for record in list2:
        ckey = record["GOOG"], record["USA"]
        new_record = deepcopy(record)
        if ckey in dict1:
            for key, value in dict1[ckey].items():
                if key in ("GOOG", "USA"):
                    # Do not merge these keys
                    continue
                # Dict's "setdefault" finds a key/value, and if it is missing
                # creates a new one with the second parameter as value
                new_record.setdefault(key, {}).update(value)

        result[ckey] = new_record

    # Add values from list1 that were not matched in list2:
    for key, value in dict1.items():
        if key not in result:
            result[key] = deepcopy(value)

    return list(result.values())
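The question also asks about joining an arbitrary number of lists. One possible generalization of the same idea is sketched below; `merge_many` and its `fields` parameter are hypothetical names, and the sketch assumes that every value stored under a non-key field is a dictionary, as in the question's data.

```python
from copy import deepcopy


def merge_many(*lists, fields=("USA", "GOOG")):
    """Merge records from any number of lists that agree on `fields`."""
    merged = {}  # (USA, GOOG) tuple -> merged record, in first-seen order
    for records in lists:
        for record in records:
            key = tuple(record[f] for f in fields)
            # Start a fresh record holding only the key fields if unseen
            target = merged.setdefault(key, {f: record[f] for f in fields})
            for name, value in record.items():
                if name in fields:
                    continue
                if isinstance(value, dict):
                    # Combine sub-dictionaries instead of replacing them
                    target.setdefault(name, {}).update(deepcopy(value))
                else:
                    target[name] = deepcopy(value)
    return list(merged.values())


list1 = [{'USA': 'Eastern', 'GOOG': '2019',
          'Up': {'Upfront': 45}, 'Right': {'Upfront': 12}},
         {'USA': 'Western', 'GOOG': '2019',
          'Up': {'Upfront': 10}, 'Right': {'Upfront': 15}}]
list2 = [{'USA': 'Western', 'GOOG': '2019',
          'Down': {'Downback': 35}, 'Right': {'Downback': 25}},
         {'USA': 'Eastern', 'GOOG': '2018',
          'Down': {'Downback': 15}, 'Right': {'Downback': 55}}]

result = merge_many(list1, list2)
```

Because the merged records are built from copies, the input lists are left unchanged, and a third or fourth list can simply be passed as an extra argument.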
Answer 1 (score: 1)
Here is my attempt. I don't know whether it is the best approach, but it is a start.
Code:
import operator as op
import itertools as it
from functools import reduce
from pprint import pprint

dictionaries = reduce(op.add, (list1, list2,))
groups = it.groupby(sorted([(op.itemgetter('USA', 'GOOG')(d), i)
                            for i, d in enumerate(dictionaries)]),
                    key=op.itemgetter(0))
results = []
for key, group in groups:
    _, indices = zip(*group)
    if len(indices) == 1:
        i, = indices
        results.append(dictionaries[i])
    else:
        merge = dictionaries[indices[0]]
        for i in indices[1:]:
            merge.update(dictionaries[i])
        results.append(merge)
pprint(results, indent=4)
Output:
[{'Down': {'Downback': 15}, 'GOOG': '2018', 'Right': {'Downback': 55}, 'USA': 'Eastern'},
 {'GOOG': '2019', 'Right': {'Upfront': 12}, 'USA': 'Eastern', 'Up': {'Upfront': 45}},
 {'Down': {'Downback': 35}, 'GOOG': '2019', 'Right': {'Downback': 25}, 'USA': 'Western', 'Up': {'Upfront': 10}}]
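Note that in this output the merged Western/2019 record holds only {'Downback': 25} under 'Right', whereas the question expects {'Upfront': 15, 'Downback': 25}. That happens because dict.update replaces the whole nested value for a key that exists on both sides. A one-level-deep merge could be sketched as follows; `merge_nested` is a hypothetical helper, not part of the answer above.

```python
def merge_nested(target, source):
    """Merge `source` into `target`, combining dict values instead of replacing them."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            target[key].update(value)  # combine the two sub-dictionaries
        else:
            target[key] = value        # new key or plain value: just assign
    return target


a = {"USA": "Western", "GOOG": "2019", "Up": {"Upfront": 10}, "Right": {"Upfront": 15}}
b = {"USA": "Western", "GOOG": "2019", "Down": {"Downback": 35}, "Right": {"Downback": 25}}

# Plain update: the 'Right' value from b replaces the one from a
replaced = dict(a)
replaced.update(b)

# Nested merge on a copy of a whose sub-dictionaries are also copied
a_copy = {k: dict(v) if isinstance(v, dict) else v for k, v in a.items()}
combined = merge_nested(a_copy, b)

print(replaced["Right"])  # {'Downback': 25}
print(combined["Right"])  # {'Upfront': 15, 'Downback': 25}
```

Calling such a helper in place of merge.update(...) in the loop above would be one way to get the nested 'Right' values combined.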
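Answer 2 (score: 1)
Here is my solution. It manages to reproduce the result you requested. Please excuse the poor variable naming. I found this problem interesting.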
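def joinListByDictionary(list1, list2):
    """Join lists on USA and GOOG having the same value"""
    # Build a combined list without extending the caller's list1 in place
    list1 = list1 + list2
    matchIndx = []
    matches = []
    # Collect every record that shares USA and GOOG with some other record
    for dicts in range(len(list1)):
        for dicts2 in range(len(list1)):
            if dicts == dicts2:
                continue
            if list1[dicts]["GOOG"] == list1[dicts2]["GOOG"] and list1[dicts]["USA"] == list1[dicts2]["USA"]:
                matches.append(list1[dicts])
                matchIndx.append(dicts)
    # Fold each matched record into the remaining ones, then drop it.
    # Caveat: this loop does not re-check USA/GOOG and removes items while
    # iterating, so it is only reliable for a single matching pair, as in
    # the question's data. The surviving record is modified in place.
    for dictz in matches:
        for dictzz in matches:
            for key in dictz.keys():
                if key in dictzz.keys() and isinstance(dictzz[key], dict):
                    dictzz[key].update(dictz[key])
                elif key not in dictzz:
                    # Carry over keys the other record lacks (e.g. 'Up')
                    dictzz[key] = dictz[key]
        matches.remove(dictz)
    newList = [list1[ele] for ele in range(len(list1)) if ele not in matchIndx]
    newList.extend(matches)
    print(newList)
    return newList

joinListByDictionary(list1, list2)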