I have a list with the following structure:
data = [
    [
        {
            "id": 713,
            "prediction": 4.8,
            "confidence": [
                {"percentile": "75", "lower": 4.8, "upper": 5.7}
            ],
        },
        {
            "id": 714,
            "prediction": 4.93,
            "confidence": [
                {"percentile": "75", "lower": 4.9, "upper": 5.7}
            ],
        },
    ],
    [
        {
            "id": 713,
            "prediction": 5.8,
            "confidence": [
                {"percentile": "75", "lower": 4.2, "upper": 6.7}
            ],
        },
        {
            "id": 714,
            "prediction": 2.93,
            "confidence": [
                {"percentile": "75", "lower": 1.9, "upper": 3.7}
            ],
        },
    ],
]
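So this is a list containing two lists, though there could be more. Each inner list holds predictions, each with an id and a prediction interval; the interval (`confidence`) is itself a list containing a single dictionary.
I need to merge these lists so that each id ends up with one averaged value for `prediction`, `lower` and `upper`.
I tried searching but couldn't find an answer that matches this nested structure.
The expected output is: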
merged_data = [
    {
        "id": 713,
        "prediction": 5.3,
        "confidence": [
            {"percentile": "75", "lower": 4.5, "upper": 6.2}
        ],
    },
    {
        "id": 714,
        "prediction": 3.93,
        "confidence": [
            {"percentile": "75", "lower": 3.4, "upper": 4.7}
        ],
    },
]
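One way to express the requested merge, as a minimal sketch: it assumes every record has exactly one `confidence` entry and that values are rounded to two decimals, as in the expected output. `merge_predictions` is a hypothetical helper name, and `statistics.mean` is used in place of a hand-written `sum / len`.

```python
from collections import defaultdict
from statistics import mean


def merge_predictions(data, ndigits=2):
    """Average prediction/lower/upper per id across all inner lists (hypothetical helper)."""
    # Group every record by its id; dicts keep ids in first-seen order.
    groups = defaultdict(list)
    for inner in data:
        for record in inner:
            groups[record["id"]].append(record)

    merged = []
    for id_, records in groups.items():
        # Assumes each record carries exactly one confidence dict.
        intervals = [r["confidence"][0] for r in records]
        merged.append({
            "id": id_,
            "prediction": round(mean(r["prediction"] for r in records), ndigits),
            "confidence": [{
                "percentile": intervals[0]["percentile"],
                "lower": round(mean(c["lower"] for c in intervals), ndigits),
                "upper": round(mean(c["upper"] for c in intervals), ndigits),
            }],
        })
    return merged


data = [
    [
        {"id": 713, "prediction": 4.8, "confidence": [{"percentile": "75", "lower": 4.8, "upper": 5.7}]},
        {"id": 714, "prediction": 4.93, "confidence": [{"percentile": "75", "lower": 4.9, "upper": 5.7}]},
    ],
    [
        {"id": 713, "prediction": 5.8, "confidence": [{"percentile": "75", "lower": 4.2, "upper": 6.7}]},
        {"id": 714, "prediction": 2.93, "confidence": [{"percentile": "75", "lower": 1.9, "upper": 3.7}]},
    ],
]
merged_data = merge_predictions(data)
print(merged_data)
```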
Answer 0 (score: 2)
def merge_items(items):
    """Average 'prediction' and the first confidence interval over records sharing one id."""
    result = {}
    if items:
        n = len(items)
        result['id'] = items[0]['id']
        result['prediction'] = round(sum(item['prediction'] for item in items) / n, 2)
        result['confidence'] = [{
            'percentile': items[0]['confidence'][0]['percentile'],
            'lower': round(sum(item['confidence'][0]['lower'] for item in items) / n, 2),
            'upper': round(sum(item['confidence'][0]['upper'] for item in items) / n, 2),
        }]
    return result

result = []
# dict.fromkeys keeps ids in first-seen order (iteration order of a set is not guaranteed)
ids = list(dict.fromkeys(el['id'] for item in data for el in item))
for id_ in ids:  # `id_` avoids shadowing the built-in id()
    to_merge = [sub_item for item in data for sub_item in item if sub_item['id'] == id_]
    result.append(merge_items(to_merge))
print(result)
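The loop above rescans all of `data` once per id, so the work grows with (number of ids) × (number of records). A minimal sketch of a single-pass grouping alternative using `itertools.groupby`; since `groupby` only groups *adjacent* equal keys, the flattened records are sorted by id first. `merge_items` here is a condensed copy of the answer's helper, and `merge_all` is a hypothetical name.

```python
from itertools import chain, groupby
from operator import itemgetter


def merge_items(items):
    """Condensed version of the helper above: average one id's records."""
    n = len(items)

    def avg(values):
        return round(sum(values) / n, 2)

    return {
        "id": items[0]["id"],
        "prediction": avg(i["prediction"] for i in items),
        "confidence": [{
            "percentile": items[0]["confidence"][0]["percentile"],
            "lower": avg(i["confidence"][0]["lower"] for i in items),
            "upper": avg(i["confidence"][0]["upper"] for i in items),
        }],
    }


def merge_all(data):
    # Flatten the outer list, then sort so records with equal ids are adjacent.
    records = sorted(chain.from_iterable(data), key=itemgetter("id"))
    return [merge_items(list(group)) for _, group in groupby(records, key=itemgetter("id"))]


data = [
    [
        {"id": 713, "prediction": 4.8, "confidence": [{"percentile": "75", "lower": 4.8, "upper": 5.7}]},
        {"id": 714, "prediction": 4.93, "confidence": [{"percentile": "75", "lower": 4.9, "upper": 5.7}]},
    ],
    [
        {"id": 713, "prediction": 5.8, "confidence": [{"percentile": "75", "lower": 4.2, "upper": 6.7}]},
        {"id": 714, "prediction": 2.93, "confidence": [{"percentile": "75", "lower": 1.9, "upper": 3.7}]},
    ],
]
merged = merge_all(data)
print(merged)
```

Sorting costs O(n log n) but touches each record a bounded number of times, and the output comes back ordered by id rather than by first appearance.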
Answer 1 (score: 1)
dicc = {}
# `data` is the nested list from the question
for e in data:
    for d in e:
        if d["id"] not in dicc:
            dicc[d["id"]] = {"prediction": [], "lower": [], "upper": []}
        dicc[d["id"]]["prediction"].append(d["prediction"])
        dicc[d["id"]]["lower"].append(d["confidence"][0]["lower"])
        dicc[d["id"]]["upper"].append(d["confidence"][0]["upper"])

# average each collected list
for k in dicc:
    dicc[k]["average_prediction"] = sum(dicc[k]["prediction"]) / len(dicc[k]["prediction"])
    dicc[k]["average_lower"] = sum(dicc[k]["lower"]) / len(dicc[k]["lower"])
    dicc[k]["average_upper"] = sum(dicc[k]["upper"]) / len(dicc[k]["upper"])
print(dicc)
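Output (produced from data in which id 714's predictions were 4.936893921359024 and 2.936893921359024, the values used in Answer 3, rather than 4.93 and 2.93):
{713: {'prediction': [4.8, 5.8], 'lower': [4.8, 4.2], 'upper': [5.7, 6.7],
       'average_prediction': 5.3, 'average_lower': 4.5, 'average_upper': 6.2},
 714: {'prediction': [4.936893921359024, 2.936893921359024], 'lower': [4.9, 1.9], 'upper': [5.7, 3.7],
       'average_prediction': 3.936893921359024, 'average_lower': 3.4000000000000004, 'average_upper': 4.7}}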
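This answer leaves a dict keyed by id that still carries the raw value lists, which is not the list shape the question asks for. A minimal sketch of the reshaping step; unlike the answer, it also records the `percentile` (otherwise lost), uses `dict.setdefault` instead of the explicit `not in` check, and rounds to two decimals. `to_merged_list` is a hypothetical name.

```python
data = [
    [
        {"id": 713, "prediction": 4.8, "confidence": [{"percentile": "75", "lower": 4.8, "upper": 5.7}]},
        {"id": 714, "prediction": 4.93, "confidence": [{"percentile": "75", "lower": 4.9, "upper": 5.7}]},
    ],
    [
        {"id": 713, "prediction": 5.8, "confidence": [{"percentile": "75", "lower": 4.2, "upper": 6.7}]},
        {"id": 714, "prediction": 2.93, "confidence": [{"percentile": "75", "lower": 1.9, "upper": 3.7}]},
    ],
]

# Collect per-id value lists as in the answer, plus the percentile of the first record seen.
dicc = {}
for inner in data:
    for d in inner:
        conf = d["confidence"][0]
        entry = dicc.setdefault(
            d["id"], {"percentile": conf["percentile"], "prediction": [], "lower": [], "upper": []}
        )
        entry["prediction"].append(d["prediction"])
        entry["lower"].append(conf["lower"])
        entry["upper"].append(conf["upper"])


def to_merged_list(dicc, ndigits=2):
    """Reshape {id: {value lists}} into the requested list of dicts (hypothetical helper)."""
    def avg(values):
        return round(sum(values) / len(values), ndigits)

    return [
        {
            "id": k,
            "prediction": avg(v["prediction"]),
            "confidence": [{"percentile": v["percentile"], "lower": avg(v["lower"]), "upper": avg(v["upper"])}],
        }
        for k, v in dicc.items()
    ]


merged_data = to_merged_list(dicc)
print(merged_data)
```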
Answer 2 (score: 1)
There are really three separate problems here. First, group the records by `id`:
groups = {}
# `data` is the outer list in your nested structure
for d in (d for L in data for d in L):
    # collect every record that shares the same id
    groups.setdefault(d['id'], []).append(d)
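The same grouping step can also be written with `collections.defaultdict` and `itertools.chain.from_iterable`, which flattens the outer list without a nested generator. A minimal sketch; `group_by_id` is a hypothetical name, and the records are trimmed to `id` and `prediction` for brevity.

```python
from collections import defaultdict
from itertools import chain


def group_by_id(data):
    """Flatten the outer list and bucket records by id (hypothetical helper)."""
    groups = defaultdict(list)
    # chain.from_iterable(data) yields the records of every inner list, in order.
    for d in chain.from_iterable(data):
        groups[d["id"]].append(d)
    return dict(groups)


data = [
    [{"id": 713, "prediction": 4.8}, {"id": 714, "prediction": 4.93}],
    [{"id": 713, "prediction": 5.8}, {"id": 714, "prediction": 2.93}],
]
groups = group_by_id(data)
print({k: [d["prediction"] for d in v] for k, v in groups.items()})
# {713: [4.8, 5.8], 714: [4.93, 2.93]}
```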
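Next, average each group. Note that this assumes a very consistent object structure (as in your example). If keys are sometimes missing, list lengths don't match, or there are other differences, you'll have to think carefully about exactly what should happen when these structures are merged; there is no one-size-fits-all solution.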
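One way to act on this caveat is to check, before averaging, that every record in a group has the same shape as the first. This matters because in the `walk` function below, `zip` silently stops at the shorter of two lists, and a key missing from a later record raises `KeyError`. A minimal sketch of a strict check; `same_shape` is a hypothetical helper, and requiring identical leaf types means an `int` prediction would not match a `float` one.

```python
def same_shape(a, b):
    """True if a and b have the same dict keys, list lengths and leaf types (hypothetical helper)."""
    if isinstance(a, dict):
        return isinstance(b, dict) and a.keys() == b.keys() and all(same_shape(a[k], b[k]) for k in a)
    if isinstance(a, list):
        return isinstance(b, list) and len(a) == len(b) and all(same_shape(x, y) for x, y in zip(a, b))
    # Leaves: require the exact same type so a float is never averaged with, say, a string.
    return type(a) is type(b)


good = {"id": 713, "prediction": 4.8, "confidence": [{"percentile": "75", "lower": 4.8, "upper": 5.7}]}
bad = {"id": 713, "prediction": 5.8, "confidence": []}  # interval list is empty
print(same_shape(good, good), same_shape(good, bad))  # True False
```

A group could then be validated with `all(same_shape(L[0], d) for d in L[1:])` before it is merged.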
def walk(avgs, new, n):
    """
    Most of this algorithm is just walking the object structure.
    We keep any keys, lists, etc. the same and only average the
    float elements.
    """
    if isinstance(avgs, dict):
        return {k: walk(avgs[k], new[k], n) for k in avgs}
    if isinstance(avgs, list):
        return [walk(x, y, n) for x, y in zip(avgs, new)]
    if isinstance(avgs, float):  # only floats: ints such as 'id' and strings fall through unchanged
        # This is the only place that averaging actually happens.
        # avgs is the mean of the first n items, so avgs*n recovers their
        # total; adding the new item and dividing by n+1 gives the mean of
        # n+1 items. Floating-point rounding errors can accumulate slightly.
        return (avgs * n + new) / (n + 1.)
    return avgs

def merge(L):
    if not L:
        # never happens using the above grouping code
        return None
    d = L[0]
    for n, new in enumerate(L[1:], 1):
        d = walk(d, new, n)
    return d

averaged = {k: merge(v) for k, v in groups.items()}
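The update `(avgs*n + new)/(n+1)` in `walk` is the standard incremental (running) mean. A quick sketch that applies the same rule to a flat list of arbitrary sample numbers and compares it with a plain `sum / len`; the two agree up to floating-point rounding, which is the "accumulated errors" caveat in the code. `running_mean` is a hypothetical name.

```python
import math


def running_mean(values):
    """Fold values one at a time with the same update rule walk() uses."""
    avg = values[0]
    for n, new in enumerate(values[1:], 1):
        # avg is the mean of the first n values; this makes it the mean of the first n+1.
        avg = (avg * n + new) / (n + 1.)
    return avg


values = [4.8, 5.8, 3.1, 6.25, 4.0]
print(running_mean(values), sum(values) / len(values))
print(math.isclose(running_mean(values), sum(values) / len(values)))  # True
```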
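You may only want to average certain keys (for example, `prediction` and `confidence`). You can filter the grouped objects either before or after averaging (filtering beforehand is probably more efficient):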
# before averaging
groups = {
    # any transformation you'd like to apply to the dictionaries
    k: [{s: d[s] for s in ('prediction', 'confidence')} for d in L] for k, L in groups.items()
}

# after averaging
averaged = {
    # basically the same code, except there's only one object per key
    k: {s: d[s] for s in ('prediction', 'confidence')} for k, d in averaged.items()
}
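Because `walk` only averages `float` leaves and returns everything else (ints, strings) from the first object unchanged, one variation is to keep `'id'` in the filtered keys so it survives averaging and does not need to be re-attached later. A minimal sketch under that assumption, with condensed copies of `walk` and `merge`; the `KEEP` tuple is a hypothetical name.

```python
KEEP = ("id", "prediction", "confidence")  # 'id' is an int, so walk() passes it through


def walk(avgs, new, n):
    # Same rule as above: recurse into dicts/lists, average floats, keep everything else.
    if isinstance(avgs, dict):
        return {k: walk(avgs[k], new[k], n) for k in avgs}
    if isinstance(avgs, list):
        return [walk(x, y, n) for x, y in zip(avgs, new)]
    if isinstance(avgs, float):
        return (avgs * n + new) / (n + 1.)
    return avgs


def merge(L):
    d = L[0]
    for n, new in enumerate(L[1:], 1):
        d = walk(d, new, n)
    return d


data = [
    [
        {"id": 713, "prediction": 4.8, "confidence": [{"percentile": "75", "lower": 4.8, "upper": 5.7}]},
        {"id": 714, "prediction": 4.93, "confidence": [{"percentile": "75", "lower": 4.9, "upper": 5.7}]},
    ],
    [
        {"id": 713, "prediction": 5.8, "confidence": [{"percentile": "75", "lower": 4.2, "upper": 6.7}]},
        {"id": 714, "prediction": 2.93, "confidence": [{"percentile": "75", "lower": 1.9, "upper": 3.7}]},
    ],
]

groups = {}
for d in (d for L in data for d in L):
    # filter before averaging, but keep the id
    groups.setdefault(d["id"], []).append({s: d[s] for s in KEEP})
final_result = [merge(L) for L in groups.values()]
print(final_result)
```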
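A note on efficiency: I create a bunch of intermediate lists that aren't actually necessary. Instead of grouping first and then aggregating, you can apply the rolling-update algorithm directly in a single pass and save some memory: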
averaged = {}
# `data` is the outer list in your nested structure
for d in (d for L in data for d in L):
    key = d['id']
    d = {s: d[s] for s in ('prediction', 'confidence')}  # any desired transforms
    if key not in averaged:
        averaged[key] = (d, 1)
    else:
        agg, n = averaged[key]  # running mean so far and how many items it covers
        averaged[key] = (walk(agg, d, n), n + 1)
averaged = {k: v[0] for k, v in averaged.items()}

def inline_key(d, key):
    # not a pure function, but we're lazy, and the original
    # values are never used
    d['id'] = key
    return d

final_result = [inline_key(d, k) for k, d in averaged.items()]
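A variation on the same single-pass idea: keep a running *sum* and a count per id, then divide once per field at the end instead of rescaling the mean on every record. A minimal sketch restricted to the three numeric fields in this question; `merge_streaming` and the layout of the `totals` accumulator are hypothetical.

```python
def merge_streaming(data):
    """Single pass: accumulate per-id sums and counts, then divide once per id."""
    totals = {}  # id -> [count, sum_prediction, sum_lower, sum_upper, percentile]
    for inner in data:
        for d in inner:
            conf = d["confidence"][0]
            t = totals.setdefault(d["id"], [0, 0.0, 0.0, 0.0, conf["percentile"]])
            t[0] += 1
            t[1] += d["prediction"]
            t[2] += conf["lower"]
            t[3] += conf["upper"]
    return [
        {
            "id": key,
            "prediction": s_pred / count,
            "confidence": [{"percentile": pct, "lower": s_low / count, "upper": s_up / count}],
        }
        for key, (count, s_pred, s_low, s_up, pct) in totals.items()
    ]


data = [
    [
        {"id": 713, "prediction": 4.8, "confidence": [{"percentile": "75", "lower": 4.8, "upper": 5.7}]},
        {"id": 714, "prediction": 4.93, "confidence": [{"percentile": "75", "lower": 4.9, "upper": 5.7}]},
    ],
    [
        {"id": 713, "prediction": 5.8, "confidence": [{"percentile": "75", "lower": 4.2, "upper": 6.7}]},
        {"id": 714, "prediction": 2.93, "confidence": [{"percentile": "75", "lower": 1.9, "upper": 3.7}]},
    ],
]
merged = merge_streaming(data)
print(merged)
```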
Answer 3 (score: 1)
Try this:
from copy import deepcopy

input_data = [
    [
        {
            "id": 713,
            "prediction": 4.8,
            "confidence": [
                {"percentile": "75", "lower": 4.8, "upper": 5.7}
            ],
        },
        {
            "id": 714,
            "prediction": 4.936893921359024,
            "confidence": [
                {"percentile": "75", "lower": 4.9, "upper": 5.7}
            ],
        },
    ],
    [
        {
            "id": 713,
            "prediction": 5.8,
            "confidence": [
                {"percentile": "75", "lower": 4.2, "upper": 6.7}
            ],
        },
        {
            "id": 714,
            "prediction": 2.936893921359024,
            "confidence": [
                {"percentile": "75", "lower": 1.9, "upper": 3.7}
            ],
        },
    ],
]

final_dict_list = []
processed_id = set()
for item in input_data:
    for dict_ele in item:
        if dict_ele["id"] in processed_id:
            # add this record's values onto the running totals for the same id
            for final_item in final_dict_list:
                if final_item['id'] == dict_ele["id"]:
                    final_item["prediction"] += dict_ele["prediction"]
                    final_item["confidence"][0]["lower"] += dict_ele["confidence"][0]["lower"]
                    final_item["confidence"][0]["upper"] += dict_ele["confidence"][0]["upper"]
        else:
            # deepcopy so the running totals don't modify the input dictionaries
            final_dict_list.append(deepcopy(dict_ele))
            processed_id.add(dict_ele["id"])

# assumes every id appears exactly once in each inner list
number_of_items = len(input_data)
for item in final_dict_list:
    item["prediction"] /= number_of_items
    item["confidence"][0]["lower"] /= number_of_items
    item["confidence"][0]["upper"] /= number_of_items
print(final_dict_list)
Output:
[
{'confidence': [{'upper': 6.2, 'lower': 4.5, 'percentile': '75'}], 'id': 713, 'prediction': 5.3},
{'confidence': [{'upper': 4.7, 'lower': 3.4000000000000004, 'percentile': '75'}], 'id': 714, 'prediction': 3.936893921359024}]
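The code above divides every total by `len(input_data)`, which is only correct when each id appears exactly once in every inner list. A minimal sketch that keeps the same deepcopy-and-accumulate approach but counts occurrences per id and looks totals up by key instead of scanning the list; `average_by_id` is a hypothetical name, and id 715 in the sample data is made up to show an id that appears only once.

```python
from copy import deepcopy


def average_by_id(input_data):
    """Like the answer above, but divides each id's totals by how often that id actually appeared."""
    merged = {}  # id -> summed copy of the first record seen
    counts = {}  # id -> number of records summed into it
    for item in input_data:
        for dict_ele in item:
            key = dict_ele["id"]
            if key not in merged:
                merged[key] = deepcopy(dict_ele)  # copy so the input is not modified
                counts[key] = 1
                continue
            total = merged[key]
            total["prediction"] += dict_ele["prediction"]
            total["confidence"][0]["lower"] += dict_ele["confidence"][0]["lower"]
            total["confidence"][0]["upper"] += dict_ele["confidence"][0]["upper"]
            counts[key] += 1
    for key, total in merged.items():
        n = counts[key]
        total["prediction"] /= n
        total["confidence"][0]["lower"] /= n
        total["confidence"][0]["upper"] /= n
    return list(merged.values())


input_data = [
    [
        {"id": 713, "prediction": 4.8, "confidence": [{"percentile": "75", "lower": 4.8, "upper": 5.7}]},
        {"id": 715, "prediction": 2.0, "confidence": [{"percentile": "75", "lower": 1.0, "upper": 3.0}]},
    ],
    [
        {"id": 713, "prediction": 5.8, "confidence": [{"percentile": "75", "lower": 4.2, "upper": 6.7}]},
    ],
]
result = average_by_id(input_data)
print(result)
```

Dividing by `len(input_data)` here would have halved id 715's values; counting per id leaves its single prediction at 2.0.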
One more point: this would probably be much easier if the data structure were built a little differently.
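For instance, if each prediction were stored as a flat row rather than a nested dict, averaging reduces to grouping rows and averaging columns. A minimal sketch assuming a hypothetical `(id, prediction, lower, upper)` tuple layout, which is not the structure in the question; `zip(*vals)` transposes each id's rows into columns.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flat layout: one (id, prediction, lower, upper) tuple per prediction.
rows = [
    (713, 4.8, 4.8, 5.7), (714, 4.93, 4.9, 5.7),
    (713, 5.8, 4.2, 6.7), (714, 2.93, 1.9, 3.7),
]

by_id = defaultdict(list)
for id_, *values in rows:
    by_id[id_].append(values)

# zip(*vals) turns each id's rows into columns; average and round each column.
averages = {id_: [round(mean(col), 2) for col in zip(*vals)] for id_, vals in by_id.items()}
print(averages)
```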