Merge two lists of dictionaries by parameters in Python

Time: 2019-06-25 10:27:49

Tags: python python-3.x list dictionary aggregate

I have two lists of dictionaries, one with click data and one with impression data. For example:

  

[{'offerId': '1650', 'position': '15', 'clicksCount': 21}, {'offerId': '2323', 'position': '12', 'clicksCount': 14}, {'offerId': '2323', 'position': '14', 'clicksCount': 8}, {'offerId': '1621', 'position': '10', 'clicksCount': 7}] ...

     

[{'offerId': '3207', 'position': '9', 'impressionsCount': 866}, {'offerId': '1650', 'position': '6', 'impressionsCount': 896}, {'offerId': '3207', 'position': '1', 'impressionsCount': 909}, {'offerId': '2323', 'position': '12'}] ...

I need to merge them on offerId and position so that I get the combined result (clicks and impressions) for each position of each offer.

I want to end up with something like this: [image: screenshot of the desired table]

I tried this code, but it returns the wrong result:

d = defaultdict(dict)
for l in (clicks_aggregated_data, impressions_aggregated_data):
    for elem in l:
        d[elem['offerId']].update(elem)
        d[elem['position']].update(elem)
combined_data = list(d.values())


for model, group in groupby(combined_data, key=lambda x:x['offerId']):
    print(list(group))

Can someone help me get a table-like result (as in the screenshot)?

2 answers:

Answer 0: (score: 1)

You can try building a lookup dictionary from impressions_aggregated_data and then merging it into the click data.

For example:

from pprint import pprint

# Lookup keyed by (offerId, position); rows without an impressionsCount are skipped.
impressions_aggregated_data_lookup = {(i["offerId"], i["position"]): i["impressionsCount"] for i in impressions_aggregated_data if "impressionsCount" in i}

# Copy the impressions count onto every click row that has a matching key.
for i in clicks_aggregated_data:
    key = (i["offerId"], i["position"])
    if key in impressions_aggregated_data_lookup:
        i["impressionsCount"] = impressions_aggregated_data_lookup[key]

pprint(clicks_aggregated_data)
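The lookup above only enriches rows that already exist in the click data, so offer positions that have impressions but no clicks are dropped. If those rows should appear in the result too, one possible variation is to merge both lists into a single dictionary keyed by the (offerId, position) pair. The attempt in the question used offerId and position as two separate keys, which mixes unrelated rows together. The sketch below is only an illustration: the helper name merge_by_offer_and_position is hypothetical, and filling missing counters with 0 is an assumption (None may be a better fit for your data).

```python
# Sample data copied from the question.
clicks_aggregated_data = [
    {'offerId': '1650', 'position': '15', 'clicksCount': 21},
    {'offerId': '2323', 'position': '12', 'clicksCount': 14},
    {'offerId': '2323', 'position': '14', 'clicksCount': 8},
    {'offerId': '1621', 'position': '10', 'clicksCount': 7},
]
impressions_aggregated_data = [
    {'offerId': '3207', 'position': '9', 'impressionsCount': 866},
    {'offerId': '1650', 'position': '6', 'impressionsCount': 896},
    {'offerId': '3207', 'position': '1', 'impressionsCount': 909},
    {'offerId': '2323', 'position': '12'},
]


def merge_by_offer_and_position(*lists):
    """Hypothetical helper: combine rows that share the same (offerId, position)."""
    merged = {}
    for rows in lists:
        for row in rows:
            # One composite key per (offerId, position) pair.
            key = (row['offerId'], row['position'])
            # Assumption: a counter that is missing from the input defaults to 0.
            entry = merged.setdefault(key, {
                'offerId': row['offerId'],
                'position': row['position'],
                'clicksCount': 0,
                'impressionsCount': 0,
            })
            entry.update(row)
    # Sort by offerId, then by numeric position, for a table-like listing.
    return sorted(merged.values(), key=lambda r: (r['offerId'], int(r['position'])))


combined_data = merge_by_offer_and_position(
    clicks_aggregated_data, impressions_aggregated_data
)
for row in combined_data:
    print(row)
```

Unlike the in-place update above, this leaves the two input lists untouched and returns a new list with one row per (offerId, position) pair found in either list.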

Answer 1: (score: 0)

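I hope this is what you are trying to do. Create a pandas DataFrame from each of the two lists of dictionaries, stack them, and then sum the clicks and impressions per offerId and position. See the mock-up below. Let me know if it works.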

import pandas as pd

d1=[{'offerId': '1650', 'position': '15', 'clicksCount': 21}, 
 {'offerId': '2323', 'position': '12', 'clicksCount': 14}, 
 {'offerId': '2323', 'position': '14', 'clicksCount': 8}, 
 {'offerId': '1621', 'position': '10', 'clicksCount': 7}]

d2=[{'offerId': '3207', 'position': '9', 'impressionsCount': 866},
 {'offerId': '1650', 'position': '6', 'impressionsCount': 896}, 
 {'offerId': '3207', 'position': '1', 'impressionsCount': 909}, 
 {'offerId': '2323', 'position': '12'}]

# Stack both frames; a column missing from one frame is filled with NaN.
combdf = pd.concat([pd.DataFrame(d1), pd.DataFrame(d2)], sort=False)

# Sum clicks and impressions for each (offerId, position) pair.
combdf.groupby(['offerId', 'position'])[["clicksCount", "impressionsCount"]].sum().reset_index()

Result:

  offerId position  clicksCount  impressionsCount
0    1621       10          7.0               0.0
1    1650       15         21.0               0.0
2    1650        6          0.0             896.0
3    2323       12         14.0               0.0
4    2323       14          8.0               0.0
5    3207        1          0.0             909.0
6    3207        9          0.0             866.0
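Another way to get the same kind of table in pandas is an outer join on the two key columns instead of stacking and summing. The following is a minimal sketch, assuming each (offerId, position) pair appears at most once in each list; if a list can contain duplicates, aggregate it first, otherwise the join multiplies the matching rows. The variable names clicks, impressions and merged are illustrative, and replacing missing counts with 0 is an assumption.

```python
import pandas as pd

# Sample data copied from the question.
clicks = pd.DataFrame([
    {'offerId': '1650', 'position': '15', 'clicksCount': 21},
    {'offerId': '2323', 'position': '12', 'clicksCount': 14},
    {'offerId': '2323', 'position': '14', 'clicksCount': 8},
    {'offerId': '1621', 'position': '10', 'clicksCount': 7},
])
impressions = pd.DataFrame([
    {'offerId': '3207', 'position': '9', 'impressionsCount': 866},
    {'offerId': '1650', 'position': '6', 'impressionsCount': 896},
    {'offerId': '3207', 'position': '1', 'impressionsCount': 909},
    {'offerId': '2323', 'position': '12'},
])

# how='outer' keeps every (offerId, position) pair found in either frame.
merged = clicks.merge(impressions, on=['offerId', 'position'], how='outer')

# Assumption: a missing count means zero, so NaN is replaced and the columns become integers.
count_columns = ['clicksCount', 'impressionsCount']
merged[count_columns] = merged[count_columns].fillna(0).astype(int)

print(merged)
```

Because offerId and position are strings here, any sorting on them is lexicographic; convert position to an integer first if numeric ordering matters.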