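How do I merge multiple dictionaries that share the same value for a specific key?

Time: 2019-11-04 14:35:53

Tags: python json list dictionary

I have a list of dictionaries (key-value pairs) like this: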

list = [{'mid': 123, 'msg': 'sometext', 'antivirus': 'positive'},
        {'mid': 123, 'msg': 'sometext2', 'antivirus': 'positive'},
        {'mid': 456, 'msg': 'sometext3', 'antivirus': 'positive'},
        {'mid': 456, 'msg': 'sometext4', 'antivirus': 'positive'},
        {'mid': 789, 'msg': 'sometext5', 'antivirus': 'positive'}]

I would like the result to be a new list of dictionaries (built in the most efficient way, if possible), grouped by the value of the 'mid' key:

result = [{'mid': 123, 'msg': 'sometext,sometext2', 'antivirus': 'positive,positive'}, 
          {'mid': 456, 'msg': 'sometext3,sometext4', 'antivirus': 'positive,positive'},
          {'mid': 789, 'msg': 'sometext5', 'antivirus': 'positive'}]

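4 Answers:

Answer 0: (score: 0)

I'm not thrilled with this approach, but it will get you there. Iterate over the list of dictionaries lst and use a defaultdict to group them by the value of mid, then iterate over that to produce the output, joining the values of the msg and antivirus keys.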

from collections import defaultdict

lst = [{'mid': 123, 'msg': 'sometext', 'antivirus': 'positive'},
       {'mid': 123, 'msg': 'sometext2', 'antivirus': 'positive'},
       {'mid': 456, 'msg': 'sometext3', 'antivirus': 'positive'},
       {'mid': 456, 'msg': 'sometext4', 'antivirus': 'positive'},
       {'mid': 789, 'msg': 'sometext5', 'antivirus': 'positive'}]

dd = defaultdict(list)
for d in lst:
    key = d['mid']
    dd[key].append(d)

output = []
for (k,v) in dd.items():
    output.append({
        'mid':       k,
        'msg':       ','.join(x['msg']       for x in v),
        'antivirus': ','.join(x['antivirus'] for x in v),
    })

print(output)

Output:

[
  {'mid': 123, 'msg': 'sometext,sometext2', 'antivirus': 'positive,positive'}, 
  {'mid': 456, 'msg': 'sometext3,sometext4', 'antivirus': 'positive,positive'}, 
  {'mid': 789, 'msg': 'sometext5', 'antivirus': 'positive'}
]

Answer 1: (score: 0)

You can simply use a pandas DataFrame:

import pandas as pd

lst  = [{'mid': 123, 'msg': 'sometext', 'antivirus': 'positive'},
        {'mid': 123, 'msg': 'sometext2', 'antivirus': 'positive'},
        {'mid': 456, 'msg': 'sometext3', 'antivirus': 'positive'},
        {'mid': 456, 'msg': 'sometext4', 'antivirus': 'positive'},
        {'mid': 789, 'msg': 'sometext5', 'antivirus': 'positive'}]

d = (pd.DataFrame(lst)
       .groupby(['mid'])
       .agg(','.join)
       .reset_index()
       .to_dict('records'))  # the abbreviation 'r' is no longer accepted by recent pandas versions

print(d)

Output:

[{'mid': 123, 'antivirus': 'positive,positive', 'msg': 'sometext,sometext2'}, 
 {'mid': 456, 'antivirus': 'positive,positive', 'msg': 'sometext3,sometext4'}, 
 {'mid': 789, 'antivirus': 'positive', 'msg': 'sometext5'}]

Answer 2: (score: 0)

Giving one of your variables (list) the same name as a built-in is a bad idea, so I use l here.

Using an intermediate defaultdict:

from collections import defaultdict


intermediate = defaultdict(lambda: defaultdict(list))
for record in l:
    mid = record["mid"]
    for key, value in record.items():
        if key == "mid":
            continue
        intermediate[mid][key].append(value)

result = [
    {"mid": mid, **{key: ",".join(value) for key, value in attributes.items()}}
    for mid, attributes in intermediate.items()
]
result
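
For anyone who wants to run the above end to end, here is a minimal self-contained sketch of the same nested-defaultdict idea. The function name merge_by_key and its parameters key and sep are my own choices for illustration, not part of the answer, and I convert values with str() on the assumption that non-string values should also be joinable.

```python
from collections import defaultdict


def merge_by_key(records, key="mid", sep=","):
    """Group dicts by records[i][key] and join every other field with sep."""
    # Outer dict: group value -> inner dict; inner dict: field name -> list of values.
    grouped = defaultdict(lambda: defaultdict(list))
    for record in records:
        group = grouped[record[key]]
        for field, value in record.items():
            if field != key:
                group[field].append(str(value))
    # Dicts keep insertion order (Python 3.7+), so groups appear in first-seen order.
    return [
        {key: group_key, **{field: sep.join(values) for field, values in fields.items()}}
        for group_key, fields in grouped.items()
    ]


records = [
    {'mid': 123, 'msg': 'sometext', 'antivirus': 'positive'},
    {'mid': 123, 'msg': 'sometext2', 'antivirus': 'positive'},
    {'mid': 456, 'msg': 'sometext3', 'antivirus': 'positive'},
    {'mid': 456, 'msg': 'sometext4', 'antivirus': 'positive'},
    {'mid': 789, 'msg': 'sometext5', 'antivirus': 'positive'},
]

merged = merge_by_key(records)
print(merged)
```

This makes a single pass over the input and does not require it to be sorted, which is probably as close to "most efficient" as a pure-Python solution needs to be for lists of this kind.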

Answer 3: (score: 0)

(list is a built-in name in Python, so I renamed the variable to mylist.) Here is the obligatory one-liner:

import functools, itertools; list(map(lambda sub: functools.reduce(lambda a, b: {key: b[key] if key == 'mid' else ",".join(filter(lambda x: x != '', [str(a.get(key, '')), str(b.get(key, ''))])) for key in set(a) | set(b)}, sub, {}), map(lambda sub: list(sub[1]), itertools.groupby(mylist, lambda d: d['mid']))))

Less obnoxious:

import itertools
from functools import reduce  # reduce is no longer a built-in in Python 3

# Organize the dicts into groups on key 'mid'.
# groupby only groups consecutive items, so mylist must already be ordered by 'mid'.
groups = [list(group) for _, group in itertools.groupby(mylist, key=lambda d: d['mid'])]

def joindicts(a, b):
    result = dict()
    for key in set(a) | set(b):  # union of keys for both dicts
        if key == 'mid':
            result[key] = b[key]  # all dicts in a group share the same 'mid'; keep it as-is
            continue
        val_a = str(a.get(key, ''))
        val_b = str(b.get(key, ''))
        result[key] = ','.join(x for x in (val_a, val_b) if x != '')
    return result

[reduce(joindicts, sub, {}) for sub in groups]
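
One caveat with any itertools.groupby approach: groupby starts a new group every time the key changes, so unsorted input yields several partial groups for the same mid. Below is a hedged sketch that sorts first. The name merge_sorted_groups is hypothetical, and the code assumes the first dict in each group lists all the fields to be joined.

```python
from itertools import groupby
from operator import itemgetter


def merge_sorted_groups(records, key="mid", sep=","):
    """Sort by key, then merge each run of equal keys into one dict."""
    by_key = itemgetter(key)
    merged = []
    # sorted() is stable, so values inside a group keep their original order.
    for group_key, group in groupby(sorted(records, key=by_key), key=by_key):
        group = list(group)
        row = {key: group_key}
        # Assumption: the first dict of the group names every field of interest.
        for field in group[0]:
            if field != key:
                row[field] = sep.join(str(d[field]) for d in group if field in d)
        merged.append(row)
    return merged


# Same data as in the question, deliberately shuffled.
unordered = [
    {'mid': 456, 'msg': 'sometext3', 'antivirus': 'positive'},
    {'mid': 123, 'msg': 'sometext', 'antivirus': 'positive'},
    {'mid': 456, 'msg': 'sometext4', 'antivirus': 'positive'},
    {'mid': 789, 'msg': 'sometext5', 'antivirus': 'positive'},
    {'mid': 123, 'msg': 'sometext2', 'antivirus': 'positive'},
]

# Without sorting, groupby sees five runs instead of three groups.
unsorted_group_count = len(list(groupby(unordered, key=itemgetter('mid'))))
merged = merge_sorted_groups(unordered)
print(unsorted_group_count, merged)
```

Sorting costs O(n log n), so when the input is not already ordered, a dictionary-based grouping is usually the simpler choice.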