Grouping a Python list of dictionaries and aggregating values

Time: 2018-04-10 10:54:21

Tags: python dictionary grouping

I have an input list

inlist = [{"id":123,"hour":5,"groups":"1"},{"id":345,"hour":3,"groups":"1;2"},{"id":65,"hour":-2,"groups":"3"}]

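I need to group the dictionaries by their "groups" value. After that, I need to add the minimum and maximum hour of each group as new keys to every dictionary in the grouped list. The output should look like this: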

outlist=[(1, [{"id":123, "hour":5, "min_group_hour":3, "max_group_hour":5}, {"id":345, "hour":3, "min_group_hour":3, "max_group_hour":5}]),
     (2, [{"id":345, "hour":3, "min_group_hour":3, "max_group_hour":3}]),
     (3, [{"id":65, "hour":-2, "min_group_hour":-2, "max_group_hour":-2}])]

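So far, I have managed to group the input list:

import itertools
from operator import itemgetter

new_list = []
for domain in inlist:
    # A record such as "1;2" belongs to several groups, so emit one entry per group
    for group in domain['groups'].split(';'):
        d = dict()
        d['id'] = domain['id']
        d['group'] = group
        d['hour'] = domain['hour']
        new_list.append(d)

# groupby only merges adjacent items; new_list happens to be ordered by group here
for k, v in itertools.groupby(new_list, key=itemgetter('group')):
    print(int(k), max(list(v), key=itemgetter('hour')))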

Output

('1', [{'group': '1', 'id': 123, 'hour': 5}])
('2', [{'group': '2', 'id': 345, 'hour': 3}])
('3', [{'group': '3', 'id': 65, 'hour': -2}])

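I don't know how to aggregate the values per group. Is there a more Pythonic way to group dictionaries by a key whose value needs to be split?

2 Answers:

Answer 0: (score: 2)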

First, create a dict that maps each group number to the dictionaries in that group:

from collections import defaultdict

dicts_by_group = defaultdict(list)
for dic in inlist:
    groups = map(int, dic['groups'].split(';'))
    for group in groups:
        dicts_by_group[group].append(dic)

This gives us a dict that looks like

{1: [{'id': 123, 'hour': 5, 'groups': '1'}, {'id': 345, 'hour': 3, 'groups': '1;2'}], 2: [{'id': 345, 'hour': 3, 'groups': '1;2'}], 3: [{'id': 65, 'hour': -2, 'groups': '3'}]}

Then iterate over the grouped dicts and set min_group_hour and max_group_hour for each group:

outlist = []
for group in sorted(dicts_by_group.keys()):
    dicts = dicts_by_group[group]
    min_hour = min(dic['hour'] for dic in dicts)
    max_hour = max(dic['hour'] for dic in dicts)

    dicts = [{'id': dic['id'], 'hour': dic['hour'], 'min_group_hour': min_hour,
              'max_group_hour': max_hour} for dic in dicts]
    outlist.append((group, dicts))
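
For comparison, here is a minimal sketch of how the itertools.groupby approach from the question could reach the same result. It assumes the inlist from the question; the names pairs, members, lo and hi are only illustrative. The key point is that groupby merges adjacent items only, so the flattened list has to be sorted by group first.

```python
from itertools import groupby
from operator import itemgetter

inlist = [
    {"id": 123, "hour": 5, "groups": "1"},
    {"id": 345, "hour": 3, "groups": "1;2"},
    {"id": 65, "hour": -2, "groups": "3"},
]

# Flatten: one (group, record) pair for every group a record belongs to.
pairs = [
    (int(group), {"id": dic["id"], "hour": dic["hour"]})
    for dic in inlist
    for group in dic["groups"].split(";")
]

# groupby only merges adjacent items, so sort by group number first.
pairs.sort(key=itemgetter(0))

outlist = []
for group, items in groupby(pairs, key=itemgetter(0)):
    members = [record for _, record in items]
    hours = [record["hour"] for record in members]
    lo, hi = min(hours), max(hours)
    # Copy each record and attach the per-group minimum and maximum.
    outlist.append((group, [
        dict(record, min_group_hour=lo, max_group_hour=hi)
        for record in members
    ]))

for entry in outlist:
    print(entry)
```

The defaultdict version above avoids the sort, which is probably why it reads more naturally when the input is not already ordered by group.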

Answer 1: (score: 1)

IIUC: here is another way to do this, using pandas:

import pandas as pd

# Note: this example uses the key "group" (not "groups") and treats "1;2" as a single group
records = [{"id":123,"hour":5,"group":"1"},{"id":345,"hour":3,"group":"1;2"},{"id":65,"hour":-2,"group":"3"}]
df = pd.DataFrame(records)
#Get minimum hour per group
dfmi = df.groupby('group', as_index=False)['hour'].min()
#Rename hour column as min_hour
dfmi.rename(columns={'hour':'min_hour'}, inplace=True)
#Get maximum hour per group
dfmx = df.groupby('group', as_index=False)['hour'].max()
#Rename hour column as max_hour
dfmx.rename(columns={'hour':'max_hour'}, inplace=True)
#Merge min df with main df
df = df.merge(dfmi, on='group', how='outer')
#Merge max df with main df
df = df.merge(dfmx, on='group', how='outer')
#List of dictionaries
output = df.to_dict(orient='records')
#Dictionary of dictionaries
dict_out = df.to_dict(orient='index')
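
The pandas code above keeps "1;2" as a single group. The sketch below is one possible way to split such values so that the result matches the outlist from the question. It assumes pandas 0.25 or newer (for DataFrame.explode) and the original "groups" key; the column name group and the variable hours are illustrative choices, not part of the original answer.

```python
import pandas as pd

inlist = [
    {"id": 123, "hour": 5, "groups": "1"},
    {"id": 345, "hour": 3, "groups": "1;2"},
    {"id": 65, "hour": -2, "groups": "3"},
]

df = pd.DataFrame(inlist)

# Split "1;2" into ["1", "2"], then give every group its own row.
df["group"] = df["groups"].str.split(";")
df = df.explode("group").drop(columns="groups").reset_index(drop=True)
df["group"] = df["group"].astype(int)

# transform broadcasts the per-group aggregate back onto every row.
hours = df.groupby("group")["hour"]
df["min_group_hour"] = hours.transform("min")
df["max_group_hour"] = hours.transform("max")

# Build (group, [records]) tuples in ascending group order.
outlist = [
    (int(group), rows.drop(columns="group").to_dict(orient="records"))
    for group, rows in df.groupby("group")
]

for entry in outlist:
    print(entry)
```

Using transform instead of two merges keeps everything in a single DataFrame, at the cost of requiring the split step up front.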