Question

I have a list of dictionaries that looks like this:

[{TYPE, OBJECT_ID, ACTOR, EXTRA_FIELDS}, ...]

I want to go through it and aggregate the duplicates of {TYPE, OBJECT_ID}, collecting the ACTOR values into a list:

Starting from:

   [ {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', {..}}, 
      {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', {..}}, 
      {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', {..}}, 
      {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob', {..}}]

I want to end up with:

   [ {'type': 'LOVE', 'obj_id': 1242, 'actor': ['bob', 'dave'], {..}}, 
      {'type': 'FAV', 'obj_id': 1242, 'actor': ['sam'], {...}}, 
      {'type': 'LOVE', 'obj_id': 242, 'actor': ['bob'], {...}} ]

EXTRA_FIELDS do not have to be merged; they can simply take the data from any one of the aggregated items.

How can I do this in Python?

Answer 1

Assuming input is a list of tuples (not sets), then:

TYPE= 0
OBJECT_ID= 1
ACTOR= 2
EXTRA_INFO= 3
keys= set( [ ( e[TYPE] , e[OBJECT_ID] ) for e in input ] )
output= { k: [ ( e[ACTOR] , e[EXTRA_INFO] ) for e in input if ( e[TYPE] , e[OBJECT_ID] ) == k ] for k in keys }

Or, if you prefer a one-liner:

output= { k: [ ( e[2] , e[3] ) for e in input if ( e[0] , e[1] ) == k ] for k in [ ( e[0] , e[1] ) for e in input ] }

Assuming input is a list of dictionaries, this becomes:

keys= set( [ ( e['type'] , e['obj_id'] ) for e in input ] )
output= { k: [ { 'actor':e['actor'] , 'extra_info':e['extra_info'] } for e in input if ( e['type'] , e['obj_id'] ) == k ] for k in keys }

Or:

output= { k: [ { 'actor':e['actor'] , 'extra_info':e['extra_info'] } for e in input if ( e['type'] , e['obj_id'] ) == k ] for k in [ ( e['type'] , e['obj_id'] ) for e in input ] }

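Of course, you could also write out by hand what these comprehensions do, but I would not recommend it unless the data is so large that you start hitting performance problems that call for low-level optimization.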
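For comparison, here is a minimal sketch of what such a hand-written version might look like, assuming the input is a list of dictionaries with the 'type', 'obj_id', 'actor' and 'extra_info' keys used in this answer. The names group_actors and records are made up for illustration, and the 'extra_info' values are placeholders. It makes a single pass over the data instead of rescanning the list once per key.

```python
from collections import defaultdict

def group_actors(records):
    """Group records by (type, obj_id) in a single pass over the list."""
    grouped = defaultdict(list)
    for e in records:
        grouped[(e['type'], e['obj_id'])].append(
            {'actor': e['actor'], 'extra_info': e['extra_info']})
    return dict(grouped)

# Sample data modeled on the question; 'extra_info' values are placeholders.
records = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', 'extra_info': {}},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', 'extra_info': {}},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', 'extra_info': {}},
    {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob', 'extra_info': {}},
]
output = group_actors(records)
print([d['actor'] for d in output[('LOVE', 1242)]])  # ['bob', 'dave']
```

The result has the same shape as the dictionary built by the comprehensions: a (type, obj_id) tuple maps to a list of actor/extra_info dicts.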

Answer 2

I'll call your list alist.

actors = {}
extra = {}
for x in alist:
   key = (x['type'], x['obj_id'])
   if key in actors:
      actors[key].append(x['actor'])
   else:
      # start the list with the first actor seen for this key
      actors[key] = [x['actor']]
   extra[key] = x['extra']

outlist = []
for k in actors.keys():
   x = {}
   x['type'], x['obj_id'], x['actor'], x['extra'] = k[0], k[1], actors[k], extra[k]
   outlist.append(x)

outlist is the output list.

Answer 3

You should break the problem down into its component parts.

The first thing you need to do is turn each of those actors into a list:

for dict in list_of_dicts:
    dict['actor'] = [dict['actor']]

Next, you need to write a function that checks whether a particular pair is in the list of dicts and, if so, returns its index:

def check_pair(list_of_dicts,type,obj_id):
    #return index of matching pair, None otherwise
    index = -1
    for dict in list_of_dicts:
        index += 1
        if dict['type'] == type and dict['obj_id'] == obj_id:
            return index
    return None

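Then you need to create a new list (to hold the new data) and go through the old list, either appending each item to the new list or, if its obj_id and type are already present, adding its actor to that dict.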

new_list = []
for dict in list_of_dicts:
    j = check_pair(new_list,dict['type'],dict['obj_id'])
    if j is None:
        new_list.append(dict)
    else:
        # dict['actor'] is already a list at this point, so extend rather than append
        new_list[j]['actor'].extend(dict['actor'])

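I should point out that having a list of dicts like this is a rather unconventional thing, and you should really find a way to make your data structure more sensible.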

Answer 4

Here is how I would do it:

def merge_dicts(list_of_dicts):
    lookup = {}
    results = []
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        try: # it's easier to ask forgiveness than permission
            lookup[key]['actor'].append(d['actor'])
        except KeyError:
            val = {'type': d['type'],
                   'obj_id': d['obj_id'],
                   'actor': [d['actor']], # note, extra [] around value to make it a list
                   'extra_fields': d['extra_fields']}
            lookup[key] = val
            results.append(val)

    return results

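The lookup dict maps from tuples of the key values to the dictionaries contained in the results list. If another dictionary with the same key is encountered later, the actor value of the corresponding output dictionary is mutated in place.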
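To see that mutation in action, here is a small sketch. It repeats merge_dicts so that it can run on its own, and the sample data (including the 'extra_fields' values) is invented for illustration.

```python
def merge_dicts(list_of_dicts):
    lookup = {}
    results = []
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        try:
            # lookup[key] is the same object that sits in results,
            # so appending here updates the output in place
            lookup[key]['actor'].append(d['actor'])
        except KeyError:
            val = {'type': d['type'], 'obj_id': d['obj_id'],
                   'actor': [d['actor']], 'extra_fields': d['extra_fields']}
            lookup[key] = val
            results.append(val)
    return results

# Hypothetical sample data; 'extra_fields' values are placeholders.
data = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', 'extra_fields': {'n': 1}},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', 'extra_fields': {'n': 2}},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', 'extra_fields': {'n': 3}},
]
merged = merge_dicts(data)
for row in merged:
    print(row['type'], row['obj_id'], row['actor'], row['extra_fields'])
# LOVE 1242 ['bob', 'dave'] {'n': 1}
# FAV 1242 ['sam'] {'n': 3}
```

Note that the extra fields of the first item seen for each key are the ones that survive.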

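A more natural solution would be to drop the list-of-dictionaries data structure and instead use a single dictionary that maps from (type, obj_id) keys to (actors, extra_fields) values. Here is what that looks like: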

def merge_dicts2(list_of_dicts):
    results = {}
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        try:
            results[key][0].append(d['actor'])
        except KeyError:
            results[key] = ([d['actor']], d['extra_fields'])

    return results

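This keeps most of the data from your list of dictionaries; only the order is lost (and since you are merging items from the old list, some of the order would be lost anyway).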
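A side note on ordering, as a sketch under a stated assumption: on Python 3.7 and later, plain dicts preserve insertion order, so the keys of the merged dictionary come out in the order in which each (type, obj_id) pair was first seen. On older interpreters that is not guaranteed. merge_dicts2 is repeated here so the snippet runs on its own, and the sample data is invented for illustration.

```python
def merge_dicts2(list_of_dicts):
    results = {}
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        try:
            results[key][0].append(d['actor'])
        except KeyError:
            results[key] = ([d['actor']], d['extra_fields'])
    return results

# Hypothetical sample data; 'extra_fields' values are placeholders.
data = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', 'extra_fields': {}},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', 'extra_fields': {}},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', 'extra_fields': {}},
    {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob', 'extra_fields': {}},
]
combined = merge_dicts2(data)

# Assumes Python 3.7+: keys appear in first-seen order.
print(list(combined))
# [('LOVE', 1242), ('FAV', 1242), ('LOVE', 242)]
```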

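If you are going to iterate over the collection later, this form is easier to work with, because you can unpack the tuples (even nested ones) right in the loop: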

combined_dict = merge_dicts2(list_of_dicts)

for (type, obj_id), (actors, extra_fields) in combined_dict.items():
    # do stuff with type, obj_id, actors, extra_fields
    pass

Answer 5

One solution: first, get the set of identifiers (the unique combinations of type and obj_id); then, for each combination, get the list of actors.

identifiers = set((item['type'], item['obj_id']) for item in input_list)
output_list = []
for type, obj_id in identifiers:
    output_list.append({
        'type': type,
        'obj_id': obj_id,
        'actor': [item['actor'] for item in input_list
            if item['type'] == type and item['obj_id'] == obj_id]
        })

Alternatively, use tuples as dictionary keys:

actors_dict = {}
for item in input_list:
    actors_dict.setdefault((item['type'], item['obj_id']), []).append(item['actor'])
output_list = [{'type': type, 'obj_id': obj_id, 'actor': actors}
    for (type, obj_id), actors in actors_dict.items()]

Or, a more flexible way to write this (for example, if you want to add other values to merge) would be:

output_dict = {}
for item in input_list:
    k = item['type'], item['obj_id']
    if k in output_dict:
        output_dict[k]['actor'].append(item['actor'])
    else:
        item['actor'] = [item['actor']]
        output_dict[k] = item
output_list = list(output_dict.values())

(Note that the last one also mutates the dictionaries in the input list.)
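If mutating the input is a problem, one possible variation (a sketch, not part of the approaches above) is to store a shallow copy of each first-seen item instead of the item itself. The function name merge_without_mutation and the sample data are made up for illustration.

```python
def merge_without_mutation(input_list):
    output_dict = {}
    for item in input_list:
        k = item['type'], item['obj_id']
        if k in output_dict:
            output_dict[k]['actor'].append(item['actor'])
        else:
            merged = dict(item)                # shallow copy; the original dict is left alone
            merged['actor'] = [item['actor']]  # wrap the first actor in a new list
            output_dict[k] = merged
    return list(output_dict.values())

# Hypothetical sample data modeled on the question.
input_list = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob'},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave'},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam'},
]
output_list = merge_without_mutation(input_list)
print(input_list[0]['actor'])   # bob (still a plain string)
print(output_list[0]['actor'])  # ['bob', 'dave']
```

Because the copy is shallow, any nested values (for example, a dict of extra fields) would still be shared between the input and the output.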
