Grouping data in a list of data

Time: 2013-08-10 00:43:54

Tags: python

I have a list of dictionaries that looks like this:

[{TYPE, OBJECT_ID, ACTOR, EXTRA_FIELDS}, ...]   

I want to go through it and aggregate the duplicates of {TYPE, OBJECT_ID}, collecting ACTOR into a list:

Starting from:

   [ {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', {..}}, 
      {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', {..}}, 
      {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', {..}}, 
      {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob', {..}}]

to end up with:

   [ {'type': 'LOVE', 'obj_id': 1242, 'actor': ['bob', 'dave'], {..}}, 
      {'type': 'FAV', 'obj_id': 1242, 'actor': ['sam'], {...}}, 
      {'type': 'LOVE', 'obj_id': 242, 'actor': ['bob'], {...}} ]

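EXTRA_FIELDS don't have to be merged; they can simply use the data from any one of the aggregated items.

How can I do this in Python?

5 answers:

Answer 0 (score: 0)

Assuming input is a list of tuples (not sets), then: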

TYPE= 0
OBJECT_ID= 1
ACTOR= 2
EXTRA_INFO= 3
keys= set( [ ( e[TYPE] , e[OBJECT_ID] ) for e in input ] )
output= { k: [ ( e[ACTOR] , e[EXTRA_INFO] ) for e in input if ( e[TYPE] , e[OBJECT_ID] ) == k ] for k in keys }

Or, if you prefer a one-liner:

output= { k: [ ( e[2] , e[3] ) for e in input if ( e[0] , e[1] ) == k ] for k in [ ( e[0] , e[1] ) for e in input ] }

Assuming input is a list of dictionaries, this becomes:

keys= set( [ ( e['type'] , e['obj_id'] ) for e in input ] )
output= { k: [ { 'actor':e['actor'] , 'extra_info':e['extra_info'] } for e in input if ( e['type'] , e['obj_id'] ) == k ] for k in keys }

Or:

output= { k: [ { 'actor':e['actor'] , 'extra_info':e['extra_info'] } for e in input if ( e['type'] , e['obj_id'] ) == k ] for k in [ ( e['type'] , e['obj_id'] ) for e in input ] }

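Of course, you could also write out by hand what these comprehensions do, but I wouldn't recommend it unless the data is so large that you start hitting performance problems that call for low-level optimization.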
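
For comparison, here is roughly what a hand-written single-pass version might look like. The comprehensions above rescan the whole input once per key, so one pass does less work as the number of distinct keys grows. This is only a sketch: the function name group_actors and the sample data are made up, and the 'extra_info' key follows the dictionary example in this answer rather than the question.

```python
from collections import defaultdict

def group_actors(records):
    """Group records by (type, obj_id) in a single pass over the input."""
    grouped = defaultdict(list)
    for e in records:
        key = (e['type'], e['obj_id'])
        # Same per-item payload as the dict comprehension above.
        grouped[key].append({'actor': e['actor'], 'extra_info': e['extra_info']})
    return dict(grouped)

# Hypothetical sample data, for illustration only.
sample = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', 'extra_info': 'a'},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', 'extra_info': 'b'},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', 'extra_info': 'c'},
]
output = group_actors(sample)
```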

Answer 1 (score: 0)

I'll call your list alist:

actors = {}
extra = {}
for x in alist:
   key = (x['type'], x['obj_id'])
   if key in actors:
      actors[key].append(x['actor'])
   else:
      # start the list with the first actor seen for this key
      actors[key] = [x['actor']]
   extra[key] = x['extra']

outlist = []
for k in actors.keys():
   x = {}
   x['type'], x['obj_id'], x['actor'], x['extra'] = k[0], k[1], actors[k], extra[k]
   outlist.append(x)

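outlist is the output list.

Answer 2 (score: 0)

You should break the problem down into its component parts.

The first thing you need to do is turn all of those actors into lists: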

for dict in list_of_dicts:
    dict['actor'] = [dict['actor']]

Then you need to write a function that checks whether a particular pair is already in a list of dicts and, if so, returns its index:

def check_pair(list_of_dicts, type, obj_id):
    # return the index of the matching pair, or None if there is no match
    for index, d in enumerate(list_of_dicts):
        if d['type'] == type and d['obj_id'] == obj_id:
            return index
    return None

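Then you need to create a new list (to hold the new data) and go through the old list, either appending each item to the new list or, if its obj_id and type are already there, adding its actor to that dict.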

new_list = []
for dict in list_of_dicts:
    j = check_pair(new_list, dict['type'], dict['obj_id'])
    if j is None:
        new_list.append(dict)
    else:
        # dict['actor'] is already a list here, so extend rather than append
        new_list[j]['actor'].extend(dict['actor'])

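I should point out that a list of dictionaries like this is a rather unconventional structure, and you should really look for a way to make your data structure more sensible.

Answer 3 (score: 0)

Here's how I'd do it: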

def merge_dicts(list_of_dicts):
    lookup = {}
    results = []
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        try: # it's easier to ask forgiveness than permission
            lookup[key]['actor'].append(d['actor'])
        except KeyError:
            val = {'type': d['type'],
                   'obj_id': d['obj_id'],
                   'actor': [d['actor']], # note, extra [] around value to make it a list
                   'extra_fields': d['extra_fields']}
            lookup[key] = val
            results.append(val)

    return results

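The lookup dict maps from key tuples to the dictionaries contained in the results list. The actor value of those output dictionaries is then mutated whenever another dictionary with the same key is encountered later.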
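
To see why appending through lookup also updates results, here is a stripped-down sketch of the same aliasing. The literal values are made up for illustration; the point is only that both containers hold the same dict object.

```python
lookup = {}
results = []

val = {'type': 'LOVE', 'obj_id': 1242, 'actor': ['bob']}
lookup[('LOVE', 1242)] = val   # the same dict object is stored in both places
results.append(val)

# Appending through lookup is visible through results as well.
lookup[('LOVE', 1242)]['actor'].append('dave')

# True: results[0] and the lookup entry are one and the same object.
shared = results[0] is lookup[('LOVE', 1242)]
```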

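A more natural solution would be to drop the list-of-dictionaries data structure and instead use a single dictionary that maps from (type, obj_id) keys to (actors, extra_fields) values. Here's what that would look like: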

def merge_dicts2(list_of_dicts):
    results = {}
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        try:
            results[key][0].append(d['actor'])
        except KeyError:
            results[key] = ([d['actor']], d['extra_fields'])

    return results

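This holds most of the data from your list of dictionaries; only the order has been lost (and since you're merging items from the old list, some of the order would be lost anyway).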
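
A side note on ordering, assuming Python 3.7 or later: plain dicts there keep insertion order, so the keys come out in the order each (type, obj_id) pair was first seen. The sketch below shows this with setdefault in place of the try/except above; the name merge_first_seen and the sample data are made up for illustration.

```python
def merge_first_seen(list_of_dicts):
    results = {}
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        # setdefault stores the entry the first time a key is seen,
        # and returns the existing entry on later sightings.
        actors, _extra = results.setdefault(key, ([], d['extra_fields']))
        actors.append(d['actor'])
    return results

# Hypothetical sample data, for illustration only.
sample = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', 'extra_fields': 'x1'},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', 'extra_fields': 'x2'},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', 'extra_fields': 'x3'},
    {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob', 'extra_fields': 'x4'},
]
merged = merge_first_seen(sample)
key_order = list(merged)  # first-seen order on Python 3.7+
```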

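If you're going to iterate over the collection later, this approach is easier, since you can unpack the tuples (even nested ones) right in the loop: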

combined_dict = merge_dicts2(list_of_dicts)

for (type, obj_id), (actors, extra_fields) in combined_dict.items():
    # do stuff with type, obj_id, actors, extra_fields
    pass

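Answer 4 (score: -2)

One solution: first, get the set of identifiers (the unique combinations of type and obj_id); then, get the list of actors for each combination.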

identifiers = set((item['type'], item['obj_id']) for item in input_list)
output_list = []
for type, obj_id in identifiers:
    output_list.append({
        'type': type,
        'obj_id': obj_id,
        # compare by value with ==; `is` tests object identity and is unreliable here
        'actor': [item['actor'] for item in input_list
            if item['type'] == type and item['obj_id'] == obj_id]
        })

Alternatively, use tuples as dictionary keys:

actors_dict = {}
for item in input_list:
    actors_dict.setdefault((item['type'], item['obj_id']), []).append(item['actor'])
output_list = [{'type': type, 'obj_id': obj_id, 'actor': actors}
    for (type, obj_id), actors in actors_dict.items()]

Or, a more flexible way to write this (for example, if you want to add other values to merge) would be:

output_dict = {}
for item in input_list:
    k = item['type'], item['obj_id']
    if k in output_dict:
        output_dict[k]['actor'].append(item['actor'])
    else:
        item['actor'] = [item['actor']]
        output_dict[k] = item
output_list = list(output_dict.values())

(Note that this last version also mutates the dictionaries in the input list.)
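
If mutating the input is a problem, one option is to copy each item before storing it. This is a minimal sketch, assuming a shallow copy is enough (any nested EXTRA_FIELDS values would still be shared with the input); the name merge_without_mutation and the sample data are made up here.

```python
def merge_without_mutation(input_list):
    output_dict = {}
    for item in input_list:
        k = item['type'], item['obj_id']
        if k in output_dict:
            output_dict[k]['actor'].append(item['actor'])
        else:
            merged = dict(item)  # shallow copy; the input dict is left untouched
            merged['actor'] = [item['actor']]
            output_dict[k] = merged
    return list(output_dict.values())

# Hypothetical sample data, for illustration only.
input_list = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob'},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave'},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam'},
]
output_list = merge_without_mutation(input_list)
```

After the call, input_list[0]['actor'] is still the plain string 'bob', while the merged entry carries the list of actors.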