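I have a list of dictionaries that looks like this:
[{TYPE, OBJECT_ID, ACTOR, EXTRA_FIELDS}, ...]
I'd like to go through it and aggregate the duplicates of {TYPE, OBJECT_ID}, collecting the ACTORs into a list: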
From:
[{'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', {..}},
 {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', {..}},
 {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', {..}},
 {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob', {..}}]
To:
[{'type': 'LOVE', 'obj_id': 1242, 'actor': ['bob', 'dave'], {..}},
 {'type': 'FAV', 'obj_id': 1242, 'actor': ['sam'], {..}},
 {'type': 'LOVE', 'obj_id': 242, 'actor': ['bob'], {..}}]
The EXTRA_FIELDS don't need to be merged; they can just use the data from one of the aggregated items.
How can I do this in Python?
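Because `{..}` stands in for the extra fields, the sample above is not valid Python as written. A runnable version of the same input and desired output might look like the following. The `extra_fields` key and its contents are hypothetical placeholders for EXTRA_FIELDS, and the expected result keeps the extras of the first item in each group, which is one of the choices the question allows.

```python
# Sample input from the question; 'extra_fields' is a hypothetical stand-in
# for the EXTRA_FIELDS that the question elides with {..}.
list_of_dicts = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', 'extra_fields': {'n': 1}},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', 'extra_fields': {'n': 2}},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', 'extra_fields': {'n': 3}},
    {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob', 'extra_fields': {'n': 4}},
]

# Desired result: one dict per (type, obj_id) pair, actors collected in a list,
# extra fields taken from the first item of each group.
expected = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': ['bob', 'dave'], 'extra_fields': {'n': 1}},
    {'type': 'FAV', 'obj_id': 1242, 'actor': ['sam'], 'extra_fields': {'n': 3}},
    {'type': 'LOVE', 'obj_id': 242, 'actor': ['bob'], 'extra_fields': {'n': 4}},
]
```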
Answer 0 (score: 0)
Assuming input is a list of tuples (not sets), then:
TYPE = 0
OBJECT_ID = 1
ACTOR = 2
EXTRA_INFO = 3
keys = set((e[TYPE], e[OBJECT_ID]) for e in input)
output = {k: [(e[ACTOR], e[EXTRA_INFO]) for e in input if (e[TYPE], e[OBJECT_ID]) == k] for k in keys}
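A quick run of the tuple-based version on a few hypothetical tuples shows the shape of `output`: a dict keyed by `(TYPE, OBJECT_ID)` whose values are lists of `(ACTOR, EXTRA_INFO)` pairs, rather than the list of dicts the question asks for. The input is named `data` here (an assumption) to avoid shadowing Python's built-in `input`, and the `'x1'` style extra values are made up.

```python
TYPE = 0
OBJECT_ID = 1
ACTOR = 2
EXTRA_INFO = 3

# Hypothetical sample data: (type, obj_id, actor, extra_info) tuples.
data = [
    ('LOVE', 1242, 'bob', 'x1'),
    ('LOVE', 1242, 'dave', 'x2'),
    ('FAV', 1242, 'sam', 'x3'),
    ('LOVE', 242, 'bob', 'x4'),
]

keys = set((e[TYPE], e[OBJECT_ID]) for e in data)
output = {
    k: [(e[ACTOR], e[EXTRA_INFO]) for e in data if (e[TYPE], e[OBJECT_ID]) == k]
    for k in keys
}

print(output[('LOVE', 1242)])  # [('bob', 'x1'), ('dave', 'x2')]
```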
Or, if you prefer a one-liner:
output = {k: [(e[2], e[3]) for e in input if (e[0], e[1]) == k] for k in [(e[0], e[1]) for e in input]}
Assuming input is a list of dictionaries, it becomes:
keys = set((e['type'], e['obj_id']) for e in input)
output = {k: [{'actor': e['actor'], 'extra_info': e['extra_info']} for e in input if (e['type'], e['obj_id']) == k] for k in keys}
Or:
output = {k: [{'actor': e['actor'], 'extra_info': e['extra_info']} for e in input if (e['type'], e['obj_id']) == k] for k in [(e['type'], e['obj_id']) for e in input]}
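Of course, you could also write out by hand what these comprehensions do, but I wouldn't recommend it unless the data becomes large enough that you start running into performance problems that call for low-level optimization.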
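The comprehensions above rescan the entire input at least once per distinct key, so their running time grows roughly with the number of items times the number of keys. A hand-written single pass over a dict avoids that rescan. A minimal sketch, reusing the answer's `'actor'`/`'extra_info'` keys; the function name `group_actors` and the `data` variable are hypothetical:

```python
def group_actors(data):
    """Group {'actor', 'extra_info'} entries by (type, obj_id) in one pass."""
    output = {}
    for e in data:
        k = (e['type'], e['obj_id'])
        # setdefault inserts an empty list the first time a key is seen.
        output.setdefault(k, []).append(
            {'actor': e['actor'], 'extra_info': e['extra_info']})
    return output


data = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', 'extra_info': 1},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', 'extra_info': 2},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', 'extra_info': 3},
]
result = group_actors(data)
print(result[('LOVE', 1242)])
# [{'actor': 'bob', 'extra_info': 1}, {'actor': 'dave', 'extra_info': 2}]
```

The result has the same shape as the dictionary-based comprehension, but each item is visited only once.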
Answer 1 (score: 0)
I'll call your list alist.
actors = {}  # (type, obj_id) -> list of actors
extra = {}   # (type, obj_id) -> extra data from the first matching item
for x in alist:
    key = (x['type'], x['obj_id'])
    if key in actors:
        actors[key].append(x['actor'])
    else:
        # start the list with this item's actor so it isn't lost
        actors[key] = [x['actor']]
        extra[key] = x['extra']
outlist = []
for k in actors.keys():
    x = {}
    x['type'], x['obj_id'], x['actor'], x['extra'] = k[0], k[1], actors[k], extra[k]
    outlist.append(x)
outlist is the output list.
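The if/else that creates each actors list on first sight can be avoided with `collections.defaultdict(list)`, which creates an empty list the first time a key is accessed. A sketch of the same two-dict idea wrapped in a hypothetical `aggregate` function, assuming the items carry their extra data under an `'extra'` key as in the code above (the `'e1'` style values are made up):

```python
from collections import defaultdict


def aggregate(alist):
    """Hypothetical helper: merge items that share (type, obj_id)."""
    actors = defaultdict(list)  # a missing key starts out as an empty list
    extra = {}
    for x in alist:
        key = (x['type'], x['obj_id'])
        actors[key].append(x['actor'])
        # setdefault only stores a value the first time a key is seen,
        # so each group keeps the 'extra' of its first item.
        extra.setdefault(key, x['extra'])
    return [{'type': t, 'obj_id': o, 'actor': a, 'extra': extra[(t, o)]}
            for (t, o), a in actors.items()]


alist = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob', 'extra': 'e1'},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave', 'extra': 'e2'},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam', 'extra': 'e3'},
]
outlist = aggregate(alist)
```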
Answer 2 (score: 0)
You should break the problem down into its component parts.
The first thing you need to do is turn all of those actors into lists:
for d in list_of_dicts:
    d['actor'] = [d['actor']]
Then you need a function that checks whether a particular (type, obj_id) pair is already in a list of dicts and, if so, returns its index:
def check_pair(list_of_dicts, type_, obj_id):
    # return index of matching pair, None otherwise
    for index, d in enumerate(list_of_dicts):
        if d['type'] == type_ and d['obj_id'] == obj_id:
            return index
    return None
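Then create a new list (to hold the new data) and go through the old list: either append each dict to the new list or, if its type and obj_id are already there, add its actors to that existing dict.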
new_list = []
for d in list_of_dicts:
    j = check_pair(new_list, d['type'], d['obj_id'])
    if j is None:
        new_list.append(d)
    else:
        # d['actor'] is already a list after step 1, so extend rather than append
        new_list[j]['actor'].extend(d['actor'])
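`check_pair` scans the new list from the start for every item, so the loop as a whole does quadratic work, and step 1 rewrites `'actor'` in the caller's dicts. A sketch that addresses both, using a hypothetical helper name `merge_with_index`, keeps a side dict from `(type, obj_id)` to the group's position and copies each first-seen dict instead of modifying it:

```python
def merge_with_index(list_of_dicts):
    """Hypothetical single-pass variant: track each key's position in a dict."""
    new_list = []
    position = {}  # (type, obj_id) -> index of that group in new_list
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        j = position.get(key)
        if j is None:
            position[key] = len(new_list)
            # dict(d, actor=...) makes a shallow copy with 'actor' replaced,
            # so the caller's dicts are not modified.
            new_list.append(dict(d, actor=[d['actor']]))
        else:
            new_list[j]['actor'].append(d['actor'])
    return new_list


list_of_dicts = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob'},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave'},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam'},
]
merged = merge_with_index(list_of_dicts)
```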
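I should point out that a list of dictionaries like this is a rather unusual structure, and you should really find a way to make your data structure more sensible.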
Answer 3 (score: 0)
Here's how I'd do it:
def merge_dicts(list_of_dicts):
    lookup = {}
    results = []
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        try:  # it's easier to ask forgiveness than permission
            lookup[key]['actor'].append(d['actor'])
        except KeyError:
            val = {'type': d['type'],
                   'obj_id': d['obj_id'],
                   'actor': [d['actor']],  # note, extra [] around value to make it a list
                   'extra_fields': d['extra_fields']}
            lookup[key] = val
            results.append(val)
    return results
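The lookup dict maps tuples of the key values to the dictionaries contained in the results list. If other dictionaries with the same key are encountered later, those output dictionaries have their actor value mutated in place.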
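The sharing described above is ordinary Python reference semantics: `lookup` and `results` hold the same dict objects, not copies. A minimal stand-alone illustration with a single hypothetical entry:

```python
# Two containers holding the same dict object, as merge_dicts does with
# its lookup dict and its results list.
val = {'type': 'LOVE', 'obj_id': 1242, 'actor': ['bob']}
lookup = {('LOVE', 1242): val}
results = [val]

# Appending through the lookup dict...
lookup[('LOVE', 1242)]['actor'].append('dave')

# ...is visible through the results list, because both refer to one object.
print(results[0]['actor'])                   # ['bob', 'dave']
print(results[0] is lookup[('LOVE', 1242)])  # True
```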
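A more natural solution would be to drop the list-of-dictionaries data structure and instead use a single dictionary that maps (type, obj_id) keys to (actors, extra_fields) values. Here's what that would look like: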
def merge_dicts2(list_of_dicts):
    results = {}
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        try:
            results[key][0].append(d['actor'])
        except KeyError:
            results[key] = ([d['actor']], d['extra_fields'])
    return results
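This holds most of the data from your list of dictionaries; only the ordering is lost (and since you're merging items from the old list, some of that ordering would be lost anyway).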
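This form is easier to work with if you're going to iterate over the collection later, because you can unpack the tuples (even nested ones) right in the loop: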
combined_dict = merge_dicts2(list_of_dicts)
for (type_, obj_id), (actors, extra_fields) in combined_dict.items():
    # do stuff with type_, obj_id, actors, extra_fields
    pass
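If the list-of-dicts shape from the question is still needed somewhere, the same nested unpacking turns the merged dict back into it. On Python 3.7 and later, dicts keep insertion order, so the groups come out in the order their keys were first seen. A sketch using a hand-written `combined_dict` shaped like `merge_dicts2`'s return value (its `extra_fields` contents are made up):

```python
# Hand-written value shaped like merge_dicts2's result:
# (type, obj_id) -> (actors, extra_fields)
combined_dict = {
    ('LOVE', 1242): (['bob', 'dave'], {'n': 1}),
    ('FAV', 1242): (['sam'], {'n': 3}),
    ('LOVE', 242): (['bob'], {'n': 4}),
}

# Nested unpacking pulls apart both the key tuple and the value tuple.
outlist = [
    {'type': type_, 'obj_id': obj_id, 'actor': actors, 'extra_fields': extra_fields}
    for (type_, obj_id), (actors, extra_fields) in combined_dict.items()
]
```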
Answer 4 (score: -2)
One solution: first get the set of identifiers (the unique combinations of type and obj_id), then get the list of actors for each combination.
identifiers = set((item['type'], item['obj_id']) for item in input_list)
output_list = []
for type_, obj_id in identifiers:
    output_list.append({
        'type': type_,
        'obj_id': obj_id,
        # compare with ==, not `is`: `is` tests object identity, not equality
        'actor': [item['actor'] for item in input_list
                  if item['type'] == type_ and item['obj_id'] == obj_id]
    })
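Because `identifiers` is a set, the order of the resulting list is arbitrary. If the groups should appear in the order their keys first occur, one option is `dict.fromkeys`, which drops duplicates while keeping insertion order (guaranteed for dicts since Python 3.7). A sketch on the question's sample data, without the extra fields:

```python
input_list = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob'},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave'},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam'},
    {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob'},
]

# dict.fromkeys keeps the first occurrence of each key, in order,
# whereas a set gives no ordering guarantee.
identifiers = dict.fromkeys((item['type'], item['obj_id']) for item in input_list)

output_list = [
    {'type': type_, 'obj_id': obj_id,
     'actor': [item['actor'] for item in input_list
               if item['type'] == type_ and item['obj_id'] == obj_id]}
    for type_, obj_id in identifiers
]
```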
Or, using tuples as dictionary keys:
actors_dict = {}
for item in input_list:
    actors_dict.setdefault((item['type'], item['obj_id']), []).append(item['actor'])
output_list = [{'type': type_, 'obj_id': obj_id, 'actor': actors}
               for (type_, obj_id), actors in actors_dict.items()]
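For comparison, the standard library's `itertools.groupby` can do the same grouping. It only merges neighbouring items with equal keys, so the input has to be sorted by the key first; that changes the output to sorted order and requires the `(type, obj_id)` values to be mutually comparable. A sketch, using `operator.itemgetter` to build the key tuple:

```python
from itertools import groupby
from operator import itemgetter

input_list = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob'},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave'},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam'},
    {'type': 'LOVE', 'obj_id': 242, 'actor': 'bob'},
]

key = itemgetter('type', 'obj_id')  # returns a (type, obj_id) tuple

# sorted() is stable, so actors within a group keep their original order.
output_list = [
    {'type': t, 'obj_id': o, 'actor': [item['actor'] for item in group]}
    for (t, o), group in groupby(sorted(input_list, key=key), key=key)
]
```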
Or, a more flexible way to write this (for example, if you want to merge additional values) would be:
output_dict = {}
for item in input_list:
    k = item['type'], item['obj_id']
    if k in output_dict:
        output_dict[k]['actor'].append(item['actor'])
    else:
        item['actor'] = [item['actor']]
        output_dict[k] = item
output_list = list(output_dict.values())
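(Note that this last version also mutates the input: each first-seen item has its actor replaced by a list, and that same dict is reused in the output.)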
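If the input should stay untouched, one adjustment is to store a shallow copy of each first-seen item instead of the item itself. A sketch on the question's sample data, without the extra fields:

```python
input_list = [
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'bob'},
    {'type': 'LOVE', 'obj_id': 1242, 'actor': 'dave'},
    {'type': 'FAV', 'obj_id': 1242, 'actor': 'sam'},
]

output_dict = {}
for item in input_list:
    k = item['type'], item['obj_id']
    if k in output_dict:
        output_dict[k]['actor'].append(item['actor'])
    else:
        # Store a shallow copy with 'actor' replaced by a fresh list,
        # instead of modifying item itself.
        output_dict[k] = dict(item, actor=[item['actor']])
output_list = list(output_dict.values())
```

`dict(item, actor=...)` copies only the top level, so any nested extra fields are still shared with the input; `copy.deepcopy` would be needed if those are modified later.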