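I have an array of dicts that stores ticket audits. Each audit has a user_id, a date (when the change happened), and a list of events; each event has some attributes such as type, field name, and so on.
Based on this information, I need to extract the event information by date and convert it into another dictionary. Note: I only need to keep the last event for each field_name.
I wrote a "super" loop that does what I need, but the code looks weird and is not optimized: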
Example dict:
data = {
    "audits": [
        {
            "id": 1234,
            "ticket_id": 1111,
            "created_at": "2019-04-07T01:09:40Z",
            "author_id": 9876543,
            "events": [
                {
                    "id": 1234,
                    "type": "Random"
                },
                {
                    "id": 765456,
                    "type": "Create",
                    "value": "Lovely form",
                    "field_name": "subject"
                },
                {
                    "id": 356765,
                    "type": "Create",
                    "value": None,
                    "field_name": "priority"
                },
                {
                    "id": 2345432,
                    "type": "Change",
                    "value": "normal",
                    "field_name": "priority",
                    "previous_value": None
                }
            ]
        }
    ]
}
Code:
field_history = []
for audit in data['audits']:
    user_id = audit['author_id']
    updated = audit['created_at']
    base_info = {
        'user_id': user_id,
        'updated': updated
    }
    # Iterate to get distinct value (last found on dict)
    fields = [d for d in audit['events'] if (d['type'] == 'Create' or d['type'] == 'Change') and d['field_name'] != 'tags']
    updated_fields = []  # this list is being used to keep history by updated
    for field in fields:
        distincts = [d for d in audit['events'] if d.get('field_name', '') == field['field_name']]
        distinct = distincts[-1]
        # remove older values and keep only the last one found on list
        updated_fields[:] = [d for d in updated_fields if d['updated'] == updated and d.get('field_name') != distinct['field_name']]
        updated_fields.append({**base_info, **distinct})  # add always the last element on list
    field_history = field_history + updated_fields
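What is the right way to write this loop so that it is optimized to handle large datasets?
Answer 0 (score: 1)
I like to make a few simple functions that handle the transformation and filtering, so that the top level stays clean: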
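def event_valid(event):
    # Keep only Create/Change events whose field is not 'tags'.
    # The type check runs first, so events without a field_name
    # (such as type "Random") never reach the field_name lookup.
    return (
        event['type'] in ('Create', 'Change')
        and event['field_name'] not in ('tags',)
    )

# The snippet as posted has no def line around the body below;
# the function name here is a placeholder, not from the original answer.
def audit_to_summary(audit):
    events = [event for event in audit['events'] if event_valid(event)]
    # Assuming the list is ordered... If not then sort it before next statement
    # This trick filters to only the latest event for each distinct field_name
    events = {
        event['field_name']: event for event in events
    }.values()
    return {
        'user_id': audit['author_id'],
        'updated': audit['created_at'],
        'events': list(events),  # materialize the dict view into a plain list
    }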
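To tie this back to the question, here is a minimal sketch of how the "last event per field_name" trick could produce the flat field_history list the original loop builds. The helper names latest_events_by_field and build_field_history, and the skip_fields parameter, are made up for illustration and do not come from the question or the answer. The sketch assumes events within each audit are already in chronological order, and it only considers Create/Change events when picking the last one, which may differ slightly from the original loop if a field's final event has some other type.

```python
def latest_events_by_field(events, skip_fields=("tags",)):
    """Return the last Create/Change event for each field_name.

    Later events overwrite earlier ones under the same dict key, so one
    pass over the list is enough (hypothetical helper, not from the post).
    """
    latest = {}
    for event in events:
        field = event.get("field_name")
        if (
            event.get("type") in ("Create", "Change")
            and field is not None
            and field not in skip_fields
        ):
            latest[field] = event
    return list(latest.values())


def build_field_history(audits):
    """Flatten audits into one list of events tagged with user_id and updated."""
    history = []
    for audit in audits:
        base_info = {
            "user_id": audit["author_id"],
            "updated": audit["created_at"],
        }
        history.extend(
            {**base_info, **event}
            for event in latest_events_by_field(audit["events"])
        )
    return history


# Same shape as the example dict in the question.
sample = {
    "audits": [
        {
            "id": 1234,
            "ticket_id": 1111,
            "created_at": "2019-04-07T01:09:40Z",
            "author_id": 9876543,
            "events": [
                {"id": 1234, "type": "Random"},
                {"id": 765456, "type": "Create",
                 "value": "Lovely form", "field_name": "subject"},
                {"id": 356765, "type": "Create",
                 "value": None, "field_name": "priority"},
                {"id": 2345432, "type": "Change", "value": "normal",
                 "field_name": "priority", "previous_value": None},
            ],
        }
    ]
}

field_history = build_field_history(sample["audits"])
for row in field_history:
    print(row["field_name"], row["type"], row["value"])
```

Each audit is scanned once, so the work grows linearly with the number of events, whereas the loop in the question rescans the event list and rebuilds updated_fields for every field. Using list.extend also avoids copying field_history on every audit, which the field_history = field_history + updated_fields line does.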