Rewriting a loop in a Pythonic way

Date: 2019-06-04 14:19:45

Tags: python python-3.x

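I have an array of dicts that stores ticket audits. Each audit carries a user_id, a date (when the change happened), and a list of events; each event has attributes such as type, field name, and so on.

From this information I need to extract the events per date and convert them into another dictionary. Note: I only need to keep the last event for each field_name.

I wrote a "super" loop that does what I need, but the code looks odd and is not optimized:

Example dict: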

data = {
    "audits": [
        {
            "id": 1234,
            "ticket_id": 1111,
            "created_at": "2019-04-07T01:09:40Z",
            "author_id": 9876543,
            "events": [
                {
                    "id": 1234,
                    "type": "Random"
                },
                {
                    "id": 765456,
                    "type": "Create",
                    "value": "Lovely form",
                    "field_name": "subject"
                },
                {
                    "id": 356765,
                    "type": "Create",
                    "value": None,
                    "field_name": "priority"
                },
                {
                    "id": 2345432,
                    "type": "Change",
                    "value": "normal",
                    "field_name": "priority",
                    "previous_value": None
                }
            ]
        }
    ]
}

Code:

field_history = []

for audit in data['audits']:
    user_id = audit['author_id']
    updated = audit['created_at']

    base_info = {
        'user_id': user_id,
        'updated': updated
    }

    # Iterate to get distinct value (last found on dict)
    fields = [d for d in audit['events'] if (d['type'] == 'Create' or d['type'] == 'Change') and d['field_name'] != 'tags']        
    updated_fields = [] # this list is being used to keep history by updated
    for field in fields:
        distincts = [d for d in audit['events'] if d.get('field_name', '') == field['field_name']]        
        distinct = distincts[-1]
        # remove older values and keep only the last one found on list
        updated_fields[:] = [d for d in updated_fields if d['updated'] == updated and d.get('field_name') != distinct['field_name']]
        updated_fields.append({**base_info, **distinct}) # add always the last element on list

    field_history = field_history + updated_fields

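What is the right way to write this loop so that it is optimized for large datasets?

1 Answer:

Answer 0 (score: 1):

I like to keep the top level clean by writing a few small functions that handle the transformation and filtering: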

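def event_valid(event):
    # Keep only Create/Change events on fields other than 'tags'.
    return (
        event['type'] in ('Create', 'Change')
        and event['field_name'] not in ('tags',)
    )


# The function name is illustrative; it wraps the per-audit steps so that
# `audit` is defined and the `return` is valid.
def latest_events(audit):
    events = [event for event in audit['events'] if event_valid(event)]

    # Assuming the list is ordered... If not then sort it before next statement
    # This trick filters to only the latest event for each distinct field_name:
    # a later event overwrites an earlier one stored under the same key.
    events = {
        event['field_name']: event for event in events
    }.values()

    return {
        'user_id': audit['author_id'],
        'updated': audit['created_at'],
        'events': list(events),  # materialize the dict view as a list
    }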
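
To tie this back to the flat field_history list from the question, here is a minimal end-to-end sketch, not a definitive implementation. It assumes the events inside each audit are already in chronological order, that an event without a field_name should be skipped, and that Python 3.7+ is used (dicts keep insertion order). The names WANTED_TYPES, IGNORED_FIELDS and latest_events are made up for illustration, and the sample data is a condensed copy of the dict in the question.

```python
from itertools import chain

# Condensed copy of the question's sample data.
data = {
    "audits": [
        {
            "id": 1234,
            "ticket_id": 1111,
            "created_at": "2019-04-07T01:09:40Z",
            "author_id": 9876543,
            "events": [
                {"id": 1234, "type": "Random"},
                {"id": 765456, "type": "Create",
                 "value": "Lovely form", "field_name": "subject"},
                {"id": 356765, "type": "Create",
                 "value": None, "field_name": "priority"},
                {"id": 2345432, "type": "Change", "value": "normal",
                 "field_name": "priority", "previous_value": None},
            ],
        }
    ]
}

# Assumption: same filter as in the question (Create/Change, no 'tags').
WANTED_TYPES = {"Create", "Change"}
IGNORED_FIELDS = {"tags"}


def event_valid(event):
    """True for Create/Change events that name a field other than 'tags'."""
    field = event.get("field_name")
    return (
        event.get("type") in WANTED_TYPES
        and field is not None
        and field not in IGNORED_FIELDS
    )


def latest_events(audit):
    """Yield one flat record per field_name, keeping the last event seen."""
    base_info = {"user_id": audit["author_id"], "updated": audit["created_at"]}
    # A later event overwrites an earlier one stored under the same key.
    latest = {e["field_name"]: e for e in audit["events"] if event_valid(e)}
    for event in latest.values():
        yield {**base_info, **event}


# Flatten the per-audit records into one list, as in the question.
field_history = list(
    chain.from_iterable(latest_events(audit) for audit in data["audits"])
)
```

Each audit's events are scanned once here, instead of being rescanned for every field as in the original loop, which should help on large inputs. For the sample data this leaves two records: the "subject" Create event and the "priority" Change event.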