Optimizing code to handle large amounts of data

Time: 2018-09-02 06:12:01

Tags: python

I have the following code:

import json


data_sample = [{
"name":"John",
"age":30,
"cars":[ {
"temp":{
"sum":"20",
"for":12,
}
,
"id":30,
"element":[ {"model":"Taurus1", "doors":{"id":"1", "id2":101}}, {"model":"T1", "doors":{"id":"2", "id2":12}},  {"model":"As", "doors":{"id":"Mo", "id2":4}} ]
}, {
"temp":{
"sum":"10",
"for":12,
}
,
"id":31,
"element":[ {"model":"Taurus2", "doors":{"id":"2", "id2":102}}, {"model":"T2", "doors":{"id":"5", "id2":12}},  {"model":"Thing", "doors":{"id":"Fo", "id2":4}} ]
}, {
"temp":{
"sum":"20",
"for":10,
}
,
"id":32,
"element":[ {"model":"Taurus3", "doors":{"id":"3", "id2":103}}, {"model":"T3", "doors":{"id":"15", "id2":62}},  {"model":"By", "doors":{"id":"Log", "id2":4}} ]
} ]
}]

def flat_list(z):
    # Flatten every dict or list element of a list; keep scalars unchanged.
    x = []
    for data_obj in z:
        if isinstance(data_obj, (dict, list)):
            x.append(flatten_data(data_obj))
        else:
            x.append(data_obj)
    return x


def flatten_data(y):
    # Collapse nested dicts into one dict whose keys are joined with '_'.
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            # Lists stay lists; only their elements are flattened.
            out[name[:-1]] = flat_list(x)
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


def generatejson(response2):
    # response2 is [(first data set), (second data set)]; convert it to a
    # dictionary {0: (first data set), 1: (second data set)}.
    sample_object = {i: data_response for i, data_response in enumerate(response2)}
    flat = {k: flatten_data(v) for k, v in sample_object.items()}
    return json.dumps(flat, sort_keys=True)


print(generatejson(data_sample))

This code takes data in the following format:

[(first data set), (second data set)]

and looks for nested dictionaries. When it finds one, it flattens it into the parent level.

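For example, the code detects this:

[image not preserved: an element whose doors value is a nested dictionary]

doors is a nested dictionary, so it is converted to:

[image not preserved: the same element with the doors keys flattened into the parent]

Note that lists/arrays are left in place. They are not flattened into the parent; only the dictionaries inside them are.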
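
The conversion described above can be reproduced on one element of data_sample. This is a minimal sketch, assuming Python 3.7+ (for the printed key order); flatten_element is a hypothetical helper that mirrors only the dictionary rule (keys joined with '_') and is not the code from the question.

```python
def flatten_element(obj, prefix=''):
    # Hypothetical helper: merge nested dict keys into the parent, joined by '_'.
    out = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            out.update(flatten_element(value, prefix + key + '_'))
        elif isinstance(value, list):
            # Lists stay lists; dicts inside them are flattened on their own.
            out[prefix + key] = [
                flatten_element(v) if isinstance(v, dict) else v for v in value
            ]
        else:
            out[prefix + key] = value
    return out


element = {"model": "Taurus1", "doors": {"id": "1", "id2": 101}}
flat_element = flatten_element(element)
print(flat_element)
# prints {'model': 'Taurus1', 'doors_id': '1', 'doors_id2': 101}
```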

My question:

With a small amount of data the code works fine, but with a large number of data sets (more than 1000) it performs poorly and even crashes.

How can I improve and optimize the performance of this code?
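
One direction that might be worth trying, offered as an unverified sketch and not a confirmed fix: flatten and serialize one data set at a time instead of building the whole dictionary and a single large JSON string. The names iter_json_lines and identity are hypothetical, and the output changes to one JSON object per line, which may not suit every caller.

```python
import json


def iter_json_lines(datasets, flatten):
    # Hypothetical generator: flatten and serialize one data set at a time,
    # so only one flattened data set is held in memory at once.
    for index, dataset in enumerate(datasets):
        yield json.dumps({index: flatten(dataset)}, sort_keys=True)


def identity(dataset):
    # Stand-in flattener for illustration only; pass flatten_data from the
    # question in real use.
    return dataset


lines = list(iter_json_lines([{"name": "John"}, {"name": "Ann"}], identity))
for line in lines:
    print(line)
```

In real use the generator would be consumed lazily, for example by writing each line to a file as it is produced, so the full list is never materialized.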

data_sample contains only 1 data set (I think that is enough for checking).
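
To reproduce the slowdown at scale, a single data set could be copied many times and the call timed. This is a rough sketch assuming Python 3 and assuming the real data resembles the sample; make_datasets and time_call are hypothetical helpers, and json.dumps is only a stand-in for generatejson.

```python
import copy
import json
import time


def make_datasets(sample, count):
    # Hypothetical helper: deep-copy one data set `count` times so each copy
    # is an independent object, as separately loaded records would be.
    return [copy.deepcopy(sample) for _ in range(count)]


def time_call(func, datasets):
    # Return (elapsed seconds, result) for a single call of func(datasets).
    start = time.perf_counter()
    result = func(datasets)
    return time.perf_counter() - start, result


sample = {"name": "John", "cars": [{"id": 30, "temp": {"sum": "20", "for": 12}}]}
for count in (10, 100, 1000):
    datasets = make_datasets(sample, count)
    # json.dumps is only a stand-in; pass generatejson from the question instead.
    elapsed, _ = time_call(json.dumps, datasets)
    print(count, "data sets:", round(elapsed, 6), "s")
```

If the measured time grows roughly in proportion to the number of data sets, the flattening itself is probably not the bottleneck, and memory use or the size of the input may be the better place to look.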

0 answers