Fast way to map/modify values in a large list of Python dictionaries?

Asked: 2018-08-24 13:09:38

Tags: python python-3.x performance

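I have some code that I'd like to speed up. Maybe what I have is already fine, but whenever I ask on StackOverflow, someone usually knows a clever little trick ("use map!", "try this lambda", or "import itertools"), and I'm hoping someone can help here. This is the part of the code I'm concerned with: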

#slowest part from here....
for row_dict in json_data:
    row_dict_clean = {}
    for key, value in row_dict.items():
        value_clean = get_cleantext(value)
        row_dict_clean[key] = value_clean
    json_data_clean.append(row_dict_clean)
    total += 1
#to here...

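The concept is simple. I have a list, millions of items long, of dictionaries, and I need to run every value through a cleaner. I end up with a complete list of cleaned dictionaries. Is there some clever itertools trick I should be using? Here is a more complete minimal example (MVE) to help you play with it: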

def get_json_data_clean(json_data):
    json_data_clean = []
    total = 0
    #slowest part from here....
    for row_dict in json_data:
        row_dict_clean = {}
        for key, value in row_dict.items():
            value_clean = get_cleantext(value)
            row_dict_clean[key] = value_clean
        json_data_clean.append(row_dict_clean)
        total += 1
    #to here...
    return json_data_clean

def get_cleantext(value):
    #do complex cleaning stuffs on the string, I can't change what this does
    value = value.replace("bad", "good")
    return value

json_data = [
    {"key1":"some bad",
     "key2":"bad things",
     "key3":"extra bad"},
    {"key1":"more bad stuff",
     "key2":"wow, so much bad",
     "key3":"who dis?"},
    # a few million more dictionaries
    {"key1":"so much bad stuff",
     "key2":"the bad",
     "key3":"the more bad"},
]

json_data_clean = get_json_data_clean(json_data)
print(json_data_clean)

Whenever I nest for loops, a little bell goes off in my head telling me there is probably a better way. Any help is appreciated!

1 Answer:

Answer 0: (score: 2)

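Definitely ask the smart people at https://codereview.stackexchange.com/, but as a quick fix, it looks like you can simply map() a transform function over the list of dictionaries, like so: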

def clean_text(value: str) -> str:
    # ...
    return value.replace("bad", "good")

def clean_dict(d: dict):
    return {k: clean_text(v) for k, v in d.items()}


json_data = [
    {"key1":"some bad",
     "key2":"bad things",
     "key3":"extra bad"},
    {"key1":"more bad stuff",
     "key2":"wow, so much bad",
     "key3":"who dis?"},
    # a few million more dictionaries
    {"key1":"so much bad stuff",
     "key2":"the bad",
     "key3":"the more bad"},
]

x = list(map(clean_dict, json_data))

What's missing is your total counter, but it never seems to leave get_json_data_clean() anyway.
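
If you do need the count, it can be recovered from the result, since the transformation produces exactly one output dictionary per input dictionary. A minimal sketch, assuming the cleaner is the simple replace() stand-in from the question (the sample data here is shortened and made up for illustration):

```python
def clean_text(value: str) -> str:
    # Stand-in for the real cleaning function, as in the question.
    return value.replace("bad", "good")

def clean_dict(d: dict) -> dict:
    # Build a new dictionary with every value cleaned; keys are unchanged.
    return {k: clean_text(v) for k, v in d.items()}

# Shortened, illustrative sample data (not the real dataset).
json_data = [
    {"key1": "some bad", "key2": "bad things"},
    {"key1": "who dis?", "key2": "the bad"},
]

json_data_clean = list(map(clean_dict, json_data))

# One output per input, so the old `total` counter is just the length.
total = len(json_data_clean)
print(total)
```

This keeps the counting out of the loop entirely, at the cost of only knowing the total once the whole list has been built.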

Not sure why @Daniel Gale suggested filter(), since you're not filtering out any values, just transforming them.
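
To illustrate the difference, here is a small sketch (the rows list and the lambdas are made up for illustration): map() yields one transformed item per input, whereas filter() keeps or drops items unchanged depending on a predicate.

```python
# Illustrative sample rows, not the real dataset.
rows = [{"key1": "some bad"}, {"key1": "who dis?"}]

# map(): every input dictionary yields one transformed dictionary.
mapped = list(map(
    lambda d: {k: v.replace("bad", "good") for k, v in d.items()},
    rows,
))

# filter(): dictionaries are kept or dropped as-is, never modified.
filtered = list(filter(lambda d: "bad" in d["key1"], rows))

print(mapped)    # same length as rows, values cleaned
print(filtered)  # only the rows whose key1 contains "bad", untouched
```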