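I have some code that I'd like to speed up. Maybe what I have is already fine, but whenever I ask on StackOverflow, someone usually knows a clever little trick like "use map!", "try this lambda", or "import itertools", and I'm hoping someone can help here. This is the part of the code I'm concerned with: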
# slowest part from here....
for row_dict in json_data:
    row_dict_clean = {}
    for key, value in row_dict.items():
        value_clean = get_cleantext(value)
        row_dict_clean[key] = value_clean
    json_data_clean.append(row_dict_clean)
    total += 1
# to here...
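The concept is pretty simple. I have a list, millions of entries long, that contains dictionaries, and I need to run every value through a cleaner. I then end up with a nice list of clean dictionaries. Is there some clever iterate tool that I'm not aware of and should be using? Here is a more complete MVE to help you play with it: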
def get_json_data_clean(json_data):
    json_data_clean = []
    total = 0
    # slowest part from here....
    for row_dict in json_data:
        row_dict_clean = {}
        for key, value in row_dict.items():
            value_clean = get_cleantext(value)
            row_dict_clean[key] = value_clean
        json_data_clean.append(row_dict_clean)
        total += 1
    # to here...
    return json_data_clean

def get_cleantext(value):
    # do complex cleaning stuffs on the string, I can't change what this does
    value = value.replace("bad", "good")
    return value

json_data = [
    {"key1": "some bad",
     "key2": "bad things",
     "key3": "extra bad"},
    {"key1": "more bad stuff",
     "key2": "wow, so much bad",
     "key3": "who dis?"},
    # a few million more dictionaries
    {"key1": "so much bad stuff",
     "key2": "the bad",
     "key3": "the more bad"},
]

json_data_clean = get_json_data_clean(json_data)
print(json_data_clean)
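Any time I have nested for loops, a little bell rings in my head telling me there is probably a better way to do this. Any help is appreciated!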
Answer 0 (score: 2)
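You should definitely ask the smart people at https://codereview.stackexchange.com/, but as a quick fix, it looks like you can simply map() a transformation function over the list of dictionaries, like this: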
def clean_text(value: str) -> str:
    # ...
    return value.replace("bad", "good")

def clean_dict(d: dict):
    return {k: clean_text(v) for k, v in d.items()}

json_data = [
    {"key1": "some bad",
     "key2": "bad things",
     "key3": "extra bad"},
    {"key1": "more bad stuff",
     "key2": "wow, so much bad",
     "key3": "who dis?"},
    # a few million more dictionaries
    {"key1": "so much bad stuff",
     "key2": "the bad",
     "key3": "the more bad"},
]

x = list(map(clean_dict, json_data))
What's left out is your total counter, but it never seems to leave get_json_data_clean() anyway.
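If the count is needed outside the function after all, one option is to read it from the length of the result instead of incrementing a counter in the loop. A minimal sketch, reusing the clean_text and clean_dict helpers from above; the two-item json_data is shortened sample data, not the real input:

```python
def clean_text(value: str) -> str:
    # Stand-in for the real cleaning logic
    return value.replace("bad", "good")

def clean_dict(d: dict) -> dict:
    # Build a new dict with every value cleaned
    return {k: clean_text(v) for k, v in d.items()}

# Shortened sample data (assumption: the real list has millions of entries)
json_data = [
    {"key1": "some bad", "key2": "bad things"},
    {"key1": "more bad stuff", "key2": "who dis?"},
]

json_data_clean = list(map(clean_dict, json_data))
# One output dict per input dict, so the length equals the old loop counter
total = len(json_data_clean)
print(total)
```

Since map() yields exactly one result per input dictionary, len() of the result matches what total would have reached in the original loop.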
Not sure why @Daniel Gale suggested filter(), since you're not discarding any values, only transforming them.
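To illustrate the difference: map() produces one output per input, while filter() only keeps or drops inputs and leaves the kept ones unchanged. A small sketch using made-up sample strings:

```python
# Made-up sample values, for illustration only
values = ["some bad", "who dis?", "extra bad"]

# map(): transforms every element; output length equals input length
mapped = list(map(lambda v: v.replace("bad", "good"), values))

# filter(): keeps only elements where the predicate is true; never modifies them
filtered = list(filter(lambda v: "bad" in v, values))

print(mapped)
print(filtered)
```

For the cleaning task in the question, every dictionary has to come out the other side, which is the map() behavior.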