I have a dictionary and a list of dictionaries:
user_hash = {
    "as34": "98354897394053452345",
    "ad23": "2131313111313131313",
    "ae23": "31245512121521212121",
}
active_user_hash = [
    {"field0": "231634684712313"},
    {"field0": "23145454564120"},
    {"field0": "215465464133313"},
]
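In reality, the dictionary and the list of dictionaries contain millions of key-value pairs. The goal is to loop over every value in the first dictionary and compare it, using a custom function, against every value in the second list of dictionaries. I can't use any kind of logical sorting or shortcut, because every element-to-element comparison is necessary. What is the fastest way to do this?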
The current loop takes 11 minutes! I'd like to get it down to a few seconds.
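import distance


def custom_comparison_function_algo(hash1, hash2):
    # Normalized Levenshtein distance (0.0 = identical, 1.0 = completely different)
    levenshtein_dist = distance.nlevenshtein(hash1, hash2, method=1)
    # Jaccard distance between the sets of characters in the two strings
    jaccard_dist = distance.jaccard(hash1, hash2)
    # Average the two distances and express the result as an integer percentage
    return int(((levenshtein_dist + jaccard_dist) / 2) * 100)


for index, id_hash in user_hash.items():
    try:
        # Compare this user's hash against every active user's "field0" value
        for element in active_user_hash:
            match = custom_comparison_function_algo(id_hash, element['field0'])
            if match < 40:
                print('success')
    except Exception as err:
        print(err)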
I tried NumPy vectorization, but couldn't work out how to apply it here.
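To make the scoring formula concrete without installing the `distance` package, here is a minimal pure-Python sketch of what `custom_comparison_function_algo` computes. It assumes, as the `distance` package documents, that `nlevenshtein(..., method=1)` divides the edit distance by the length of the longer string and that `jaccard` compares the sets of characters; the helper names (`levenshtein`, `nlevenshtein`, `jaccard`, `score`, `is_match`) are hypothetical. `is_match` adds one shortcut that still looks at every pair: the normalized Levenshtein term is never negative, so if the Jaccard term alone already pushes the score to 40 or more, the pair cannot match and the expensive edit distance can be skipped.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute, cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def nlevenshtein(a: str, b: str) -> float:
    """Edit distance divided by the longer length (assumed to match method=1)."""
    if a == b:
        return 0.0
    if not a or not b:
        return 1.0
    return levenshtein(a, b) / max(len(a), len(b))


def jaccard(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over the sets of characters in each string."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # both strings empty; guard added in this sketch
    return 1 - len(set_a & set_b) / len(union)


def score(hash1: str, hash2: str) -> int:
    """Same formula as custom_comparison_function_algo: mean of both distances, in percent."""
    return int(((nlevenshtein(hash1, hash2) + jaccard(hash1, hash2)) / 2) * 100)


def is_match(hash1: str, hash2: str, threshold: int = 40) -> bool:
    """Equivalent to score(hash1, hash2) < threshold, but may skip the Levenshtein step."""
    jac = jaccard(hash1, hash2)
    # The Levenshtein term is >= 0, so the final score is at least this lower bound.
    if int((jac / 2) * 100) >= threshold:
        return False
    return int(((nlevenshtein(hash1, hash2) + jac) / 2) * 100) < threshold
```

For example, `score("abc", "abd")` combines an edit distance of 1/3 with a Jaccard distance of 1 - 2/4 = 0.5, giving int(41.67) = 41, which is not a match under `< 40`. How much the shortcut saves depends entirely on how many pairs in the real data have a large Jaccard distance.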
Answer 0 (score: 0)
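What if you build a list (or a set, as Ev.Kounis suggested) from the values of the active_user_hash dictionaries, and then run your function inside a list comprehension?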
search_in = [x['field0'] for x in active_user_hash]
# Note: this is an exact-match membership test; it does not call custom_comparison_function_algo
res = [x in search_in for x in user_hash.values()]
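A list comprehension that actually applies the comparison function, rather than testing exact membership, might look like the sketch below. `find_matches` and its `compare` and `threshold` parameters are hypothetical names chosen for illustration; with the question's code you would pass `custom_comparison_function_algo` as `compare`. Every pair is still compared, so the speed-up over the original loop is likely modest: the main savings are extracting the `"field0"` strings once and avoiding per-iteration `print` calls.

```python
from typing import Callable, Dict, List, Tuple


def find_matches(
    user_hash: Dict[str, str],
    active_user_hash: List[Dict[str, str]],
    compare: Callable[[str, str], int],
    threshold: int = 40,
) -> List[Tuple[str, str]]:
    """Return (user key, active hash) pairs whose comparison score is below threshold."""
    # Pull the target strings out of the list of dicts once, not once per user.
    targets = [entry["field0"] for entry in active_user_hash]
    # Nested comprehension: every user value is compared with every target.
    return [
        (key, target)
        for key, value in user_hash.items()
        for target in targets
        if compare(value, target) < threshold
    ]
```

Usage: `matches = find_matches(user_hash, active_user_hash, custom_comparison_function_algo)`.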