I have a DataFrame:
import numpy as np
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
import pandas as pd
a = {'b':['cat','bat','cat','cat','bat','No Data','bat','No Data']}
df11 = pd.DataFrame(a,index=['x1','x2','x3','x4','x5','x6','x7','x8'])
I have a distance function:
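def distancemetric(x):
    list1 = x['b'].tolist()
    result11 = []
    # Score every string against the whole list; each entry is a list of (choice, score) pairs.
    sortlist11 = [process.extract(ele, list1, limit=11000000, scorer=fuzz.token_set_ratio) for ele in list1]
    # One {choice: score} dict per query string.
    d11 = [dict(element) for element in sortlist11]
    # Look up the score of every string in every query's dict.
    finale11 = [(k, element123[k]) for k in list1 for element123 in d11]
    result11.extend([pair[1] for pair in finale11])
    # Reshape the flat list of scores into an n x n matrix.
    final_result11 = np.reshape(result11, (len(x.index), len(x.index)))
    return final_result11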
I call the function like this:
values1 = distancemetric(df11)
Here, token_set_ratio only compares two strings at a time. When I pass it an array of strings, it gives me an averaged value, which is not what I want.
This code works, but it is slow. Is there any way to make it run faster?
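One direction that might help, since the column contains only a few distinct strings: score each distinct pair once and then expand the small matrix back to full size by index lookup. The following is only a rough sketch under assumptions. The names `pairwise_scores` and `exact_match_score` are hypothetical, the stand-in scorer is there just so the sketch runs without fuzzywuzzy, and entry `[i, j]` is taken to be `scorer(strings[i], strings[j])`, which I have not checked against the numbers produced by the function above.

```python
import numpy as np


def pairwise_scores(strings, scorer):
    """Return an n x n array where entry [i, j] is scorer(strings[i], strings[j])."""
    # Keep each distinct string once, in first-seen order.
    uniq = list(dict.fromkeys(strings))
    position = {s: i for i, s in enumerate(uniq)}
    # Score only the distinct pairs (u x u scorer calls instead of n x n).
    small = np.array([[scorer(a, b) for b in uniq] for a in uniq])
    # Expand back to the full n x n layout by index lookup.
    idx = np.array([position[s] for s in strings])
    return small[np.ix_(idx, idx)]


def exact_match_score(a, b):
    # Hypothetical stand-in scorer, not part of fuzzywuzzy.
    return 100 if a == b else 0


demo = pairwise_scores(['cat', 'bat', 'cat', 'No Data'], exact_match_score)
```

With the DataFrame above, the call would presumably be `pairwise_scores(df11['b'].tolist(), fuzz.token_set_ratio)`. The saving depends on how many values repeat: with 3 distinct strings in 8 rows, the scorer is called 9 times instead of 64.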