Question

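I have a program that computes pairwise distances and then applies the k-means algorithm. I tested it on a small list and it works correctly and quickly, but my real list is very large (> 5000 strings), so it takes forever and I eventually killed the run. Can I use outer() or some other vectorized/parallel function with the distance function to make it faster? On my small set: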

strings = ['cosine cos', 'cosine', 'cosine???????', 'l1', 'l2', 'manhattan']

the returned 3-D distance array is:

[[[ 0.          0.25        0.47826087  1.          1.          0.89473684]
  [ 0.25        0.          0.36842105  1.          1.          0.86666667]
  [ 0.47826087  0.36842105  0.          1.          1.          0.90909091]
  [ 1.          1.          1.          0.          0.5         1.        ]
  [ 1.          1.          1.          0.5         0.          1.        ]
  [ 0.89473684  0.86666667  0.90909091  1.          1.          0.        ]]]

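Each row of the array above holds the distances from one item of the strings list to every item in the list. My approach with for loops is: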

import numpy as np
import Levenshtein

strings = ['cosine cos', 'cosine', 'cosine???????', 'l1', 'l2', 'manhattan']

data1 = []

# 1 - ratio for every ordered pair (j, i), appended row by row
for j in range(len(np.array(list(strings)))):
    for i in range(len(strings)):
        data1.append(1-Levenshtein.ratio(np.array(list(strings))[j], np.array(list(strings))[i]))

# earlier attempts with map/reduce:
#n =(map(Levenshtein.ratio, strings))
#n =(reduce(Levenshtein.ratio, strings))
#print(n)

# reshape the flat list into a (1, k, k) array
k=len(strings)
data2=np.asarray(data1)
arr_3d = data2.reshape((1,k,k))
print(arr_3d)

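arr_3d is the array shown above. How can I use outer() or map() to replace the for loops above? When the strings list is large, it runs for hours without ever producing a result. I'd appreciate any help. Levenshtein.ratio is the ratio function from the python-Levenshtein package (it is not part of the Python standard library).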
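A minimal sketch of the outer() route asked about here, assuming NumPy: np.frompyfunc wraps an ordinary Python function as a ufunc, and every binary ufunc has an .outer() method that applies it to all pairs of elements. The names lev_dist and lev_ratio are hypothetical; lev_ratio is Levenshtein.ratio when python-Levenshtein is installed, otherwise a pure-Python fallback computing 2 * LCS(a, b) / (len(a) + len(b)), which reproduces the values in the matrix above. The ufunc still calls the Python function once per pair, so this removes the explicit loops but should not be expected to be much faster.

```python
import numpy as np

try:
    from Levenshtein import ratio as lev_ratio  # python-Levenshtein, if installed
except ImportError:
    # Hypothetical pure-Python fallback: 2 * LCS(a, b) / (len(a) + len(b)).
    def lev_ratio(a, b):
        total = len(a) + len(b)
        if total == 0:
            return 1.0
        prev = [0] * (len(b) + 1)  # LCS lengths for the previous row
        for ca in a:
            cur = [0]
            for j, cb in enumerate(b):
                cur.append(prev[j] + 1 if ca == cb else max(prev[j + 1], cur[j]))
            prev = cur
        return 2.0 * prev[-1] / total

strings = ['cosine cos', 'cosine', 'cosine???????', 'l1', 'l2', 'manhattan']

# Wrap the scalar distance as a ufunc with 2 inputs and 1 output.
lev_dist = np.frompyfunc(lambda a, b: 1 - lev_ratio(a, b), 2, 1)

arr = np.array(strings, dtype=object)
# .outer() evaluates lev_dist(arr[i], arr[j]) for every (i, j);
# frompyfunc produces an object array, so convert it to float.
dist = lev_dist.outer(arr, arr).astype(float)
print(dist)
```

If the original 3-D shape is needed, dist.reshape((1, k, k)) gives it.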

Answer 1

import numpy as np
import Levenshtein

strings = ['cosine cos', 'cosine', 'cosine???????', 'l1', 'l2', 'manhattan']

k = len(strings)

data = np.zeros((k, k))

# Row i holds the distances from strings[i] to every string
for i, string1 in enumerate(strings):
    for j, string2 in enumerate(strings):
        data[i, j] = 1 - Levenshtein.ratio(string1, string2)

print(data)

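There is no gain from map or reduce here; as @user2357112 mentioned, the loop has to run anyway. However, this version is clearer and should run faster, because it avoids the {{1}} you had been using.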
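One further saving, sketched here as an addition rather than something from the answer above: 1 - Levenshtein.ratio(a, b) is symmetric and the diagonal is 0, as the matrix in the question shows, so only the k(k-1)/2 pairs above the diagonal need to be computed. For k = 5000 that is 12,497,500 calls instead of 25,000,000. distance_matrix is a hypothetical name, and lev_ratio falls back to a pure-Python LCS-based stand-in only if python-Levenshtein is not installed.

```python
import itertools

import numpy as np

try:
    from Levenshtein import ratio as lev_ratio  # python-Levenshtein, if installed
except ImportError:
    # Hypothetical pure-Python fallback: 2 * LCS(a, b) / (len(a) + len(b)).
    def lev_ratio(a, b):
        total = len(a) + len(b)
        if total == 0:
            return 1.0
        prev = [0] * (len(b) + 1)
        for ca in a:
            cur = [0]
            for j, cb in enumerate(b):
                cur.append(prev[j] + 1 if ca == cb else max(prev[j + 1], cur[j]))
            prev = cur
        return 2.0 * prev[-1] / total


def distance_matrix(strings):
    """k x k matrix of 1 - ratio, computing each unordered pair only once."""
    k = len(strings)
    data = np.zeros((k, k))  # diagonal stays 0: each string vs. itself
    for i, j in itertools.combinations(range(k), 2):  # pairs with i < j
        d = 1 - lev_ratio(strings[i], strings[j])
        data[i, j] = d
        data[j, i] = d  # mirror instead of recomputing
    return data


strings = ['cosine cos', 'cosine', 'cosine???????', 'l1', 'l2', 'manhattan']
data = distance_matrix(strings)
print(data)
```

data[np.newaxis] restores the (1, k, k) shape from the question if needed. The pairs are independent of each other, so splitting them across processes (for example with multiprocessing.Pool) is another possible speedup; that is not shown here.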

Use reduce, map or other functions to avoid for loops in Python
