I have some code that computes the cosine similarity between two matrices:
import numpy as np
import scipy.spatial as sp

def cos_cdist_1(matrix, vector):
    # Cosine distance from every row of matrix to a single vector.
    v = vector.reshape(1, -1)
    return sp.distance.cdist(matrix, v, 'cosine').reshape(-1)

def cos_cdist_2(matrix1, matrix2):
    # Pairwise cosine distances between rows, flattened to 1-D by reshape(-1).
    return sp.distance.cdist(matrix1, matrix2, 'cosine').reshape(-1)

list1 = [[1, 1, 1], [1, 2, 1]]
list2 = [[1, 1, 1], [1, 2, 1]]
matrix1 = np.asarray(list1)
matrix2 = np.asarray(list2)

# Approach 1: loop over the rows of matrix2, one vector at a time.
results = []
for vector in matrix2:
    distance = cos_cdist_1(matrix1, vector)
    distance = np.asarray(distance)
    similarity = (1 - distance).tolist()
    results.append(similarity)

# Approach 2: a single matrix-to-matrix call.
dist_all = cos_cdist_2(matrix1, matrix2)
results2 = []
for item in dist_all:
    distance_result = np.asarray(item)
    similarity_result = (1 - distance_result).tolist()
    results2.append(similarity_result)
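results is
[[1.0000000000000002, 0.9428090415820635],
[0.9428090415820635, 1.0000000000000002]]
However, results2 is
[1.0000000000000002, 0.9428090415820635, 0.9428090415820635, 1.0000000000000002]
The output I want is the one in results, that is, a list of similarity lists. But I would like to keep the computation matrix-to-matrix instead of looping over vectors against a matrix. Any good ideas?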
Answer 0 (score: 10)
In [75]: import scipy.spatial as sp
In [76]: 1 - sp.distance.cdist(matrix1, matrix2, 'cosine')
Out[76]:
array([[ 1.        ,  0.94280904],
       [ 0.94280904,  1.        ]])
So you can remove the for-loops and replace them all with
results2 = 1 - sp.distance.cdist(matrix1, matrix2, 'cosine')
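To get the nested-list form the question asks for, the thing to drop is the .reshape(-1), since cdist already returns a 2-D array with one row per row of the first argument. A minimal sketch reusing the question's inputs (the name similarity is only illustrative):

```python
import numpy as np
import scipy.spatial as sp

matrix1 = np.asarray([[1, 1, 1], [1, 2, 1]])
matrix2 = np.asarray([[1, 1, 1], [1, 2, 1]])

# cdist returns shape (len(matrix1), len(matrix2)); entry [i, j] is the
# cosine distance between matrix1[i] and matrix2[j].
similarity = 1 - sp.distance.cdist(matrix1, matrix2, 'cosine')

# Nested Python lists, one inner list per row of matrix1.
results2 = similarity.tolist()
print(results2)
```

One thing to watch: the loop in the question iterates over matrix2, so its results[j][i] pairs matrix2[j] with matrix1[i]. For inputs that are not identical, that corresponds to similarity.T (or to swapping the two arguments of cdist), not to similarity itself.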
Answer 1 (score: 0)
You can take a look at scikit-learn's API for computing cosine similarity: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html.
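Cosine similarity, or the cosine kernel, computes similarity as the normalized dot product of X and Y:
K(X, Y) = <X, Y> / (||X|| * ||Y||)
X: ndarray or sparse array, shape: (n_samples_X, n_features)
Y: ndarray or sparse array, shape: (n_samples_Y, n_features). If None, the output will be the pairwise similarities between all samples in X.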
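The normalized dot product can be sketched directly in NumPy. This is only an illustration of the formula, not scikit-learn's actual implementation; the helper name cosine_similarity_np is hypothetical, and it assumes dense input with no all-zero rows.

```python
import numpy as np

def cosine_similarity_np(X, Y=None):
    """Pairwise cosine similarity between the rows of X and the rows of Y.

    Hypothetical helper for illustration; assumes no row has zero norm.
    """
    X = np.asarray(X, dtype=float)
    # Mirror the documented behaviour: Y=None means compare X with itself.
    Y = X if Y is None else np.asarray(Y, dtype=float)
    # Scale every row to unit length, so a plain dot product is the cosine.
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Result has shape (n_samples_X, n_samples_Y).
    return X_unit @ Y_unit.T

sim = cosine_similarity_np([[1, 1, 1], [1, 2, 1]], [[1, 1, 1], [1, 2, 1]])
print(sim)
```

With scikit-learn installed, sklearn.metrics.pairwise.cosine_similarity(matrix1, matrix2) should give the same 2-D matrix for the question's inputs, again without any explicit loop.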