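I'm trying to implement cosine similarity for two vectors, but I've run into a special case where both vectors have only one component, like this: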
v1 = [3]
v2 = [4]
Here is my implementation of cosine similarity:
import math

def dotProduct(v1, v2):
    # Vectors of different lengths are given a dot product of 0
    if len(v1) != len(v2):
        return 0
    return sum([x * y for x, y in zip(v1, v2)])

def cosineSim(v1, v2):
    dp = dotProduct(v1, v2)
    mag1 = math.sqrt(dotProduct(v1, v1))
    mag2 = math.sqrt(dotProduct(v2, v2))
    return dp / (mag1 * mag2)
The cosine similarity of any two vectors with only one component is always 1. Can someone guide me on how to handle this special case? Thanks.
Answer 0 (score: 0)
The correct answer here is to use numpy. As @COLDSPEED said, convert your lists to numpy vectors and use them to perform the operation. The most succinct way to do this is with scipy's cosine distance function:
from scipy.spatial.distance import cosine
cosine_similarity = 1 - cosine(v1, v2)
# Or...
cosine_distance = cosine(v1, v2)
Or using raw numpy arrays, you can do it yourself:
import numpy as np
v1 = np.array(v1)
v2 = np.array(v2)
cosine_similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
If you must reinvent the wheel for some reason, your solution would probably be another if case:
def dotProduct(v1, v2):
    if len(v1) != len(v2):
        return 0
    if len(v1) == 1:  # You only need to check one, since they're the same
        return 1
    return sum([x * y for x, y in zip(v1, v2)])
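As an alternative to special-casing length 1, here is a minimal pure-Python sketch that leaves the formula alone and only guards against zero-length vectors, which is where the division actually fails. The names `dot_product` and `cosine_sim` are illustrative, and two choices are assumptions on my part rather than anything from the question: raising on mismatched lengths, and returning 0.0 when either vector has zero magnitude.

```python
import math

def dot_product(v1, v2):
    # Assumption: mismatched lengths are an error instead of silently returning 0.
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(v1, v2))

def cosine_sim(v1, v2):
    mag1 = math.sqrt(dot_product(v1, v1))
    mag2 = math.sqrt(dot_product(v2, v2))
    # Assumption: similarity with a zero vector is reported as 0.0
    # to avoid dividing by zero.
    if mag1 == 0 or mag2 == 0:
        return 0.0
    return dot_product(v1, v2) / (mag1 * mag2)

print(cosine_sim([3], [4]))
print(cosine_sim([3], [-4]))
print(cosine_sim([1, 0], [0, 1]))
```

With one-component vectors this can only ever produce 1, -1, or (under the assumption above) 0, because a single number carries no direction other than its sign.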
Answer 1 (score: -1)
Try this code snippet:
def cosineSim1D(a, b):
    # Illustrative wrapper: a and b are the single components of v1 and v2
    if a * b == 0:
        return 0
    if a * b < 0:
        return -1
    return 1
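Why the sign check is enough can be sketched as follows, assuming a and b stand for the single components of v1 and v2 and that both are nonzero:

```latex
\cos\theta
  = \frac{a\,b}{\sqrt{a^{2}}\,\sqrt{b^{2}}}
  = \frac{a\,b}{|a|\,|b|}
  = \operatorname{sgn}(a\,b)
  \in \{-1,\ 1\},
  \qquad a \neq 0,\ b \neq 0 .
```

When either component is 0 the denominator vanishes and the cosine is undefined, so returning 0 in that case is a convention chosen by the snippet, not something the formula gives you.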