Cosine similarity for special vectors (only one component)

Time: 2018-03-09 18:53:35

Tags: python python-3.x tf-idf cosine-similarity

I am trying to implement cosine similarity for two vectors, but I ran into a special case where both vectors have only one component, like this:

v1 = [3] 
v2 = [4]

Here is my implementation of cosine similarity:

import math

def dotProduct(v1, v2):
    # Vectors of different lengths are treated as having a dot product of 0
    if len(v1) != len(v2):
        return 0
    return sum(x * y for x, y in zip(v1, v2))

def cosineSim(v1, v2):
    dp = dotProduct(v1, v2)
    mag1 = math.sqrt(dotProduct(v1, v1))
    mag2 = math.sqrt(dotProduct(v2, v2))
    return dp / (mag1 * mag2)

The cosine similarity of any two vectors with only one component is always 1. Can someone guide me on how to handle this special case? Thanks.

2 answers:

Answer 0: (score: 0)

The correct answer here is to use numpy. As @COLDSPEED said, convert your vectors to numpy arrays and perform the operation on them. The most succinct way to do this is with scipy's cosine distance function, which returns a distance, so the similarity is one minus its result:

from scipy.spatial.distance import cosine

cosine_similarity = 1 - cosine(v1, v2)
# Or...
cosine_distance = cosine(v1, v2)

Or using raw numpy arrays, you can do it yourself:

import numpy as np

v1 = np.array(v1)
v2 = np.array(v2)
cosine_similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

If you must reinvent the wheel for some reason, your solution would probably be another if case:

def dotProduct(v1, v2):
    if len(v1) != len(v2):
        return 0
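    if len(v1) == 1:  # You only need to check one, since they're the same
        # Caveat: with this early return, cosineSim evaluates to 1 / (1 * 1) = 1
        # for every pair of one-component vectors, regardless of sign or zeros.
        return 1
    return sum(x * y for x, y in zip(v1, v2))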
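As an alternative sketch (not part of the original answer), the length and zero-magnitude checks can live in the similarity function itself, leaving the dot product untouched. The names `dot_product` and `cosine_sim`, the choice to raise `ValueError` on mismatched lengths, and the choice to return 0.0 for a zero vector are all assumptions made for illustration. For one-component vectors the result then comes out as the sign of the product rather than always 1.

```python
import math

def dot_product(v1, v2):
    # Plain dot product; assumes both sequences have the same length.
    return sum(x * y for x, y in zip(v1, v2))

def cosine_sim(v1, v2):
    # Hypothetical helper: raising on a length mismatch is an assumption.
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    mag1 = math.sqrt(dot_product(v1, v1))
    mag2 = math.sqrt(dot_product(v2, v2))
    if mag1 == 0 or mag2 == 0:
        # Cosine is undefined for a zero vector; 0.0 is a convention chosen here.
        return 0.0
    return dot_product(v1, v2) / (mag1 * mag2)

print(cosine_sim([3], [4]))   # 1.0
print(cosine_sim([3], [-4]))  # -1.0
print(cosine_sim([0], [4]))   # 0.0
```

Keeping the special cases out of the dot product means that function still returns the real product (12 for `[3]` and `[4]`), so it stays usable elsewhere.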

Answer 1: (score: -1)

Try this code snippet:

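def cosine_sim_1d(a, b):
    # Hypothetical wrapper name; a and b are the single components of v1 and v2
    if a * b == 0:
        return 0
    if a * b < 0:
        return -1
    return 1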
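The reasoning behind a sign check like this can be written out. Here a and b are assumed to be the single components of v1 and v2, both nonzero:

```latex
\cos\theta
  = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert}
  = \frac{ab}{|a|\,|b|}
  = \operatorname{sgn}(ab) \in \{-1, 1\},
  \qquad a, b \neq 0.
```

When either component is zero the denominator vanishes and the cosine is undefined, so returning 0 in that case is a convention rather than a mathematical result.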