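Efficient cosine distance computation

Time: 2014-06-30 17:10:55

Tags: python numpy

I want to find the nearest cosine neighbors of a vector among the rows of a matrix, and I have benchmarked several Python functions for doing so.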

import numpy as np
import pandas as pd
import scipy.spatial.distance

def cos_loop_spatial(matrix, vector):
    """
    Calculating pairwise cosine distance (1 - cosine similarity) using a plain for loop with SciPy's cosine function.
    """
    neighbors = []
    for row in range(matrix.shape[0]):
        neighbors.append(scipy.spatial.distance.cosine(vector, matrix[row,:]))
    return neighbors

def cos_loop(matrix, vector):
    """
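    Calculating pairwise cosine similarity using a plain for loop with a manually calculated cosine value.
    Note: this returns the similarity, not the distance (1 - similarity) that scipy.spatial.distance.cosine returns.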
    """
    neighbors = []
    for row in range(matrix.shape[0]):
        vector_norm = np.linalg.norm(vector)
        row_norm = np.linalg.norm(matrix[row,:])
        cos_val = vector.dot(matrix[row,:]) / (vector_norm * row_norm)
        neighbors.append(cos_val)
    return neighbors

def cos_matrix_multiplication(matrix, vector):
    """
    Calculating pairwise cosine similarity using matrix-vector multiplication.
    Note: this returns the similarity, not the distance (1 - similarity).
    """
    dotted = matrix.dot(vector)
    matrix_norms = np.linalg.norm(matrix, axis=1)
    vector_norm = np.linalg.norm(vector)
    matrix_vector_norms = np.multiply(matrix_norms, vector_norm)
    neighbors = np.divide(dotted, matrix_vector_norms)
    return neighbors

cos_functions = [cos_loop_spatial, cos_loop, cos_matrix_multiplication]

# Test performance and plot the best result of each function.
# Note: %timeit is an IPython magic, so this part must run in IPython/Jupyter.
mat = np.random.randn(1000,1000)
vec = np.random.randn(1000)
cos_performance = {}
for func in cos_functions:
    func_performance = %timeit -o func(mat, vec)
    cos_performance[func.__name__] = func_performance.best

pd.Series(cos_performance).plot(kind='bar')

[Figure: bar chart of the best run time of each function]

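The cos_matrix_multiplication function is clearly the fastest of these, but I would like to know whether you have any suggestions for further improving the efficiency of the matrix-vector cosine distance computation.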
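One detail worth checking before comparing the functions: scipy.spatial.distance.cosine returns a distance (1 - similarity), while the two hand-written variants compute the raw cosine value. The sketch below, on small random data chosen only for illustration (the sizes and seed are assumptions, not the benchmark setup), shows how the two quantities relate and that they pick the same nearest row.

```python
import numpy as np
from scipy.spatial import distance

# Small illustrative data; sizes and seed are arbitrary assumptions.
rng = np.random.default_rng(0)
mat = rng.standard_normal((50, 20))
vec = rng.standard_normal(20)

# Vectorized cosine similarity between vec and every row of mat.
similarity = mat.dot(vec) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(vec))

# SciPy's cosine() returns a distance, i.e. 1 - similarity.
scipy_distance = np.array([distance.cosine(vec, row) for row in mat])

# The nearest neighbor has the largest similarity, i.e. the smallest distance.
nearest_by_similarity = int(np.argmax(similarity))
nearest_by_distance = int(np.argmin(scipy_distance))

print(np.allclose(1.0 - similarity, scipy_distance))
print(nearest_by_similarity == nearest_by_distance)
```

So when ranking neighbors, sort similarities in descending order or distances in ascending order, but do not mix the two.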

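1 answer:

Answer 0 (score: 2)

Use scipy.spatial.distance.cdist(mat, vec[np.newaxis,:], metric='cosine'), which computes the pairwise distance between every pair drawn from two collections of vectors, each given as the rows of an input matrix.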
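A minimal sketch of how that call might be used, assuming random test data similar to the question's. The variable k and the np.argpartition step for picking the closest rows are illustrative additions, not part of the original answer.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative data; sizes and seed are arbitrary assumptions.
rng = np.random.default_rng(0)
mat = rng.standard_normal((1000, 100))
vec = rng.standard_normal(100)

# cdist expects two 2-D arrays, so vec is lifted to shape (1, 100).
# The result has shape (1000, 1): one cosine distance per row of mat.
dists = cdist(mat, vec[np.newaxis, :], metric='cosine').ravel()

# Hypothetical follow-up step: indices of the k nearest rows.
k = 5
idx = np.argpartition(dists, k)[:k]   # k smallest distances, unordered
idx = idx[np.argsort(dists[idx])]     # order them from nearest to farthest

print(idx, dists[idx])
```

np.argpartition avoids fully sorting all distances when only a few neighbors are needed. Whether cdist actually beats the matrix-multiplication version on a given machine is something to measure rather than assume.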