XB must be a 2-dimensional array in cdist

Time: 2014-04-20 19:12:06

Tags: python arrays euclidean-distance

I have two 3D arrays:

clusters = [array([[ 0.42199652, -0.14364404,  0.21290469]]), 
   array([[  5.80084178e-05,   1.20779787e-02,  -2.65970238e-02],
   [ -1.36810406e-02,   6.85722519e-02,  -2.60280724e-01],
   [  3.03098198e-02,   1.50170659e-02,  -1.09683402e-01],
   [ -1.50776089e-03,   7.22369575e-03,  -3.71181228e-02],
   [ -3.04448275e-01,  -3.66987035e-01,   1.44618682e-01],
   [  1.16567762e-03,   1.72858807e-02,  -9.39297514e-02],
   [  1.25896836e-04,   1.61310167e-02,  -6.00253128e-02],
   [  1.65062798e-02,   1.96933143e-02,  -4.26540031e-02],
   [ -3.78020965e-03,   7.51770012e-03,  -3.67852984e-02]]), 
   array([[-0.14674492,  0.34711217,  0.30955027]])]

out_list = [[ 0.01650628  0.01969331 -0.042654  ]
   [-0.00150776  0.0072237  -0.03711812]
   [ 0.0001259   0.01613102 -0.06002531]]

I have to find the Euclidean distance between each row of out_list and each array in clusters. I have some code:

intra_dist = [scipy.spatial.distance.cdist(clusters[i],out_list[i], 'euclidean') for i in xrange(num_clusters)]

But it gives me:

ValueError: XB must be a 2-dimensional array.

Is there any solution for this?

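2 answers:

Answer 0: (score: 0)

You need to pass embedding1 and embedding2 without an index position.

Find the 5 closest sentences in the corpus for each query sentence, based on cosine similarity: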

closest_n = 5
for query, query_embedding in zip(headlines, headline_embeddings):
    distances = scipy.spatial.distance.cdist(embedding1, embedding2, "cosine")[0]

    results = zip(range(len(distances)), distances)
    results = sorted(results, key=lambda x: x[1])
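Applied to the question, the point about indexing can be illustrated as follows. This is only a minimal sketch with made-up data: the names `cluster` and `points` are hypothetical and do not come from the question. It shows that indexing a 2-D NumPy array with a single integer yields a 1-D row, which `cdist` rejects, while passing the whole 2-D array is accepted.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical example data (not from the question): two sets of 3-D points.
cluster = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
points = np.array([[0.0, 3.0, 4.0], [1.0, 0.0, 0.0]])

# points[0] has shape (3,), i.e. it is 1-D, so cdist refuses it as XB.
try:
    cdist(cluster, points[0], "euclidean")
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Passing the full 2-D array works and returns a (2, 2) distance matrix.
full = cdist(cluster, points, "euclidean")
```

Here `full[i, j]` is the distance between `cluster[i]` and `points[j]`, so every row of one array is compared with every row of the other in a single call.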

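Answer 1: (score: -1)

Your out_list is a 1-dimensional array of numpy lists. Note that there are no commas between the values in out_list, whereas in clusters the values are separated by commas. To make it work, out_list has to be converted into a 2-dimensional list.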

    out_list_new = []
    for element in out_list:
        # wrap each row in an extra list so that every entry is 2-D (1 x 3)
        out_list_new.append([[element[0], element[1], element[2]]])

    intra_dist = [scipy.spatial.distance.cdist(clusters[i], out_list_new[i], 'euclidean') for i in xrange(num_clusters)]

That should work.
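An alternative to building a new list by hand is to reshape each row when it is passed to `cdist`. The sketch below assumes Python 3 and that `out_list` is a NumPy array of shape (3, 3), and it pairs cluster i with row i as the question's code does. The second cluster is shortened to two rows for brevity.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Data from the question (second cluster truncated to two rows for brevity).
clusters = [
    np.array([[0.42199652, -0.14364404, 0.21290469]]),
    np.array([[5.80084178e-05, 1.20779787e-02, -2.65970238e-02],
              [-1.36810406e-02, 6.85722519e-02, -2.60280724e-01]]),
    np.array([[-0.14674492, 0.34711217, 0.30955027]]),
]
out_list = np.array([[0.01650628, 0.01969331, -0.042654],
                     [-0.00150776, 0.0072237, -0.03711812],
                     [0.0001259, 0.01613102, -0.06002531]])

# row.reshape(1, -1) turns a 1-D row of shape (3,) into a 2-D array of
# shape (1, 3), which is what cdist expects for its second argument.
intra_dist = [
    cdist(cluster, row.reshape(1, -1), "euclidean")
    for cluster, row in zip(clusters, out_list)
]

# Each result has one row per point in the cluster and a single column.
shapes = [d.shape for d in intra_dist]
```

`np.atleast_2d(row)` would give the same (1, 3) shape. Reshaping leaves the values untouched, so the distances are the ordinary Euclidean distances between each cluster point and the paired row.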