Cosine distances are very large in an sklearn NearestNeighbors search

Time: 2016-04-12 22:44:44

Tags: python cosine

I am running a KNN nearest-neighbor search on data stored in sparse matrix format.

from sklearn import neighbors

nlf = neighbors.NearestNeighbors(n_neighbors=20, algorithm='brute', metric='cosine')

df_csr 

Out[]: <100x2253274 sparse matrix of type '<type 'numpy.int64'>'
with 8105964 stored elements in Compressed Sparse Row format>

trainY = xrange(100)
nlf.fit(df_csr, trainY)

sf_csr

Out[]: <1x2253274 sparse matrix of type '<type 'numpy.int64'>'
with 7172 stored elements in Compressed Sparse Row format>

result1 = nlf.kneighbors(sf_csr)

result1[1]+1, result1[0]

Out[230]:
(array([[ 63,  10,  78,  19,  40,  14,  23,  53,  11,  66,  29,  77,  69,
      83,  76,  25, 100,  22,  15,  21]], dtype=int64),
 array([[ 0.98304724,  0.9903958 ,  0.99536581,  0.99604388,  0.99706035,
      0.99749375,  0.99768032,  0.99778807,  0.99779205,  0.99783219,
      0.99822192,  0.9982969 ,  0.99831123,  0.99840337,  0.99849419,
      0.99858861,  0.99861923,  0.99863749,  0.99865913,  0.99875224]]))

The cosine distances are very large (> 0.983).

In fact, I ran 20-odd segments as queries, and most of them have large cosine distances (> 0.983).

Are these results reasonable? What am I missing? Does sklearn compute the cosine distance (and cosine similarity) correctly?

Please help.
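One way to check whether sklearn computes the cosine distance correctly is to compare its output against a manual calculation of 1 - (q · x) / (‖q‖ ‖x‖). The following is only a minimal sketch: `train` and `query` are small random sparse matrices used as hypothetical stand-ins for `df_csr` and `sf_csr`, and the sizes and densities are arbitrary assumptions, not the real data.

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-ins for df_csr (100 training rows) and sf_csr (1 query row).
rng = np.random.RandomState(0)
train = sparse.random(100, 5000, density=0.05, format="csr", random_state=rng)
query = sparse.random(1, 5000, density=0.01, format="csr", random_state=rng)

# Same estimator settings as in the question.
nn = NearestNeighbors(n_neighbors=20, algorithm="brute", metric="cosine")
nn.fit(train)
dist, idx = nn.kneighbors(query)

# Manual cosine distance: 1 - (q . x) / (||q|| * ||x||) for every training row x.
dots = train.dot(query.T).toarray().ravel()
train_norms = np.sqrt(np.asarray(train.multiply(train).sum(axis=1)).ravel())
query_norm = np.sqrt(query.multiply(query).sum())
manual = 1.0 - dots / (train_norms * query_norm)

# The 20 smallest manual distances should match what kneighbors returned.
manual_top20 = np.sort(manual)[:20]
print(np.allclose(dist[0], manual_top20))
```

If the two agree on the real matrices as well, the large values would reflect the data itself (for example, a query row that shares few non-zero columns with the training rows) rather than a problem in the distance computation.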

0 answers:

No answers yet