I am running a KNN nearest-neighbor search on data stored in sparse matrix format.
from sklearn import neighbors
nlf = neighbors.NearestNeighbors(n_neighbors=20, algorithm='brute', metric='cosine')
df_csr
Out[]: <100x2253274 sparse matrix of type '<type 'numpy.int64'>'
with 8105964 stored elements in Compressed Sparse Row format>
trainY = xrange(100)
nlf.fit(df_csr, trainY)
sf_csr
Out[]: <1x2253274 sparse matrix of type '<type 'numpy.int64'>'
with 7172 stored elements in Compressed Sparse Row format>
result1 = nlf.kneighbors(sf_csr)
result1[1]+1, result1[0]
Out[230]:
(array([[ 63, 10, 78, 19, 40, 14, 23, 53, 11, 66, 29, 77, 69,
83, 76, 25, 100, 22, 15, 21]], dtype=int64),
array([[ 0.98304724, 0.9903958 , 0.99536581, 0.99604388, 0.99706035,
0.99749375, 0.99768032, 0.99778807, 0.99779205, 0.99783219,
0.99822192, 0.9982969 , 0.99831123, 0.99840337, 0.99849419,
0.99858861, 0.99861923, 0.99863749, 0.99865913, 0.99875224]]))
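The cosine distances are very large (> 0.983).
In fact, I ran 20-odd queries, and most of them give large cosine distances (> 0.983).
Are these results OK? What am I missing? Does sklearn compute cosine distance (and cosine similarity) correctly?
Please help.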
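One way to rule out a bug in sklearn would be to recompute the cosine distances by hand, as 1 - (a . b) / (||a|| ||b||), and compare them with what kneighbors returns. Below is a minimal sketch under stated assumptions: `train` and `query` are hypothetical random count matrices standing in for df_csr and sf_csr, and the sizes are made up to keep it fast.

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data: sparse non-negative counts, standing in for df_csr / sf_csr.
rng = np.random.RandomState(0)
train = sparse.csr_matrix(rng.poisson(0.05, size=(100, 2000)).astype(np.float64))
query = sparse.csr_matrix(rng.poisson(0.05, size=(1, 2000)).astype(np.float64))

# Same estimator settings as in the question.
nn = NearestNeighbors(n_neighbors=20, algorithm='brute', metric='cosine')
nn.fit(train)
dist, idx = nn.kneighbors(query)

# Manual cosine distance between the query and every training row.
dots = np.asarray(train.dot(query.T).todense()).ravel()
train_norms = np.sqrt(np.asarray(train.multiply(train).sum(axis=1)).ravel())
query_norm = np.sqrt(query.multiply(query).sum())
manual = 1.0 - dots / (train_norms * query_norm)

# The 20 smallest manual distances should match what kneighbors reported.
print(np.allclose(dist[0], np.sort(manual)[:20]))
```

If the two agree, the library is probably not the problem, and distances close to 1 would simply mean the query vector is nearly orthogonal to every training row, which is plausible when 7172 nonzero entries are spread over about 2.25 million columns.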