Sklearn Nearest Neighbors and unseen data

Time: 2017-12-18 17:28:44

Tags: python nearest-neighbor collaborative-filtering

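I am using Nearest Neighbors to find closely related customers so that I can recommend popular products to a target customer. I fitted the model on a sparse training user matrix using cosine distance. However, I cannot get indices and distances for new users from the fitted model, because those users are not in the original matrix. Is there a way around this, or do I have to refit the model every time a new user is introduced?

Thanks!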

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Pivot the long-format ratings into user x product matrices
train_df = train.pivot(index='user', columns='product_id', values='rating').fillna(0)
test_df = test.pivot(index='user', columns='product_id', values='rating').fillna(0)
train_mat = csr_matrix(train_df.values)
test_mat = csr_matrix(test_df.values)

model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=30)
model_knn.fit(train_mat)

# After pivot, 'user' is the index, so the test users come from test_df.index
test_user = np.sort(test_df.index.unique())

query_index = np.random.choice(test_user)
# This call raises the ValueError: test_df and train_df have different product columns
distances, indices = model_knn.kneighbors(test_df.loc[query_index, :].values.reshape(1, -1))
# indices are row positions in the training matrix, so map them through train_df.index
list1 = [train_df.index[i] for i in indices.flatten()]

Here is the error message:

ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 1605 while Y.shape[1] == 2724

1 answer:

Answer 0: (score: 0)

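The documentation for kneighbors states:

Returns:
dist : array
    Array representing the lengths to points, only present if return_distance=True

So you can try querying the fitted model with the new users directly, without refitting. The query only has to have the same number of features (product columns) as the matrix the model was fitted on, which is what the ValueError is complaining about.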
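
As a minimal sketch (not necessarily what the answer originally proposed), the dimension mismatch can be avoided by reindexing the test pivot onto the training columns before querying. The toy ratings and the name test_aligned are hypothetical and used only for illustration; the remaining calls mirror the question's code.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy long-format ratings (hypothetical data, for illustration only)
train = pd.DataFrame({
    "user": ["a", "a", "b", "b", "c"],
    "product_id": [1, 2, 1, 3, 2],
    "rating": [5, 3, 4, 2, 5],
})
test = pd.DataFrame({
    "user": ["x", "x"],
    "product_id": [1, 4],  # product 4 never appears in the training data
    "rating": [5, 1],
})

train_df = train.pivot(index="user", columns="product_id", values="rating").fillna(0)
test_df = test.pivot(index="user", columns="product_id", values="rating").fillna(0)

# Align the test columns to the training columns:
# products unseen in training are dropped, products missing in test become 0
test_aligned = test_df.reindex(columns=train_df.columns, fill_value=0)

model_knn = NearestNeighbors(metric="cosine", algorithm="brute", n_neighbors=2)
model_knn.fit(csr_matrix(train_df.values))

# Query the already fitted model with a user it has never seen
query = csr_matrix(test_aligned.loc[["x"]].values)
distances, indices = model_knn.kneighbors(query)

# indices point into the training matrix, so look the users up in train_df
neighbors = train_df.index[indices.flatten()].tolist()
print(neighbors, distances.flatten())
```

Note that a new user who shares no products with the training columns becomes an all-zero row after alignment, so the cosine distances for that user carry no useful signal. Refitting is still needed if the new user should also become a candidate neighbor for later queries.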