Implementing user-user collaborative filtering from scratch

Time: 2019-12-12 12:19:34

Tags: python numpy matrix collaborative-filtering recommender-systems

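I am trying to implement the user-user collaborative filtering algorithm on the MovieLens-100k dataset, following the formula in the AnalyticsVidhya tutorial on User-User Collaborative Filtering, and I am having trouble turning the arithmetic into vectorized NumPy. I have spent nearly two days trying to get the matrix dimensions right. What I have done so far: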

  1. Imported the MovieLens dataset and built the user-item matrix. There are 943 unique users and 1682 unique movies.

  2. For clarity, sliced the matrix to consider only the first 10 users and 5 movies of user_item_matrix:

import numpy as np
from sklearn.metrics import pairwise_distances

tst_rating = user_item_matrix[0:10,0:5]
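For readers who want to reproduce steps 1 and 2 without the dataset, here is a minimal sketch of building a user-item matrix from rating triples. The `triples` array and its 1-based ids are made-up illustration data, not the MovieLens file, and `user_item` is a hypothetical stand-in for `user_item_matrix`.

```python
import numpy as np

# Hypothetical (user_id, item_id, rating) triples with 1-based ids.
triples = np.array([
    [1, 1, 5],
    [1, 3, 4],
    [2, 2, 3],
    [3, 1, 2],
])

n_users = triples[:, 0].max()
n_items = triples[:, 1].max()

# Rows are users, columns are items; 0 is used to mean "not rated".
user_item = np.zeros((n_users, n_items), dtype=float)
user_item[triples[:, 0] - 1, triples[:, 1] - 1] = triples[:, 2]

# Same kind of slice as in step 2, here the first 2 users and 2 items.
small = user_item[0:2, 0:2]
```

Using a float dtype from the start avoids integer casting problems in the later division steps.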
  3. Zero-centered the ratings by subtracting each user's average rating:
tot_ratings_per_user = np.sum(tst_rating,axis=1)
num_ratings_per_user = ((tst_rating != 0).sum(1))
avg_rating_per_user = np.divide(tot_ratings_per_user,num_ratings_per_user,out=np.zeros_like(tot_ratings_per_user),where=num_ratings_per_user != 0)
avg_rating_per_user = np.reshape(avg_rating_per_user,(avg_rating_per_user.shape[0],-1))
tst_rating = tst_rating - avg_rating_per_user
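One thing worth noting about the centering step above: subtracting the mean from the whole row also shifts the unrated cells (the zeros) to negative values. Below is a minimal sketch of one way to center only the rated entries, assuming that 0 means "not rated". The toy `ratings` array is invented for illustration.

```python
import numpy as np

# Toy ratings: rows are users, columns are items, 0 means "not rated" (assumption).
ratings = np.array([
    [5.0, 3.0, 0.0],
    [0.0, 0.0, 0.0],
    [4.0, 0.0, 2.0],
])

rated = ratings != 0                         # boolean mask of observed ratings
counts = rated.sum(axis=1, keepdims=True)    # ratings per user, shape (n_users, 1)
totals = ratings.sum(axis=1, keepdims=True)  # rating sum per user, shape (n_users, 1)

# Per-user mean over rated items only; users with no ratings get a mean of 0.
means = np.divide(totals, counts, out=np.zeros_like(totals), where=counts != 0)

# Subtract the mean only where a rating exists, so unrated cells stay at 0.
centered = np.where(rated, ratings - means, 0.0)
```

With `keepdims=True` the per-user mean already has shape (n_users, 1), so no separate reshape is needed before broadcasting.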
  4. Computed the cosine similarity on the centered ratings:
user_sim = 1 - pairwise_distances(tst_rating, metric='cosine')
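To make the shape of the similarity matrix explicit, here is a hand-rolled NumPy sketch of row-wise cosine similarity. It is an illustration only and is not claimed to match `pairwise_distances` in every edge case (for example, all-zero rows); `cosine_similarity_matrix` is a hypothetical helper name.

```python
import numpy as np

def cosine_similarity_matrix(x):
    """Return the (n_rows, n_rows) cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Rows with zero norm (e.g. users with no ratings) are left as all zeros.
    unit = np.divide(x, norms, out=np.zeros_like(x, dtype=float), where=norms != 0)
    return unit @ unit.T

centered = np.array([
    [1.0, -1.0, 0.0],
    [0.0, 0.0, 0.0],
    [1.0, 0.0, -1.0],
])
sim = cosine_similarity_matrix(centered)
print(sim.shape)  # (3, 3)
```

For 10 users the result is a (10, 10) matrix, so row `u_id` with its own entry removed holds the 9 similarities between that user and everyone else.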
  5. Tried to predict user 1's ratings for all items:
def predict_rating(user,tst_rating,user_sim):
    print('--------------------Calculation of rating predictions for user {}--------------------'.format(user))
    u_id = user - 1
    user_idxs = np.arange(tst_rating.shape[0])
    user_idxs = np.delete(user_idxs,u_id,axis=0)
    num_other_users = user_idxs.shape[0]

    A = user_sim[u_id,user_idxs]
    A = np.reshape(A,(-1,A.shape[0]))
    print('User similarity {} between user {} and rest of the users:\n{}'.format(A.shape,user,A))
    input()
    B = tst_rating[user_idxs,:]
    print('Ratings {} for all items for all users except user {}\n{}'.format(B.shape,user,B))

    input()
    numer = np.dot(A,B)
    denom = A * num_other_users

    print('NUMERATOR {} = {} x \n{} = \n{}'.format(numer.shape,A,B,numer))
    print('DENOMINATOR {} = {} x \n{} = \n{}'.format(denom.shape,A,num_other_users,denom))
    input()
    user_ratings = np.divide(numer,denom,out=np.zeros_like(numer),where=denom != 0)
    print('NUMERATOR/DENOMINATOR = {}'.format(user_ratings))


predict_rating(1,tst_rating,user_sim)

The problem is that the np.divide() call that computes user_ratings complains that the shapes of the numerator and denominator do not match:

user_ratings = np.divide(numer,denom,out=np.zeros_like(numer),where=denom != 0)
ValueError: operands could not be broadcast together with shapes (1,5) (1,9) (1,5) (1,9)

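This happens because the numerator ends up with shape (1,5) while the denominator ends up with shape (1,9). I really don't understand what I am doing wrong when trying to vectorize the computation according to the given formula. I would really appreciate any insight, help, or guidance on this!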
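A possible reading of the shapes: `A` is (1,9) and `B` is (9,5), so `np.dot(A,B)` is (1,5), while `A * num_other_users` is an element-wise multiplication by a scalar and therefore stays (1,9). If the intended formula is the usual similarity-weighted average, where the denominator is the sum of the similarities over the other users, then the denominator should be a single number rather than a vector. The sketch below works under that assumption. The use of absolute values in the normaliser is a common convention and not necessarily what the tutorial specifies, and `predict_ratings`, `toy_ratings`, and `toy_sim` are hypothetical names with random demo data.

```python
import numpy as np

def predict_ratings(user, ratings, user_sim):
    """Similarity-weighted average of other users' ratings for a 1-based user id."""
    u_id = user - 1
    others = np.delete(np.arange(ratings.shape[0]), u_id)

    weights = user_sim[u_id, others]      # shape (n_others,)
    neighbour_ratings = ratings[others]   # shape (n_others, n_items)

    numer = weights @ neighbour_ratings   # shape (n_items,)
    denom = np.abs(weights).sum()         # scalar normaliser (assumed form)

    if denom == 0:
        return np.zeros_like(numer)
    return numer / denom

rng = np.random.default_rng(0)
toy_ratings = rng.normal(size=(10, 5))   # stand-in for centered ratings
toy_sim = np.corrcoef(toy_ratings)       # stand-in (10, 10) similarity matrix
pred = predict_ratings(1, toy_ratings, toy_sim)
print(pred.shape)  # (5,)
```

Because the denominator is a scalar, it broadcasts against the (n_items,) numerator without any shape error. If the ratings were mean-centered beforehand, the target user's mean would presumably need to be added back to get predictions on the original rating scale.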

0 answers:

No answers yet