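I am trying to implement user-user collaborative filtering on the MovieLens-100k dataset, following the formula from an AnalyticsVidhya tutorial, and I am having trouble vectorizing the computation with NumPy. I have spent nearly two days trying to get the matrix dimensions right. What I have done so far:
Imported the MovieLens dataset and created the user-item matrix. There are 943 unique users and 1682 unique movies.
For clarity, I slice the matrix to consider only the first 10 users and 5 movies of user_item_matrix: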
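import numpy as np
from sklearn.metrics import pairwise_distances

tst_rating = user_item_matrix[0:10, 0:5]
# Per-user mean over rated items only (0 means "not rated")
tot_ratings_per_user = np.sum(tst_rating, axis=1)
num_ratings_per_user = (tst_rating != 0).sum(axis=1)
avg_rating_per_user = np.divide(tot_ratings_per_user, num_ratings_per_user, out=np.zeros_like(tot_ratings_per_user), where=num_ratings_per_user != 0)
# Reshape to a column vector so it broadcasts across each user's row
avg_rating_per_user = np.reshape(avg_rating_per_user, (avg_rating_per_user.shape[0], -1))
tst_rating = tst_rating - avg_rating_per_user
# Cosine similarity between users (1 - cosine distance)
user_sim = 1 - pairwise_distances(tst_rating, metric='cosine')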
def predict_rating(user,tst_rating,user_sim):
    print('--------------------Calculation of rating predictions for user {}--------------------'.format(user))
    u_id = user - 1
    user_idxs = np.arange(tst_rating.shape[0])
    user_idxs = np.delete(user_idxs,u_id,axis=0)
    num_other_users = user_idxs.shape[0]
    A = user_sim[u_id,user_idxs]
    A = np.reshape(A,(-1,A.shape[0]))
    print('User similarity {} between user {} and rest of the users:\n{}'.format(A.shape,user,A))
    input()
    B = tst_rating[user_idxs,:]
    print('Ratings {} for all items for all users except user {}\n{}'.format(B.shape,user,B))
    input()
    numer = np.dot(A,B)
    denom = A * num_other_users
    print('NUMERATOR {} = {} x \n{} = \n{}'.format(numer.shape,A,B,numer))
    print('DENOMINATOR {} = {} x \n{} = \n{}'.format(denom.shape,A,num_other_users,denom))
    input()
    user_ratings = np.divide(numer,denom,out=np.zeros_like(numer),where=denom != 0)
    print('NUMERATOR/DENOMINATOR = {}'.format(user_ratings))

predict_rating(1,tst_rating,user_sim)
The problem is that the np.divide() step that computes user_ratings complains that the shapes of the numerator and denominator do not match:
user_ratings = np.divide(numer,denom,out=np.zeros_like(numer),where=denom != 0)
ValueError: operands could not be broadcast together with shapes (1,5) (1,9) (1,5) (1,9)
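This happens because the numerator ends up with shape (1,5) and the denominator with shape (1,9). I really don't know what I am doing wrong in computing this in vectorized form according to the given formula. I would really appreciate any insight/help/guidance on this!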
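For reference, here is a minimal sketch of what the shapes suggest. np.dot(A,B) multiplies (1,9) by (9,5) and yields (1,5), while A * num_other_users is an element-wise product that keeps A's (1,9) shape, so the two cannot be broadcast together. The sketch assumes the formula's denominator is meant to be a single normalising value per target user, namely the sum of the absolute similarities. That is an assumption about the tutorial's formula, not something confirmed here. The helper name predict_ratings_sketch and the random toy arrays are made up for illustration.

```python
import numpy as np

def predict_ratings_sketch(u_id, centered_ratings, user_sim):
    """Hypothetical helper: similarity-weighted average of the other users' rows."""
    # Indices of every user except the target user
    others = np.delete(np.arange(centered_ratings.shape[0]), u_id)
    A = user_sim[u_id, others].reshape(1, -1)   # shape (1, n_others)
    B = centered_ratings[others, :]             # shape (n_others, n_items)
    numer = A @ B                               # shape (1, n_items)
    # Assumption: normalise by the sum of absolute similarities, shape (1, 1),
    # which broadcasts cleanly against the (1, n_items) numerator
    denom = np.abs(A).sum(axis=1, keepdims=True)
    return np.divide(numer, denom, out=np.zeros_like(numer), where=denom != 0)

# Toy stand-ins for tst_rating and user_sim (random data, not MovieLens)
rng = np.random.default_rng(0)
toy_ratings = rng.normal(size=(10, 5))
toy_sim = np.corrcoef(toy_ratings)   # any symmetric (10, 10) similarity matrix works here
pred = predict_ratings_sketch(0, toy_ratings, toy_sim)
print(pred.shape)  # (1, 5)
```

Because the ratings passed in are mean-centered, the result is an offset from the target user's average; if an absolute rating is wanted, adding avg_rating_per_user[u_id] back would presumably be the last step.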