User-based collaborative filtering with sparse matrices in Python

Asked: 2017-05-19 14:19:57

Tags: python sparse-matrix collaborative-filtering

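I am implementing a recommender system for a portal with 1 million unique users per month and 13,000 items, and I want it to cope well with data at this scale.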

#--- Item-based recommendations using sparse matrices ---#

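import pandas as pd
from scipy.sparse import csc_matrix, csr_matrix
from sklearn import preprocessing as pp  # assumption: `pp` is sklearn.preprocessing

data = pd.read_csv('.../.csv').astype(float)

def cosine_similarities(mat):
    # L2-normalise every column (item) so that the product below yields cosine similarities
    col_normed_mat = pp.normalize(mat.tocsc(), axis=0)
    return col_normed_mat.T @ col_normed_mat

# Drop the user id column so that only the item columns remain
data_germany = data.drop(columns='user')
data = csc_matrix(data)
data_germany = csr_matrix(data_germany)
csc = cosine_similarities(data_germany)
csc = csc.tocoo(copy=False)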

csc.data
Out[74]:
array([ 0.02988072,  0.01698824,  0.0174342 , ...,  0.03207501,
        0.09016696,  0.06804138])

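With the cosine similarities held in a sparse matrix, I can take any of my items and make suggestions for it from its row/column data. That part is easy.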
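A minimal sketch of that row lookup, assuming the similarities are kept in CSR form so that one item's row can be sliced cheaply. The helper names `item_cosine_similarities` and `top_similar_items` and the toy matrix are invented for illustration, and the normalisation is done with SciPy alone instead of `pp.normalize`.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags


def item_cosine_similarities(mat):
    """Item-item cosine similarities of a (users x items) sparse matrix."""
    mat = csr_matrix(mat, dtype=float)
    # L2 norm of every column; empty columns get norm 1 to avoid dividing by zero.
    norms = np.sqrt(np.asarray(mat.multiply(mat).sum(axis=0)).ravel())
    norms[norms == 0] = 1.0
    col_normed = mat @ diags(1.0 / norms)
    return (col_normed.T @ col_normed).tocsr()


def top_similar_items(sims, item, n=3):
    """Return (item_index, similarity) pairs for the n items most similar to `item`."""
    start, end = sims.indptr[item], sims.indptr[item + 1]
    cols, vals = sims.indices[start:end], sims.data[start:end]
    keep = cols != item                 # drop the item's similarity to itself
    cols, vals = cols[keep], vals[keep]
    order = np.argsort(-vals)[:n]       # positions of the largest similarities
    return [(int(c), float(v)) for c, v in zip(cols[order], vals[order])]


# Toy purchase matrix: 4 users (rows) x 4 items (columns), made up for this example.
toy = csr_matrix(np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
], dtype=float))

toy_sims = item_cosine_similarities(toy)
neighbours = top_similar_items(toy_sims, item=0, n=2)
print(neighbours)  # item 1 comes first, then item 2
```

Only the stored non-zero entries of the row are sorted, so the cost per item depends on how many items it co-occurs with, not on the total number of items.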

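The question is how to implement user-based CF with sparse matrices. SciPy's sparse matrices offer far fewer methods than pandas, and what they do offer does not let me port my user-based CF code as it stands. I want the same result as my existing CF code, but computed on sparse matrices.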
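One possible direction, sketched here under assumptions and not as a tested answer for 1 million users: normalise the rows instead of the columns to get user-user cosine similarities, compute them one batch of users at a time, and keep only the k nearest users. The function names, the toy matrix, and the values of `k` and `batch_size` are all hypothetical. Each batch is densified to `batch_size x n_users`, so the batch size has to stay small at this scale.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags


def l2_normalize_rows(mat):
    """Scale every row (user) of a sparse matrix to unit L2 norm."""
    mat = csr_matrix(mat, dtype=float)
    norms = np.sqrt(np.asarray(mat.multiply(mat).sum(axis=1)).ravel())
    norms[norms == 0] = 1.0             # users without purchases stay all-zero
    return (diags(1.0 / norms) @ mat).tocsr()


def top_k_similar_users(ratings, k=2, batch_size=256):
    """For every user, return indices and cosine similarities of the k nearest users."""
    normed = l2_normalize_rows(ratings)
    n_users = normed.shape[0]
    k = min(k, n_users - 1)
    top_idx = np.zeros((n_users, k), dtype=int)
    top_sim = np.zeros((n_users, k))
    for start in range(0, n_users, batch_size):
        stop = min(start + batch_size, n_users)
        # Dense block of shape (batch, n_users): this is the memory bottleneck.
        block = (normed[start:stop] @ normed.T).toarray()
        block[np.arange(stop - start), np.arange(start, stop)] = -1.0  # exclude self
        idx = np.argsort(-block, axis=1)[:, :k]
        top_idx[start:stop] = idx
        top_sim[start:stop] = np.take_along_axis(block, idx, axis=1)
    return top_idx, top_sim


def score_items_for_user(ratings, top_idx, top_sim, user):
    """Similarity-weighted average of the neighbours' rows; owned items are set to 0."""
    ratings = csr_matrix(ratings, dtype=float)
    weights = top_sim[user]
    if weights.sum() == 0:
        return np.zeros(ratings.shape[1])
    neighbours = ratings[top_idx[user]]                 # k x n_items, still sparse
    scores = np.asarray(neighbours.T @ weights).ravel() / weights.sum()
    scores[ratings[user].indices] = 0.0                 # do not re-recommend owned items
    return scores


# Toy purchase matrix: 4 users (rows) x 4 items (columns), made up for this example.
toy = csr_matrix(np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
], dtype=float))

top_idx, top_sim = top_k_similar_users(toy, k=2, batch_size=2)
scores = score_items_for_user(toy, top_idx, top_sim, user=0)
print(top_idx[0], scores)  # user 0's neighbours are users 1 and 3; item 2 scores highest
```

The full user-user similarity matrix is never stored; only the `n_users x k` neighbour tables are kept between batches.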

#--- User-based recommendations without sparse matrices ---#

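import pandas as pd

# Note: `data` and `data_germany` are the pandas DataFrames here, not the sparse matrices

# Helper function to get similarity scores
def getScore(history, similarities):
    # Similarity-weighted average of the user's purchase history
    return sum(history*similarities)/sum(similarities)

# Create a placeholder matrix for similarities, and fill in the user name column
data_sims = pd.DataFrame(index=data.index, columns=data.columns)
data_sims.iloc[:, :1] = data.iloc[:, :1]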

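# Loop through all rows, skip the user column, and fill with similarity scores.
# data_neighbours and data_ibs (item neighbours and item similarities) are not defined in this snippet.
for i in range(len(data_sims.index)):
    for j in range(1, len(data_sims.columns)):
        user = data_sims.index[i]
        product = data_sims.columns[j]

        if data.iloc[i, j] == 1:
            # The user already has this item, so it is not recommended again
            data_sims.iloc[i, j] = 0
        else:
            # Nine nearest items; position 0 is presumably the item itself
            product_top_names = data_neighbours.loc[product].iloc[1:10]
            product_top_sims = data_ibs.loc[product].sort_values(ascending=False).iloc[1:10]
            user_purchases = data_germany.loc[user, product_top_names]

            data_sims.iloc[i, j] = getScore(user_purchases, product_top_sims)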

# Get the top items
data_recommend = pd.DataFrame(index=data_sims.index, columns=['user','1','2','3','4','5','6'])
data_recommend.iloc[:, 0] = data_sims.iloc[:, 0]

# Instead of the top scores, we want to see the item names
for i in range(len(data_sims.index)):
    # Sort the item scores (user column excluded) and keep the six best column names
    data_recommend.iloc[i, 1:] = data_sims.iloc[i, 1:].sort_values(ascending=False).iloc[:6].index.to_numpy()

# Print a sample
print(data_recommend.iloc[:10, :4])
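For comparison, the double loop above might be expressed as a few sparse products. This is a sketch under stated assumptions, not a drop-in replacement: it assumes a binary users x items purchase matrix, rebuilds the item similarities itself instead of using `data_ibs` and `data_neighbours`, and all function names and the toy matrix are invented. The default `k=9` mirrors the `[1:10]` slices in the loop.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags


def item_cosine_similarities(purchases):
    """Item-item cosine similarities of a (users x items) sparse matrix."""
    mat = csr_matrix(purchases, dtype=float)
    norms = np.sqrt(np.asarray(mat.multiply(mat).sum(axis=0)).ravel())
    norms[norms == 0] = 1.0
    col_normed = mat @ diags(1.0 / norms)
    return (col_normed.T @ col_normed).tocsr()


def keep_top_k_neighbours(sims, k=9):
    """Remove the diagonal and keep only the k largest similarities in each row."""
    sims = csr_matrix(sims, dtype=float)
    sims = (sims - diags(sims.diagonal())).tocsr()
    sims.eliminate_zeros()
    rows, cols, vals = [], [], []
    for i in range(sims.shape[0]):
        start, end = sims.indptr[i], sims.indptr[i + 1]
        order = np.argsort(-sims.data[start:end])[:k]
        rows.extend([i] * len(order))
        cols.extend(sims.indices[start:end][order])
        vals.extend(sims.data[start:end][order])
    triplets = (np.array(vals, dtype=float),
                (np.array(rows, dtype=int), np.array(cols, dtype=int)))
    return csr_matrix(triplets, shape=sims.shape)


def score_all_users(purchases, sims_top_k):
    """Score every (user, item) pair like getScore; owned items get score 0."""
    purchases = csr_matrix(purchases, dtype=float, copy=True)
    purchases.eliminate_zeros()
    # numer[u, j] = sum of sims between item j and those of its top neighbours that u owns
    numer = purchases @ sims_top_k.T
    # denom[j] = sum of item j's top-k similarities (the divisor in getScore)
    denom = np.asarray(sims_top_k.sum(axis=1)).ravel()
    denom[denom == 0] = 1.0
    scores = (numer @ diags(1.0 / denom)).tocsr()
    # Zero the entries for items the user already owns.
    owned = purchases.copy()
    owned.data[:] = 1.0
    scores = (scores - scores.multiply(owned)).tocsr()
    scores.eliminate_zeros()
    return scores


# Toy purchase matrix: 4 users (rows) x 4 items (columns), made up for this example.
toy = csr_matrix(np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
], dtype=float))

sims_top_k = keep_top_k_neighbours(item_cosine_similarities(toy), k=2)
dense = score_all_users(toy, sims_top_k).toarray()
print(dense.round(3))
```

The score matrix is still users x items, so at 1 million users it would probably have to be computed for slices of rows, keeping only the best few items per user from each slice.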

0 answers:

No answers yet