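What should be considered when applying singular value decomposition (SVD) to a sparse matrix?
The matrix is very sparse, and I have filled in (imputed) the missing values with 0. Do I need any other techniques? The code is below.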
import pandas as pd
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.metrics.pairwise import cosine_similarity
# The MovieLens 100k rating files have four tab-separated columns; the fourth is a timestamp
r_cols = ['user_id', 'movie_id', 'rating', 'timestamp']
data = pd.read_csv('ml-100k/ua.test', sep='\t', names=r_cols, usecols=['user_id', 'movie_id', 'rating'], encoding='latin-1')
# Movie x user rating matrix; movies a user has not rated are filled with 0
dtm = data.pivot(index='movie_id', columns='user_id', values='rating').fillna(0)
np.savetxt("pivot.csv", dtm, delimiter=",")
# Without matrix factorization: cosine similarity between the raw movie rating vectors
cosine_sim = cosine_similarity(dtm, dtm)
np.savetxt("foo13.csv", cosine_sim, delimiter=",")
# With matrix factorization: project each movie onto 200 latent dimensions
lsa = TruncatedSVD(n_components=200, algorithm='arpack')
dtm_lsa = lsa.fit_transform(dtm)
# L2-normalize each row so that the dot product below equals cosine similarity
dtm_lsa = Normalizer(copy=False).fit_transform(dtm_lsa)
similarity = dtm_lsa @ dtm_lsa.T
np.savetxt("foo12.csv", similarity, delimiter=",")
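Because scikit-learn's `TruncatedSVD` accepts `scipy.sparse` input directly, one way to deal with the sparsity is to never build the dense, zero-filled pivot table at all. The sketch below is a minimal illustration under that assumption, not the question's original method: it expects a DataFrame with `user_id`, `movie_id` and `rating` columns like `data` above, and the function name `sparse_item_similarity` is hypothetical. Unrated cells are simply not stored, which is numerically the same as the 0 fill but never materializes the zeros.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import normalize


def sparse_item_similarity(data: pd.DataFrame, n_components: int = 200):
    """Hypothetical helper: movie-movie cosine similarity via a sparse SVD.

    `data` must have 'user_id', 'movie_id' and 'rating' columns.
    Returns (similarity matrix, movie ids in row order, fitted SVD).
    """
    # Map raw ids to contiguous row (movie) and column (user) indices.
    movie_idx, movies = pd.factorize(data['movie_id'])
    user_idx, users = pd.factorize(data['user_id'])
    # Only observed ratings are stored; missing cells are implicit zeros.
    ratings = csr_matrix(
        (data['rating'].to_numpy(dtype=np.float64), (movie_idx, user_idx)),
        shape=(len(movies), len(users)),
    )
    # 'arpack' requires n_components to be smaller than both dimensions.
    svd = TruncatedSVD(n_components=n_components, algorithm='arpack')
    latent = svd.fit_transform(ratings)
    # L2-normalize each row so the dot product is cosine similarity.
    latent = normalize(latent)
    return latent @ latent.T, movies, svd
```

With `data` read from `ml-100k/ua.test` as above, `similarity, movies, svd = sparse_item_similarity(data)` gives a matrix whose row and column `i` correspond to `movies[i]`. `svd.explained_variance_ratio_.sum()` is one way to check how much variance the chosen number of components retains before settling on 200.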
If I have made any mistakes, please feel free to point them out.
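On the question of whether filling with 0 is enough: MovieLens ratings run from 1 to 5, so a 0 in `dtm` sits below every real rating, and the SVD treats "not rated" much like a strong dislike. A commonly used alternative (a different preprocessing choice, not something the code above does) is to subtract each movie's mean over its observed ratings only, so that an implicit 0 means "average for this movie". The sketch below assumes a movie x user sparse matrix in which every stored entry is an observed rating; the helper name `center_rows` is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix


def center_rows(ratings) -> csr_matrix:
    """Hypothetical helper: subtract each row's mean over its stored entries.

    Missing cells stay implicit zeros, which after centering means
    "equal to this row's average" rather than "rated 0".
    """
    # Copy into float CSR so the caller's matrix is left untouched.
    centered = csr_matrix(ratings, dtype=np.float64, copy=True)
    centered.sum_duplicates()
    counts = np.diff(centered.indptr)  # number of stored entries per row
    sums = np.asarray(centered.sum(axis=1)).ravel()
    # Rows with no stored entries get a mean of 0 instead of dividing by 0.
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    # The stored values of row i are data[indptr[i]:indptr[i + 1]].
    centered.data -= np.repeat(means, counts)
    return centered
```

For example, `center_rows(csr_matrix(dtm.values))` could be passed to `TruncatedSVD.fit_transform` in place of `dtm`; building the CSR matrix from the zero-filled values drops the zeros, so only observed ratings are centered.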