Computing dot products with sklearn's linear_kernel()

Asked: 2018-03-14 13:20:21

Tags: python-3.x numpy scikit-learn recommendation-engine

I get an "array is too big" error when computing the dot product of two arrays.

A sample of the data:

metadata['overview'].head()
out: 0    Led by Woody, Andy's toys live happily in his ...
1    When siblings Judy and Peter discover an encha...
2    A family wedding reignites the ancient feud be...
3    Cheated on, mistreated and stepped on, the wom...
4    Just when George Banks has recovered from his ...
Name: overview, dtype: object

I use a TF-IDF vectorizer, which produces a matrix in which each column represents a word in the overview vocabulary and each row represents a movie.

from sklearn.feature_extraction.text import TfidfVectorizer

#Define a TF-IDF Vectorizer Object. Remove all English stop words such as 'the', 'a'
tfidf = TfidfVectorizer(stop_words='english')
#Replace NaN with an empty string
metadata['overview'] = metadata['overview'].fillna('')
#Construct the required TF-IDF matrix by fitting and transforming the data
tfidf_matrix = tfidf.fit_transform(metadata['overview'])
#Output the shape of tfidf_matrix
tfidf_matrix.shape
out[3]: (45466, 75827)
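To make the row and column layout concrete, here is a minimal sketch on three made-up overviews. The `toy_` names and the sentences are illustrative assumptions, not part of the original dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three made-up overviews standing in for metadata['overview'] (assumption)
toy_overviews = [
    "toys live happily in a room",
    "siblings discover an enchanted board game",
    "a family wedding reignites an ancient feud",
]

toy_tfidf = TfidfVectorizer(stop_words='english')
toy_matrix = toy_tfidf.fit_transform(toy_overviews)

# Rows are documents (movies); columns are vocabulary terms
n_docs, n_terms = toy_matrix.shape
print(n_docs, n_terms)
print(len(toy_tfidf.vocabulary_))  # same as the number of columns
```

The result is a SciPy sparse matrix, which is why a 45466 x 75827 matrix fits in memory at all: only the non-zero weights are stored.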

I am using scikit-learn's linear_kernel function:

from sklearn.metrics.pairwise import linear_kernel
# Compute the cosine similarity matrix
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

Running this line raises the following error:

ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
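For a sense of scale, here is a rough estimate of what the call requests. linear_kernel returns a dense array by default, so comparing all 45466 rows against each other asks for a 45466 x 45466 result. The sketch assumes float64 values, which is the TfidfVectorizer default dtype. Whether the limit hit here is the address space of a 32-bit Python build or something else is not stated in the question.

```python
import numpy as np

n_movies = 45466  # number of rows in tfidf_matrix, from the shape above
bytes_per_value = np.dtype(np.float64).itemsize  # 8 bytes (assumed dtype)

# Size of the full dense movie-by-movie similarity matrix
dense_bytes = n_movies * n_movies * bytes_per_value
print(f"{dense_bytes / 1024**3:.1f} GiB")  # about 15.4 GiB
```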

Can anyone guide me on how to resolve this error?
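One possible direction, offered as a sketch rather than a confirmed fix: a recommender usually needs the similarities for one movie at a time, so a single row can be compared against the whole matrix instead of building every pair up front. The helper name `top_similar` and the toy data are hypothetical. Only `linear_kernel` and `TfidfVectorizer` come from the question.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

def top_similar(matrix, index, k=10):
    """Return the indices of the k rows most similar to row `index`."""
    # One row against all rows: shape (1, n_rows) instead of (n_rows, n_rows)
    scores = linear_kernel(matrix[index:index + 1], matrix).ravel()
    scores[index] = -np.inf  # exclude the query row itself
    k = min(k, len(scores) - 1)
    return np.argsort(scores)[::-1][:k]

# Made-up overviews: rows 0 and 1 share terms, row 2 shares none with row 0
toy_overviews = [
    "toys live happily in a room with toys",
    "toys discover an enchanted room",
    "a family wedding reignites an ancient feud",
]
toy_matrix = TfidfVectorizer(stop_words='english').fit_transform(toy_overviews)

result = top_similar(toy_matrix, 0, k=1)
print(result)
```

With the real data this would be called as `top_similar(tfidf_matrix, some_row, k=10)`, and each call allocates only 45466 scores.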

0 answers:

No answers yet.