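From articles explaining TF-IDF, I understand that the vectorizer's output is a sparse matrix, and that we call .toarray() to convert it into a form that can be fed to a neural network. When I do this, however, I get a memory error that I don't understand. Is my computer running out of memory, and how can I fix this?
The code is: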
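from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
vectorizer = TfidfVectorizer().fit(train_text)
# Convert the sparse TF-IDF matrix to a dense array (the MemoryError is raised on this line)
tfidf_vector = vectorizer.transform(train_text).toarray()
tfidf_vector = tfidf_vector[:, :, None]  # add a trailing axis: (n_samples, n_features, 1)
print(tfidf_vector.shape)
# train_labels: labels matching train_text (name assumed; the snippet as posted omitted this argument)
X_train, X_test, Y_train, Y_test = train_test_split(tfidf_vector, train_labels,
                                                    test_size=0.2, random_state=1)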
And the error:
  File "C:/Users/xiangli/PycharmProjects/preparing_moviedata/polarity.py", line 60, in <module>
    tfidf_vector = vectorizer.transform(train_text).toarray()
  File "C:\Users\xiangli\Miniconda3\envs\preparing_moviedata\lib\site-packages\scipy\sparse\compressed.py", line 947, in toarray
    out = self._process_toarray_args(order, out)
  File "C:\Users\xiangli\Miniconda3\envs\preparing_moviedata\lib\site-packages\scipy\sparse\base.py", line 1184, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError
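The last frame of the traceback is np.zeros(self.shape, ...), so the failure happens while allocating the full dense array. A dense array needs rows × columns × bytes per element, no matter how many entries are zero. With made-up numbers, 25,000 documents × 100,000 vocabulary terms × 8 bytes (float64) is about 20 GB. Below is a rough sketch of how one might check that size before calling .toarray(). The toy corpus stands in for train_text, and the helper names dense_size_bytes and sparse_size_bytes are hypothetical, not part of scikit-learn or SciPy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def dense_size_bytes(sparse_matrix):
    """Bytes that .toarray() would have to allocate: rows * cols * itemsize."""
    n_rows, n_cols = sparse_matrix.shape
    return n_rows * n_cols * sparse_matrix.dtype.itemsize


def sparse_size_bytes(csr):
    """Bytes actually held by a CSR matrix (values + column indices + row pointers)."""
    return csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes


# Toy corpus standing in for train_text (assumption: the real data is far larger).
toy_text = ["a good movie", "a bad movie", "not a good plot at all"]
tfidf_sparse = TfidfVectorizer().fit_transform(toy_text)

print("shape:", tfidf_sparse.shape, "dtype:", tfidf_sparse.dtype)
print("dense bytes :", dense_size_bytes(tfidf_sparse))
print("sparse bytes:", sparse_size_bytes(tfidf_sparse))
```

Running the same two helpers on the real vectorizer.transform(train_text) result should show whether the dense size exceeds the RAM available on the machine.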
I want to use the output of the TF-IDF vectorizer as the input to a neural network.
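One common way around this kind of error, sketched here under assumptions, is to keep the matrix sparse and convert only one mini-batch at a time to a dense array, optionally capping the vocabulary with max_features. The toy corpus, the labels, the value 5000, and the helper dense_batches are all hypothetical. How the batches are then passed to the network depends on the framework in use.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split


def dense_batches(X_sparse, y, batch_size):
    """Yield (dense_X, y) mini-batches so only one batch is dense at a time."""
    n_samples = X_sparse.shape[0]
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)
        # Densify just these rows, then add the trailing axis that the
        # original code created with [:, :, None].
        X_batch = X_sparse[start:stop].toarray()[:, :, None]
        yield X_batch, y[start:stop]


# Toy data standing in for train_text and its labels (assumed names and values).
toy_text = [
    "a good movie", "a bad movie", "great plot and acting",
    "boring and slow", "really enjoyable film", "terrible script",
]
toy_labels = np.array([1, 0, 1, 0, 1, 0])

# max_features keeps only the most frequent terms, which bounds the column count.
vectorizer = TfidfVectorizer(max_features=5000)
X_sparse = vectorizer.fit_transform(toy_text)  # stays a SciPy sparse matrix

# train_test_split accepts sparse matrices, so no .toarray() is needed here.
X_train, X_test, y_train, y_test = train_test_split(
    X_sparse, toy_labels, test_size=0.2, random_state=1
)

for X_batch, y_batch in dense_batches(X_train, y_train, batch_size=2):
    print(X_batch.shape, y_batch.shape)
```

With this layout the peak dense allocation is batch_size × n_features elements instead of n_samples × n_features.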