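I have finished all the preprocessing steps, such as removing stop words and HTML tags. I am trying to classify the IMDB movie dataset (Stanford University's Large Movie Review Dataset) with Multinomial Naive Bayes, but I get an error on the variable X. I have already turned it into a 2D array, but I don't know how to handle the error.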
Here is the relevant part of my Multinomial Naive Bayes code:
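import numpy as np
import sklearn.datasets
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

categories = ['pos', 'neg']
doc_to_train = sklearn.datasets.load_files("/home/satyam/aclImdb_v1/aclImdb/train", description=None, categories=categories, load_content=True, encoding='utf-8', shuffle=True, random_state=42)
vectorizer = CountVectorizer()
# tokens: the preprocessed review texts built earlier in the script (not shown)
X = (vectorizer.fit_transform(tokens).toarray())  # MemoryError is raised here (line 92 in the traceback)
analyze = vectorizer.build_analyzer()
vect = vectorizer.get_feature_names()
y = np.array(doc_to_train.target)
# X is already 2-D with shape (n_samples, n_features): X.reshape() with no arguments
# raises TypeError, and X.transpose() would misalign the rows of X with y
print(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
mnb = MultinomialNB().fit(X_train, y_train)  # keep the fitted model, not its predictions
print("MNB", mnb)
print("Prediction", mnb.predict(X_test))
accuracy = mnb.score(X_test, y_test)
print("Accuracy", accuracy)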
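The `.toarray()` call in the code above converts the sparse document-term matrix returned by `CountVectorizer.fit_transform` into a dense NumPy array. scikit-learn's `MultinomialNB` accepts scipy sparse matrices directly, so one way to avoid the dense copy is simply to keep `X` sparse. A minimal sketch, assuming the texts are a list of strings aligned with the labels; the helper `train_sparse_mnb` and the toy corpus are hypothetical, for illustration only:

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB


def train_sparse_mnb(texts, labels, test_size=0.3, random_state=42):
    """Hypothetical helper: fit MultinomialNB on word counts without densifying them."""
    vectorizer = CountVectorizer()
    # fit_transform returns a scipy.sparse CSR matrix; .toarray() is deliberately NOT called
    X = vectorizer.fit_transform(texts)
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=test_size, random_state=random_state)
    # MultinomialNB.fit and .score both accept sparse input
    model = MultinomialNB().fit(X_train, y_train)
    return X, model, model.score(X_test, y_test)


# Toy stand-in for the preprocessed IMDB reviews (hypothetical data, 1 = pos, 0 = neg)
texts = ["great movie loved it", "terrible plot awful acting",
         "wonderful performances great film", "boring and awful",
         "loved the story", "worst film ever terrible"] * 5
labels = [1, 0, 1, 0, 1, 0] * 5

X, model, acc = train_sparse_mnb(texts, labels)
print("sparse:", sparse.issparse(X), X.shape)
print("Accuracy", acc)
```

With the real data this would be called as `train_sparse_mnb(tokens, doc_to_train.target)`, assuming `tokens` holds one preprocessed string per review in the same order as `doc_to_train.target`. If a dense array is genuinely required, `CountVectorizer(max_features=...)` caps the vocabulary and therefore the width of the matrix.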
The error I get is:
Traceback (most recent call last):
  File "sentiment_analysis_NB.py", line 92, in <module>
    X = (vectorizer.fit_transform(tokens).toarray())
  File "/usr/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 943, in toarray
    out = self._process_toarray_args(order, out)
  File "/usr/lib/python3.6/site-packages/scipy/sparse/base.py", line 1130, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError
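The last frame shows `toarray()` calling `np.zeros(self.shape, ...)`, i.e. allocating the whole dense matrix in one go, and its size is n_samples × n_features × bytes per element. A rough estimate, assuming the 25,000 reviews of the training split, `CountVectorizer`'s default `int64` dtype, and a hypothetical vocabulary of 75,000 terms (the real count depends on the preprocessing); `dense_bytes` is an illustrative helper, not a library function:

```python
import numpy as np


def dense_bytes(n_samples, n_features, dtype=np.int64):
    """Bytes that np.zeros((n_samples, n_features), dtype=dtype) would need."""
    return n_samples * n_features * np.dtype(dtype).itemsize


# 25,000 training reviews; the 75,000-term vocabulary is an assumed, illustrative figure
needed = dense_bytes(25_000, 75_000)
print(needed)               # 15000000000
print(needed / 1e9, "GB")   # 15.0 GB
```

Under these assumptions the dense array alone would need about 15 GB, while the sparse matrix only stores the nonzero counts, which is why keeping `X` sparse (or shrinking the vocabulary) avoids the error.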