I get an error when I pass a dataframe column directly as the stop words. How do I fix this?
stop_words_corpus = pd.DataFrame(word_dictionary_corpus.Word.unique(), columns=feature_names)
cv = CountVectorizer(max_features=200, analyzer='word', stop_words=stop_words_corpus)
cv_txt = cv.fit_transform(data.pop('Clean_addr'))
*** Updated Error ***
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
    867
    868         vocabulary, X = self._count_vocab(raw_documents,
--> 869                                           self.fixed_vocabulary_)
    870
    871         if self.binary:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
    783             vocabulary.default_factory = vocabulary.__len__
    784
--> 785         analyze = self.build_analyzer()
    786         j_indices = []
    787         indptr = _make_int_array()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in build_analyzer(self)
    260
    261         elif self.analyzer == 'word':
--> 262             stop_words = self.get_stop_words()
    263             tokenize = self.build_tokenizer()
    264
I fixed that error, but I'm still having the issue.
Answer 0 (score: 1)
Try this:
cv = CountVectorizer(max_features=200,
                     analyzer='word',
                     stop_words=stop_words_corpus.stack().unique())
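CountVectorizer expects stop_words to be a plain collection of strings, not a DataFrame; stack() flattens every column into a single Series and unique() returns a flat array of words. Here is a minimal runnable sketch with made-up data (the sample words and addresses are assumptions, not from the question; .tolist() is added defensively, since passing a raw NumPy array can trip the stop_words == 'english' comparison inside some sklearn versions, the same ambiguity a DataFrame causes):

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins (assumed values) for the question's word_dictionary_corpus and data
stop_words_corpus = pd.DataFrame({'Word': ['street', 'road', 'avenue']})
docs = pd.Series(['12 main street springfield', '9 oak road shelbyville'])

# stack() -> one Series over all columns; unique() -> 1-D array of strings
cv = CountVectorizer(max_features=200,
                     analyzer='word',
                     stop_words=stop_words_corpus.stack().unique().tolist())
cv_txt = cv.fit_transform(docs)
print(sorted(cv.vocabulary_))  # 'street' and 'road' no longer appear in the vocabulary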
Answer 1 (score: 0)
We need to convert the dataframe column to a NumPy array to pass the stop words to CountVectorizer:
stop_word = stop_words_corpus['Word'].values
cv = CountVectorizer(max_features=200,
                     analyzer='word',
                     stop_words=stop_word)
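Same idea as a self-contained sketch (the 'Word' values are toy assumptions; .tolist() is added for the same version-safety reason noted above):

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

stop_words_corpus = pd.DataFrame({'Word': ['street', 'road', 'avenue']})  # assumed toy data
stop_word = stop_words_corpus['Word'].values  # 1-D NumPy array of strings

cv = CountVectorizer(max_features=200,
                     analyzer='word',
                     stop_words=stop_word.tolist())  # a plain list works on every sklearn version
cv_txt = cv.fit_transform(['12 main street', '9 oak road'])
print(sorted(cv.vocabulary_))  # ['12', 'main', 'oak'] ('9' is dropped by the default token pattern)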