How to pass stop words from a dataframe column

Asked: 2017-12-12 14:06:34

Tags: pandas scikit-learn nlp

I get an error when I pass a dataframe column directly as the stop words. How can I fix this?

    stop_words_corpus = pd.DataFrame(word_dictionary_corpus.Word.unique(), columns=feature_names)

    cv = CountVectorizer(max_features=200, analyzer='word', stop_words=stop_words_corpus)
    cv_txt = cv.fit_transform(data.pop('Clean_addr'))

**Updated Error**

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
        867 
        868         vocabulary, X = self._count_vocab(raw_documents,
    --> 869                                           self.fixed_vocabulary_)
        870 
        871         if self.binary:

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
        783             vocabulary.default_factory = vocabulary.__len__
        784 
    --> 785         analyze = self.build_analyzer()
        786         j_indices = []
        787         indptr = _make_int_array()

    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in build_analyzer(self)
        260 
        261         elif self.analyzer == 'word':
    --> 262             stop_words = self.get_stop_words()
        263             tokenize = self.build_tokenizer()
        264 

I fixed that error, but I am still having the issue.
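A likely root cause (a hedged guess, since the posted traceback is truncated before the final error line): scikit-learn internally compares the `stop_words` argument against the string `"english"`, and on a pandas `DataFrame` that comparison is elementwise, so using the result in an `if` raises the classic "truth value is ambiguous" error. A tiny illustration with invented data:

```python
import pandas as pd

stop_words_corpus = pd.DataFrame({'Word': ['the', 'of']})  # invented data

# scikit-learn does roughly: if stop_words == "english": ...
# On a DataFrame that comparison is elementwise, so the `if` blows up.
try:
    if stop_words_corpus == 'english':
        pass
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

This is why both answers below convert the dataframe column into a flat array of strings before handing it to `CountVectorizer`.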

2 Answers:

Answer 0 (score: 1)

Try this:

    cv = CountVectorizer(max_features=200,
                         analyzer='word',
                         stop_words=stop_words_corpus.stack().unique())

Answer 1 (score: 0)

We need to convert the dataframe column to a NumPy array to pass the stop words into the vectorizer:

    stop_word = stop_words_corpus['Word'].values

    cv = CountVectorizer(max_features=200,
                         analyzer='word',
                         stop_words=stop_word)