I checked spaCy's stop-word list, and "not" already shows up as True. Even so, I added the following code to make "not" a stop word:
my_stopwords = ["not","be"," "]
for word in my_stopwords:
    nlp.vocab[word].is_stop = True
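Each `nlp.vocab[...]` entry is keyed by its exact string, so setting `is_stop` on `"not"` says nothing about `"Not"` or `"NOT"`. The sketch below is one possible workaround, assuming case is part of the problem: it expands the custom list into case variants before they are registered. `case_variants` is a hypothetical helper, and `nlp` (a loaded pipeline) is not created here.

```python
def case_variants(words):
    """Return each word plus its lowercase, Title-case and UPPERCASE forms.

    Hypothetical helper: every variant is meant to be registered separately,
    because each vocab entry is keyed by its exact string.
    """
    variants = set()
    for word in words:
        variants.update({word, word.lower(), word.title(), word.upper()})
    return variants


my_stopwords = ["not", "be", " "]
variants = case_variants(my_stopwords)

# With a loaded pipeline `nlp` (assumed, not created here) you would then run:
# for v in variants:
#     nlp.vocab[v].is_stop = True
print(sorted(variants))
```

This does not help with tokens whose text is something else entirely (for example a contraction piece whose lemma happens to be "not"); for those, checking the lemma at filtering time is the more direct route.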
Then I checked again whether "not" is a stop word:
nlp.vocab["not"].is_stop
True
nlp.vocab["not"].is_stop
True
Then I ran the following:
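texts, article, skl_texts = [], [], []
for w in doc:
    # keep the lemma unless the token is a stop word, a newline, punctuation, a number or a pronoun
    if (w.is_stop != True and w.text != '\n' and not w.is_punct
            and not w.like_num and w.lemma_ != '-PRON-'):
        article.append(w.lemma_)
    # a newline token marks the end of one article
    if w.text == '\n':
        skl_texts.append(' '.join(article))
        texts.append(article)
        article = []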
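One possible explanation, offered as an assumption: the filter tests `is_stop` on the token, but appends `lemma_`. A token whose text is not exactly "not" (for example "Not", or a contraction piece such as "n't") can still have the lemma "not", so it passes the check. Likewise, only a token whose text is exactly `'\n'` is excluded, so other whitespace tokens (such as two spaces) slip through. The sketch below filters on `is_space` and on the lowercased lemma instead. To keep it runnable without a spaCy model, `Tok` is a hypothetical stand-in exposing the same attribute names as a spaCy `Token`; `CUSTOM_STOPS`, `keep` and `split_articles` are hypothetical names too.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in with the same attribute names as a spaCy Token."""
    text: str
    lemma_: str
    is_stop: bool = False
    is_punct: bool = False
    is_space: bool = False
    like_num: bool = False


CUSTOM_STOPS = {"not", "be"}  # assumption: compared against lowercased lemmas


def keep(tok):
    """Return True if the token's lemma should go into the current article."""
    return not (
        tok.is_stop
        or tok.is_space                     # drops ' ', '  ', '\n\n', ...
        or tok.is_punct
        or tok.like_num
        or tok.lemma_ == "-PRON-"
        or tok.lemma_.lower() in CUSTOM_STOPS  # catches any token lemmatized to "not"
    )


def split_articles(doc):
    """Group kept lemmas into articles separated by '\n' tokens."""
    texts, skl_texts, article = [], [], []
    for w in doc:
        if w.text == "\n":                  # article boundary, as in the original loop
            skl_texts.append(" ".join(article))
            texts.append(article)
            article = []
        elif keep(w):
            article.append(w.lemma_)
    if article:                             # flush the last article if no trailing '\n'
        skl_texts.append(" ".join(article))
        texts.append(article)
    return texts, skl_texts


doc = [
    Tok("I", "-PRON-", is_stop=True),
    Tok("do", "do", is_stop=True),
    Tok("n't", "not"),
    Tok("  ", "  ", is_space=True),
    Tok("get", "get"),
    Tok("promotion", "promotion"),
    Tok("\n", "\n", is_space=True),
    Tok("Not", "not"),
    Tok("specific", "specific"),
]
texts, skl_texts = split_articles(doc)
print(texts, skl_texts)
```

With real spaCy tokens the same `keep` logic should apply unchanged, since `is_stop`, `is_space`, `is_punct`, `like_num` and `lemma_` are all `Token` attributes; the final flush is an extra fix for the original loop, which drops the last article when the text does not end with a newline.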
The resulting list still contains the word "not" and " ":
'ioc',
' ',
'get',
'promotion',
'year',
' ',
'this',
'family',
'yes',
'for',
'business',
'vacation',
'not',
'specific',
'depend',
'offer',
'how',
How can I remove the word "not" as well as the whitespace entries (' ')?
Please help!