Question

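I have written code for sentiment analysis that uses two different dictionaries, in which the sentences are labeled as positive or negative. So far, my code looks like this: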

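from nltk import NaiveBayesClassifier
from nltk.tokenize import word_tokenize

def format_sentence(sentence):
    # Turn a sentence into a bag-of-words feature dict: {token: True}
    return {word: True for word in word_tokenize(sentence)}

pos_data = []
with open('Positiv.txt') as f:
    for line in f:
        pos_data.append([format_sentence(line), 'pos'])

neg_data = []
with open('Negativ.txt') as f:
    for line in f:
        neg_data.append([format_sentence(line), 'neg'])

# First three sentences of each class for training, the rest for testing
training_data = pos_data[:3] + neg_data[:3]
test_data = pos_data[3:] + neg_data[3:]

model = NaiveBayesClassifier.train(training_data)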

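Now I would like the code to remove all stopwords from the sentences in the dictionaries, but I don't know how to implement this in my code, since I'm a beginner at Python programming. I would be very grateful if someone could help me :)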

Answer 1

If you are just working with Python lists, try this code template, which creates a new list with the stopwords removed:

list_without_stopwords = [word for word in original_list if word not in stopword_list]
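
As a concrete illustration of the template above, here is a minimal sketch. The word lists are made up for the example, and converting the stopword list to a set is my own addition so that the membership check stays fast for longer lists:

```python
# Hypothetical example data; replace with your own tokens and stopwords.
original_list = ["this", "movie", "is", "really", "great"]
stopword_list = ["this", "is", "a", "the"]

# A set makes "word not in ..." a constant-time lookup.
stopword_set = set(stopword_list)

# Keep only the words that are not stopwords, preserving their order.
list_without_stopwords = [word for word in original_list if word not in stopword_set]

print(list_without_stopwords)  # ['movie', 'really', 'great']
```

Note that the comparison is case-sensitive, so "This" would not be filtered out by a stopword list that only contains "this".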

Answer 2

It looks like you are using the Naive Bayes classifier implementation from NLTK. NLTK also has built-in stopword lists for certain languages.

from nltk.corpus import stopwords
stops = stopwords.words('english')

def format_sentence(sentence):
    return {word: True for word in word_tokenize(sentence) if word not in stops}
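
Two things may be worth checking with the function above, and the sketch below is only one way to handle them. First, the NLTK stopword lists are lowercase, so a capitalized token at the start of a sentence would slip through unless you lowercase it before comparing. Second, your file names suggest the sentences might be German, in which case `stopwords.words('german')` would probably be the better list (that is a guess on my part). To keep the sketch runnable without downloading NLTK data, it uses a small made-up stopword set and `str.split` as a stand-in tokenizer; the `stops` and `tokenize` parameters are my own additions, not part of NLTK:

```python
def format_sentence(sentence, stops, tokenize=str.split):
    """Build a {token: True} feature dict, skipping stopwords case-insensitively.

    `stops` should be a set of lowercase stopwords, and `tokenize` any function
    that maps a string to a list of tokens (e.g. nltk's word_tokenize).
    """
    return {word: True for word in tokenize(sentence) if word.lower() not in stops}


# Hypothetical stand-in for set(stopwords.words('german')).
demo_stops = {"der", "die", "das", "ist", "und"}

features = format_sentence("Der Film ist wunderbar", demo_stops)
print(features)  # {'Film': True, 'wunderbar': True}
```

In your own code you would pass `set(stopwords.words(...))` as `stops` and `word_tokenize` as `tokenize`, and call the function the same way when building `pos_data` and `neg_data`.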
