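Importing a stop word dictionary into Python

Time: 2018-06-11 09:47:37

Tags: python nltk stop-words

How can I import a custom stop word dictionary (an Excel sheet) into Python and apply it in addition to the nltk stop word list? At the moment my stop word section looks like this: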

# filter out stop words
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
words = [w for w in words if w not in stop_words]

Thanks in advance!

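1 answer:

Answer 0 (score: 0)

You can import the Excel sheet with the pandas library. This example assumes your stop words are in the first column, one word per row, with no header row. Then create the union of the nltk stop words and your own stop words: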

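import pandas as pd
from nltk.corpus import stopwords

# NLTK's built-in English stop words (run nltk.download('stopwords') once beforehand)
stop_words = set(stopwords.words('english'))
# header=None because the sheet has no header row; the single column is named 'mywords'
# check pandas docs for more info on usage of read_excel
custom_words = pd.read_excel('your_file.xlsx', header=None, names=['mywords'])
# union of two sets, skipping empty cells and forcing every entry to a string
stop_words = stop_words | set(custom_words['mywords'].dropna().astype(str))
# keep only the tokens that are not stop words
words = [w for w in words if w not in stop_words]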
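
One thing the snippet above does not handle is messy spreadsheet data: blank cells, stray whitespace, or capitalised entries will not match lowercase tokens. Below is a minimal sketch of a cleanup step, assuming you want case-insensitive matching. The helper name `merge_stop_words` and the sample values are hypothetical and only for illustration; the base set and the cell values stand in for `set(stopwords.words('english'))` and `custom_words['mywords']`.

```python
import math


def merge_stop_words(base_words, custom_cells):
    """Return base_words plus the cleaned, lowercased entries of custom_cells."""
    merged = set(base_words)
    for cell in custom_cells:
        # Skip empty cells: None, or the float NaN that pandas uses for blanks.
        if cell is None or (isinstance(cell, float) and math.isnan(cell)):
            continue
        word = str(cell).strip().lower()
        if word:  # ignore cells that contained only whitespace
            merged.add(word)
    return merged


# Stand-in data (assumption): replace with the nltk set and the Excel column.
base = {"the", "a", "is"}
cells = ["  Foo", "BAR ", float("nan"), None, "", "the"]
stop_words = merge_stop_words(base, cells)

words = ["the", "foo", "cat", "is", "bar", "happy"]
filtered = [w for w in words if w not in stop_words]
print(filtered)  # ['cat', 'happy']
```

With the real data the call would look like `merge_stop_words(stopwords.words('english'), custom_words['mywords'])`. Note that the tokens in `words` also need to be lowercased for the comparison to work.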