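I am updating my stop words to the new version, stop-words 2014.5.26, because I want to use the Arabic stop words. I work in Anaconda. After downloading and installing stop-words, I get the following error: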
from stop_words import get_stop_words
stop=set(get_stop_words('english'))
This line:
stop=set(get_stop_words('english'))
throws this error:
<ipython-input-15-47cdc7fed487> in <module>()
C:\Anaconda3\lib\site-packages\stop_words-2014.5.26-py3.5.egg\stop_words\__init__.py
in get_stop_words(language)
21 with open('{0}{1}.txt'.format(STOP_WORDS_DIR, language)) as lang_file:
22 lines = lang_file.readlines()
---> 23 return [str(line.strip()).decode('utf-8') for line in lines]
C:\Anaconda3\lib\site-packages\stop_words-2014.5.26-py3.5.egg\stop_words\__init__.py
in <listcomp>(.0)
21 with open('{0}{1}.txt'.format(STOP_WORDS_DIR, language)) as lang_file:
22 lines = lang_file.readlines()
---> 23 return [str(line.strip()).decode('utf-8') for line in lines]
AttributeError: 'str' object has no attribute 'decode'
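The failing line in the traceback is Python 2 code. In Python 3 (the egg path shows py3.5), `str` is already Unicode text and has no `.decode()` method. Only `bytes` has one. A minimal sketch of a Python 3-compatible reader, assuming a UTF-8 word list with one word per line. The function name `load_stop_words` and the temporary file are hypothetical and are not part of the stop-words package:

```python
import os
import tempfile


def load_stop_words(path):
    """Read a one-word-per-line stop-word file as Unicode text (Python 3).

    Hypothetical helper, not the stop-words package API.
    """
    # Text mode with an explicit encoding already yields str objects,
    # so no .decode() call is needed (str has no .decode() in Python 3).
    with open(path, encoding="utf-8") as lang_file:
        return [line.strip() for line in lang_file if line.strip()]


# The original failure: calling .decode() on a str raises AttributeError.
try:
    "word".decode("utf-8")
except AttributeError as exc:
    print("str:", exc)

# bytes, by contrast, do have .decode().
print("bytes:", b"word".decode("utf-8"))

# Write two Arabic stop words to a temporary file and read them back.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("اثر\nاجل\n")
    tmp_path = tmp.name

words = load_stop_words(tmp_path)
os.remove(tmp_path)
print(words)
```

Upgrading to a stop-words release that supports Python 3 removes the need for a workaround like this. The sketch only shows why the 2014.5.26 code fails.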
Answer:
I updated the stop words to the new 2015 version, and after that everything worked:
from nltk.corpus import stopwords
stop = set(stopwords.words('arabic'))
stop
{'،',
'أ', 'ا', 'اثر', 'اجل', 'احد', 'اخرى',