stop_words for new version 2014.5.26

Time: 2016-11-22 08:30:51

Tags: python

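I am updating my stop words to the new stop-words release 2014.5.26 because I want to use the Arabic stop words. I am working in Anaconda. After downloading and installing stop-words, I get the following error: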

from stop_words import get_stop_words
stop=set(get_stop_words('english'))
<ipython-input-15-47cdc7fed487> in <module>()

This line

stop=set(get_stop_words('english'))

throws this error:

C:\Anaconda3\lib\site-packages\stop_words-2014.5.26-py3.5.egg\stop_words\__init__.py
in get_stop_words(language)
     21     with open('{0}{1}.txt'.format(STOP_WORDS_DIR, language)) as lang_file:
     22         lines = lang_file.readlines()
---> 23         return [str(line.strip()).decode('utf-8') for line in lines]

C:\Anaconda3\lib\site-packages\stop_words-2014.5.26-py3.5.egg\stop_words\__init__.py
in <listcomp>(.0)
     21     with open('{0}{1}.txt'.format(STOP_WORDS_DIR, language)) as lang_file:
     22         lines = lang_file.readlines()
---> 23         return [str(line.strip()).decode('utf-8') for line in lines]

AttributeError: 'str' object has no attribute 'decode'
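
The traceback points at `.decode('utf-8')` being called on a `str`. In Python 3, `str` is already Unicode text and only `bytes` has a `.decode()` method, so line 23, which appears to have been written for Python 2, fails under Python 3.5. A minimal sketch of the difference (the sample word is only an illustration):

```python
# Python 3: str is already decoded text; only bytes has .decode().
text = "the"
raw = text.encode("utf-8")        # str -> bytes

print(hasattr(text, "decode"))    # str has no .decode() on Python 3
print(hasattr(raw, "decode"))     # bytes does

decoded = raw.decode("utf-8")     # bytes -> str works

error_message = None
try:
    text.decode("utf-8")          # same call shape as line 23 in the traceback
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```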

1 answer:

Answer 0 (score: 0)

I updated stop-words to the newer 2015 release, and after that everything worked.

from nltk.corpus import stopwords
stop = set(stopwords.words('arabic'))
stop
{'،',

'أ',  'ا',  'اثر',  'اجل',  'احد',  'اخرى',
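
If upgrading is not an option, the word list can also be read directly in a Python 3-safe way. The following is only a sketch, assuming a UTF-8 text file with one stop word per line. `load_stop_words` and the temporary `arabic.txt` file are hypothetical names used for illustration, not part of the stop-words package or NLTK.

```python
import os
import tempfile


def load_stop_words(path):
    """Read one stop word per line from a UTF-8 text file (hypothetical helper)."""
    with open(path, encoding="utf-8") as lang_file:
        # Lines are already str in Python 3, so no .decode() call is needed.
        return [line.strip() for line in lang_file if line.strip()]


# Demo with a throwaway file standing in for a real word list.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "arabic.txt")
    with open(sample_path, "w", encoding="utf-8") as sample_file:
        sample_file.write("أ\nا\nاثر\n")
    stop = set(load_stop_words(sample_path))

print(sorted(stop))
```

Passing `encoding="utf-8"` explicitly matters on Windows, where the default text encoding is often not UTF-8.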