sentiment_analyser error: 'bytes' object has no attribute 'encode'

Asked: 2017-09-15 04:10:49

Tags: python python-3.x nltk

I am using nltk for a project that does sentiment analysis on stocks. I searched GH and found no similar calls to sentiment_analyser or polarity_scores.

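I also looked at "Python 3.4 - 'bytes' object has no attribute 'encode'", and this is not a duplicate of it, because I am not calling bcrypt.gensalt().encode('utf-8'). It does, however, hint that the problem is some kind of type mismatch.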

Can anyone help me resolve this error?

The error I get is:

/lib/python3.5/site-packages/nltk/sentiment/vader.py in __init__(self, text)
    154     def __init__(self, text):
    155         if not isinstance(text, str):
--> 156             text = str(text.encode('utf-8'))
    157         self.text = text
    158         self.words_and_emoticons = self._words_and_emoticons()

AttributeError: 'bytes' object has no attribute 'encode'

The DataFrame df_stocks.head(5) looks like this:

            prices  articles
2007-01-01  12469   What Sticks from '06. Somalia Orders Islamist...
2007-01-02  12472   Heart Health: Vitamin Does Not Prevent Death ...
2007-01-03  12474   Google Answer to Filling Jobs Is an Algorithm...
2007-01-04  12480   Helping Make the Shift From Combat to Commerc...
2007-01-05  12398   Rise in Ethanol Raises Concerns About Corn as...                

The code is below; the error occurs on the last line:

import numpy as np
import pandas as pd
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import subjectivity
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import *
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import unicodedata
for date, row in df_stocks.T.iteritems():
    sentence = unicodedata.normalize('NFKD', df_stocks.loc[date, 'articles']).encode('ascii','ignore')
    ss = sid.polarity_scores(sentence)

Thanks

1 Answer:

Answer 0: (score: 1)

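According to the unicodedata.normalize() docs, the method takes a Unicode string and returns its normal form for the requested scheme (here NFKD). Calling .encode('ascii', 'ignore') on that result then produces a bytes object: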

import unicodedata

print(unicodedata.normalize('NFKD', u'abcdあäasc').encode('ascii', 'ignore'))

This prints:

b'abcdaasc'

So the problem is here: df_stocks.loc[date, 'articles'] is not a Unicode string.
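
Read together with the traceback in the question, the example also suggests another place to look: .encode('ascii', 'ignore') returns bytes, and line 156 of vader.py calls .encode on anything that is not a str. The sketch below is only an illustration under assumptions. vader_text_check is a hypothetical stand-in that copies the two lines shown in the traceback (it is not NLTK's API), and to_ascii_str is a hypothetical helper that assumes the article text is already a Python 3 str.

```python
import unicodedata


def vader_text_check(text):
    # Hypothetical stand-in mirroring lines 155-156 of vader.py
    # as quoted in the traceback; not the real NLTK class.
    if not isinstance(text, str):
        text = str(text.encode('utf-8'))
    return text


def to_ascii_str(text):
    # Hypothetical helper: normalize, drop non-ASCII characters,
    # then decode so the result is a str again rather than bytes.
    normalized = unicodedata.normalize('NFKD', text)
    return normalized.encode('ascii', 'ignore').decode('ascii')


raw = 'abcdあäasc'

# What the question's loop builds: a bytes object.
as_bytes = unicodedata.normalize('NFKD', raw).encode('ascii', 'ignore')

try:
    vader_text_check(as_bytes)
    failed = False
except AttributeError as exc:
    failed = True
    print(exc)  # 'bytes' object has no attribute 'encode'

# Decoding back to str passes the isinstance check untouched.
print(vader_text_check(to_ascii_str(raw)))  # abcdaasc
```

If that reading is right, appending .decode('ascii') to the sentence line in the question's loop would hand polarity_scores a str instead of bytes.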