Question

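I'm working on a project that does sentiment analysis of stocks using nltk. I searched GitHub and found nothing similar involving sentiment_analyser or polarity_scores calls.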

I also looked at "Python 3.4 - 'bytes' object has no attribute 'encode'". It is not a duplicate, because I am not calling bcrypt.gensalt().encode('utf-8'), although it does hint at some kind of wrong-type problem.

Can anyone help me fix this error?

I get the following error:

/lib/python3.5/site-packages/nltk/sentiment/vader.py in __init__(self, text)
    154     def __init__(self, text):
    155         if not isinstance(text, str):
--> 156             text = str(text.encode('utf-8'))
    157         self.text = text
    158         self.words_and_emoticons = self._words_and_emoticons()

AttributeError: 'bytes' object has no attribute 'encode'
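The traceback points at the type check in vader.py (lines 155-156 above). On Python 3, `bytes` is not `str`, so bytes input takes the `encode` branch, and `bytes` has no `.encode()` method. The function below is a hypothetical stand-in (`vader_style_coerce` is an illustrative name, not NLTK code) that copies just those two lines to reproduce the behaviour:

```python
def vader_style_coerce(text):
    """Copy of the check shown in the traceback (vader.py lines 155-156).

    Hypothetical helper for illustration only; not NLTK's actual class.
    """
    if not isinstance(text, str):
        # bytes is not str on Python 3, so this branch runs for bytes input;
        # bytes objects have no .encode() method, hence the AttributeError.
        text = str(text.encode('utf-8'))
    return text


# A str passes straight through unchanged.
print(vader_style_coerce('Rise in Ethanol'))

# A bytes value reproduces the error from the question.
try:
    vader_style_coerce(b'Rise in Ethanol')
except AttributeError as exc:
    print('AttributeError:', exc)
```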

The output of df_stocks.head(5) is:

            prices  articles
2007-01-01  12469   What Sticks from '06. Somalia Orders Islamist...
2007-01-02  12472   Heart Health: Vitamin Does Not Prevent Death ...
2007-01-03  12474   Google Answer to Filling Jobs Is an Algorithm...
2007-01-04  12480   Helping Make the Shift From Combat to Commerc...
2007-01-05  12398   Rise in Ethanol Raises Concerns About Corn as...

Here is the code; the error is raised on the last line:

import numpy as np
import pandas as pd
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import subjectivity
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import *
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import unicodedata

# VADER analyzer, used below as `sid`
sid = SentimentIntensityAnalyzer()

for date, row in df_stocks.T.items():
    sentence = unicodedata.normalize('NFKD', df_stocks.loc[date, 'articles']).encode('ascii','ignore')
    ss = sid.polarity_scores(sentence)

Thanks

Answer 1

According to the unicodedata.normalize() docs, the method converts a Unicode string into the requested normal form. For example:

import unicodedata

print(unicodedata.normalize('NFKD', u'abcdあäasc').encode('ascii', 'ignore'))

This prints:

b'abcdaasc'
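The `b` prefix in that output is the key detail: `normalize()` returns a `str`, but the chained `.encode('ascii', 'ignore')` turns it into `bytes`. A short check of the types, and of decoding back to `str`, using the same sample string:

```python
import unicodedata

raw = 'abcdあäasc'
# NFKD splits 'ä' into 'a' + a combining mark; encoding to ASCII with
# 'ignore' drops the combining mark and the non-ASCII 'あ'.
encoded = unicodedata.normalize('NFKD', raw).encode('ascii', 'ignore')
print(type(encoded))           # <class 'bytes'>

# Decoding the ASCII-only bytes gives a str again.
decoded = encoded.decode('ascii')
print(type(decoded), decoded)  # <class 'str'> abcdaasc
```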

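So the problem is that what reaches polarity_scores is not a Unicode (str) string: df_stocks.loc[date, 'articles'] is normalized and then passed through .encode('ascii', 'ignore'), which yields bytes.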
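One way to act on this, assuming the normalize/encode step was only meant to strip accents and other non-ASCII characters, is to decode the result back to `str` before scoring. `to_ascii_text` below is a hypothetical helper name, and the sketch assumes any bytes input is UTF-8:

```python
import unicodedata


def to_ascii_text(value):
    """Strip accents/non-ASCII characters but return a str, not bytes.

    Hypothetical helper for illustration; assumes bytes input is UTF-8.
    """
    if isinstance(value, bytes):
        # Already-encoded input: decode first, since normalize() needs a str.
        value = value.decode('utf-8', 'ignore')
    # Non-string cells (e.g. NaN) are converted with str() as a simple fallback.
    ascii_bytes = unicodedata.normalize('NFKD', str(value)).encode('ascii', 'ignore')
    # Decode back so downstream code such as VADER receives a str.
    return ascii_bytes.decode('ascii')


print(to_ascii_text('Café'))             # Cafe
print(to_ascii_text(b'Caf\xc3\xa9'))     # Cafe
```

In the question's loop, this would mean `sentence = to_ascii_text(df_stocks.loc[date, 'articles'])` followed by `ss = sid.polarity_scores(sentence)`. Alternatively, if stripping accents is not needed, the raw article text (already a `str`) can be passed to `polarity_scores` directly.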
