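I just finished the Codecademy IBM Watson course. They program in Python 2, and when I run the file in Python 3 I keep getting this error. On Codecademy, the script and all the credentials worked fine. Is this because I am working in Python 3, or is there a problem in the code?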
Traceback (most recent call last):
  File "c:\Users\Guppy\Programs\PythonCode\Celebrity Match\CelebrityMatch.py", line 58, in <module>
    user_result = analyze(user_handle)
  File "c:\Users\Guppy\Programs\PythonCode\Celebrity Match\CelebrityMatch.py", line 22, in analyze
    text += status.text.encode('utf-8')
TypeError: must be str, not bytes
Does anyone know what is wrong? The code is below:
import sys
import operator
import requests
import json
import twitter
from watson_developer_cloud import PersonalityInsightsV2 as PersonalityInsights
def analyze(handle):
    twitter_consumer_key = '<key>'
    twitter_consumer_secret = '<secret>'
    twitter_access_token = '<token>'
    twitter_access_secret = '<secret>'
    twitter_api = twitter.Api(consumer_key=twitter_consumer_key,
                              consumer_secret=twitter_consumer_secret,
                              access_token_key=twitter_access_token,
                              access_token_secret=twitter_access_secret)
    statuses = twitter_api.GetUserTimeline(screen_name = handle, count = 200, include_rts = False)
    text = ""
    for status in statuses:
        if (status.lang =='en'): #English tweets
            text += status.text.encode('utf-8')
    #The IBM Bluemix credentials for Personality Insights!
    pi_username = '<username>'
    pi_password = '<password>'
    personality_insights = PersonalityInsights(username=pi_username, password=pi_password)
    pi_result = personality_insights.profile(text)
    return pi_result
def flatten(orig):
    data = {}
    for c in orig['tree']['children']:
        if 'children' in c:
            for c2 in c['children']:
                if 'children' in c2:
                    for c3 in c2['children']:
                        if 'children' in c3:
                            for c4 in c3['children']:
                                if (c4['category'] == 'personality'):
                                    data[c4['id']] = c4['percentage']
                        if 'children' not in c3:
                            if (c3['category'] == 'personality'):
                                data[c3['id']] = c3['percentage']
    return data
def compare(dict1, dict2):
    compared_data = {}
    for keys in dict1:
        if dict1[keys] != dict2[keys]:
            compared_data[keys]=abs(dict1[keys] - dict2[keys])
    return compared_data
user_handle = "@itsguppythegod"
celebrity_handle = "@giselleee_____"
user_result = analyze(user_handle)
celebrity_result = analyze(celebrity_handle)
user = flatten(user_result)
celebrity = flatten(celebrity_result)
compared_results = compare(user, celebrity)
sorted_result = sorted(compared_results.items(), key=operator.itemgetter(1))
for keys, value in sorted_result[:5]:
    print(keys, end = " ")
    print(user[keys], end = " ")
    print('->', end = " ")
    print(celebrity[keys], end = " ")
    print('->', end = " ")
    print(compared_results[keys])
Answer 0 (score: 0)
You created a str (Unicode text) object here:
text = ""
and then went on to append UTF-8 encoded bytes to it:
text += status.text.encode('utf-8')
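The failure can be reproduced without Twitter or Watson. Here is a minimal sketch, assuming Python 3; the `tweet` value is a made-up stand-in for `status.text`, and the exact wording of the error message differs between Python 3 releases:

```python
# Python 3 keeps text (str) and encoded data (bytes) strictly apart,
# so "+" refuses to mix them.
text = ""
tweet = "caf\u00e9 time"  # hypothetical stand-in for status.text
error = None

try:
    text += tweet.encode("utf-8")  # same operation as the failing line
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# Appending the str itself works; encoding can wait until the end.
text += tweet
print(text)
```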
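In Python 2, "" created a byte string, so this all worked (although you were posting UTF-8 bytes to a service that interpreted them all as Latin-1). In Python 3, "" is Unicode text, and bytes cannot be appended to it, hence the TypeError.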
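The Latin-1 misreading is easy to see in isolation. This is only an illustration of what happens when UTF-8 bytes are decoded with the wrong codec; the sample string is made up, and nothing here calls the Watson service:

```python
original = "caf\u00e9"                   # 'é' is a single non-ASCII character
utf8_bytes = original.encode("utf-8")    # 'é' becomes two bytes: 0xC3 0xA9

# A receiver that assumes Latin-1 turns each byte into its own character.
misread = utf8_bytes.decode("latin-1")
print(misread)                           # cafÃ©

# Decoding with the codec that was actually used round-trips cleanly.
print(utf8_bytes.decode("utf-8"))        # café
```

Declaring the charset alongside the payload is what lets the receiver pick the right codec.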
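To fix this, don't encode the status text until you have collected all the tweets. Also, tell Watson to expect UTF-8 data. Last but not least, you should first build a list of tweet texts and then join them in a single step afterwards with str.join(), because concatenating strings in a loop takes quadratic time: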
text = []
for status in statuses:
    if (status.lang =='en'): #English tweets
        text.append(status.text)
# ...
personality_insights = PersonalityInsights(username=pi_username, password=pi_password)
pi_result = personality_insights.profile(
    ' '.join(text).encode('utf8'),
    content_type='text/plain; charset=utf-8'
)
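To try the collect-then-encode step without API credentials, here is a small self-contained sketch. The helper name `collect_english_text` and the fake statuses are hypothetical; they only mimic the `lang` and `text` attributes used above and are not part of python-twitter or the Watson SDK:

```python
from types import SimpleNamespace


def collect_english_text(statuses):
    """Join the text of English statuses and encode the result once."""
    parts = [status.text for status in statuses if status.lang == "en"]
    return " ".join(parts).encode("utf-8")


# Fake statuses standing in for the result of GetUserTimeline().
statuses = [
    SimpleNamespace(lang="en", text="Hello world"),
    SimpleNamespace(lang="fr", text="Bonjour"),
    SimpleNamespace(lang="en", text="caf\u00e9 time"),
]

payload = collect_english_text(statuses)
print(payload)  # b'Hello world caf\xc3\xa9 time'
```

The text stays as str for the whole loop and becomes bytes only at the boundary where it is sent out, which avoids the str/bytes mix-up entirely.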