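I'm taking an online course and trying to analyze tweets. I want to iterate over two data sets: one containing the tweets, and one mapping words to their sentiment scores (love = 3, sad = -3, etc.). With some help I wrote the code below, but it gives me this error: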
Traceback (most recent call last):
  File "/Users/fabiangeiger/Code/datasci_course_materials/assignment1/test.py", line 22, in <module>
    print traverse_tweets(tweet_file, scores)
  File "/Users/fabiangeiger/Code/datasci_course_materials/assignment1/test.py", line 17, in traverse_tweets
    return [cmp_tweet_sentiment(tweet, scores) for tweet in tweets]
  File "/Users/fabiangeiger/Code/datasci_course_materials/assignment1/test.py", line 13, in cmp_tweet_sentiment
    return sum(scores.get(word, 0))
TypeError: 'int' object is not iterable
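The tweet file contains tweets in the following format. This example shows the beginning of a single tweet:
{"created_at":"Wed May 24 15:51:00 +0000 2017","id":867407593760796672,"id_str":"867407593760796672","text":"ai, nada estraga meu dia hoje .... etc}
The sentiment file looks like this, with each term and its score separated by a tab:
abandon -2
abandoned -2
abandons -2
In the end, each tweet should get a sentiment score, computed by adding up the scores of the words in the tweet that appear in the sentiment file.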
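To make the two input formats concrete, here is a minimal sketch that parses a few in-memory sample lines instead of the real files. The helper names `parse_sentiment_line` and `tweet_text` are hypothetical, the sample tweet is made up for illustration, and the sketch assumes the tweet file holds one JSON object per line.

```python
import json

def parse_sentiment_line(line):
    # Each line is "<term>\t<score>"; strip the trailing newline first.
    term, score = line.rstrip("\n").split("\t")
    return term, int(score)

def tweet_text(line):
    # Assumes one JSON object per line; entries without "text" yield "".
    return json.loads(line).get("text", "")

# Sample lines in the tab-delimited format described above.
sample_sentiment = ["abandon\t-2\n", "abandoned\t-2\n", "abandons\t-2\n"]
sample_scores = dict(parse_sentiment_line(l) for l in sample_sentiment)

# Made-up tweet line, for illustration only.
sample_tweet = '{"id": 1, "text": "do not abandon me"}'
sample_text = tweet_text(sample_tweet)
```

Splitting on the tab alone, rather than on any whitespace, keeps a term intact even if it happens to contain spaces.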
Here is the code:
import sys
import json

def read_sentiment(sent_file):  # parse the sentiment file and return a {word: sentiment} dictionary
    scores = {}  # initialize an empty dictionary
    for line in sent_file:
        term, score = line.split("\t")  # The file is tab-delimited. "\t" means "tab character"
        scores[term] = int(score)  # Convert the score to an integer.
    return scores  # Return the dictionary itself

def cmp_tweet_sentiment(tweet, scores):
    for word in tweet.split():
        return sum(scores.get(word, 0))

def traverse_tweets(tweet_file, scores):  # calculate scores for all tweets
    tweets = (json.loads(line).get("text", '') for line in tweet_file)
    return [cmp_tweet_sentiment(tweet, scores) for tweet in tweets]

sent_file = open("AFINN-111.txt")
tweet_file = open("problem_1_submission.txt")
scores = read_sentiment(sent_file)
print traverse_tweets(tweet_file, scores)
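
For what it's worth, the traceback points at `sum(scores.get(word, 0))`: `scores.get` returns a single integer, while `sum` expects an iterable. The `return` inside the `for` loop would also exit after the first word. Below is a minimal sketch of one possible fix, assuming a tweet's score is simply the sum of the scores of its whitespace-separated words. The sample `scores` dictionary is made up for illustration and is not read from the AFINN file.

```python
def cmp_tweet_sentiment(tweet, scores):
    # Give sum() a generator over all words instead of a single int.
    # Words missing from the dictionary count as 0.
    return sum(scores.get(word, 0) for word in tweet.split())

# Hypothetical sample data for illustration.
scores = {"love": 3, "sad": -3}
result = cmp_tweet_sentiment("i love this sad song", scores)  # 3 + (-3) = 0
```

Note that matching in this sketch is case-sensitive and punctuation stays attached to words, so a token such as "Love!" would score 0 unless the text is normalized first.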