Question

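I am taking an online course and I am trying to analyze tweets. I want to iterate over two dictionaries: one holds the tweets, the other holds words and their corresponding sentiment (love = 3, sad = -3, etc.). With some help I wrote the following code, but it gives me this error: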

Traceback (most recent call last):
  File "/Users/fabiangeiger/Code/datasci_course_materials/assignment1/test.py", line 22, in <module>
    print traverse_tweets(tweet_file, scores)
  File "/Users/fabiangeiger/Code/datasci_course_materials/assignment1/test.py", line 17, in traverse_tweets
    return [cmp_tweet_sentiment(tweet, scores) for tweet in tweets]
  File "/Users/fabiangeiger/Code/datasci_course_materials/assignment1/test.py", line 13, in cmp_tweet_sentiment
    return sum(scores.get(word, 0))
TypeError: 'int' object is not iterable

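The tweet dictionary contains tweets in the following format. The example shows the beginning of a single tweet:

{"created_at":"Wed May 24 15:51:00 +0000 2017","id":867407593760796672,"id_str":"867407593760796672","text":"ai, nada estraga meu dia hoje .... etc}

The sentiment file looks like this, with the term and its score separated by a tab:

abandon	-2
abandoned	-2
abandons	-2

In the end, each tweet should get a sentiment score, made up by adding the scores of those of its words that appear in the sentiment file.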

Here is the code:

import sys
import json

def read_sentiment(sent_file): # parse the sentiment file and return  a {word: sentiment} dictionary
  scores = {} # initialize an empty dictionary
  for line in sent_file:
    term, score  = line.split("\t")  # The file is tab-delimited. "\t" means "tab character"
    scores[term] = int(score)  # Convert the score to an integer.
  return scores # Return the populated dictionary

def cmp_tweet_sentiment(tweet, scores):
  for word in tweet.split():
    return sum(scores.get(word, 0))

def traverse_tweets(tweet_file, scores): #calculate scores for all tweets
  tweets = (json.loads(line).get("text", '') for line in tweet_file)
  return [cmp_tweet_sentiment(tweet, scores) for tweet in tweets]

sent_file = open("AFINN-111.txt")
tweet_file = open("problem_1_submission.txt")
scores = read_sentiment(sent_file)
print traverse_tweets(tweet_file, scores)
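
For reference, the traceback points at `sum(scores.get(word, 0))`: `scores.get(word, 0)` returns a single integer, while `sum()` expects an iterable, hence `'int' object is not iterable`. The `return` inside the `for` loop would also exit after the first word. Below is a minimal sketch of one possible correction, which passes a generator expression over all words to `sum()`. The in-memory sample data (`io.StringIO` with three made-up terms and three made-up tweets) is an assumption that stands in for `AFINN-111.txt` and `problem_1_submission.txt`, so the snippet runs on its own; it is not the course's reference solution.

```python
import io
import json


def read_sentiment(sent_file):
    """Parse a tab-delimited sentiment file into a {term: score} dictionary."""
    scores = {}
    for line in sent_file:
        term, score = line.rstrip("\n").split("\t")
        scores[term] = int(score)
    return scores


def cmp_tweet_sentiment(tweet, scores):
    """Add up the scores of all words in one tweet; unknown words count as 0."""
    # sum() receives a generator that yields one integer per word,
    # instead of a single integer as in the original code.
    return sum(scores.get(word, 0) for word in tweet.split())


def traverse_tweets(tweet_file, scores):
    """Return one sentiment score per line of the tweet file."""
    # Lines without a "text" key are treated as empty tweets.
    tweets = (json.loads(line).get("text", "") for line in tweet_file)
    return [cmp_tweet_sentiment(tweet, scores) for tweet in tweets]


# Hypothetical sample data standing in for the real input files.
sent_file = io.StringIO(u"abandon\t-2\nlove\t3\nsad\t-3\n")
tweet_file = io.StringIO(
    u'{"text": "love love sad"}\n'
    u'{"id": 1}\n'
    u'{"text": "abandon ship"}\n'
)

scores = read_sentiment(sent_file)
result = traverse_tweets(tweet_file, scores)
print(result)
```

Note that this sketch matches words exactly as `str.split()` produces them, so it is case-sensitive and does not strip punctuation; whether that is acceptable depends on the assignment.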

Iterating over dictionaries to produce Twitter sentiment

0 answers: