Iterating over dictionaries to produce Twitter sentiment

Time: 2017-05-24 15:20:29

Tags: python twitter data-science

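I'm taking an online course and trying to analyze tweets. I want to iterate over two dictionaries: one holding the tweets, and one mapping words to their sentiment (love = 3, sad = -3, etc.). With some help I wrote the code below, but it gives me this error: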

Traceback (most recent call last):
  File "/Users/fabiangeiger/Code/datasci_course_materials/assignment1/test.py", line 22, in <module>
    print traverse_tweets(tweet_file, scores)
  File "/Users/fabiangeiger/Code/datasci_course_materials/assignment1/test.py", line 17, in traverse_tweets
    return [cmp_tweet_sentiment(tweet, scores) for tweet in tweets]
  File "/Users/fabiangeiger/Code/datasci_course_materials/assignment1/test.py", line 13, in cmp_tweet_sentiment
    return sum(scores.get(word, 0))
TypeError: 'int' object is not iterable
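The last frame of the traceback points at the cause: `scores.get(word, 0)` returns a single integer (the word's score, or 0 if the word is not in the dictionary), while `sum()` expects an iterable of numbers. Separately, because the `return` sits inside the `for` loop, the function would exit after the first word even if `sum()` accepted the value. A minimal reproduction, using a hypothetical two-entry `scores` dictionary built from the example values in the question:

```python
# Hypothetical sample dictionary using the question's example values.
scores = {"love": 3, "sad": -3}

value = scores.get("love", 0)  # a single int, not a collection
try:
    sum(value)  # sum() needs an iterable of numbers, so this raises TypeError
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'int' object is not iterable

# Giving sum() an iterable (here a list of per-word scores) works.
print(sum([scores.get("love", 0), scores.get("sad", 0)]))  # 0
```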

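The tweet file contains tweets in the following JSON format. The example shows the beginning of a single tweet: {"created_at":"Wed May 24 15:51:00 +0000 2017","id":867407593760796672,"id_str":"867407593760796672","text":"ai, nada estraga meu dia hoje .... etc}. The sentiment file is tab-delimited and looks like this:

abandon	-2
abandoned	-2
abandons	-2

In the end, each tweet should get a sentiment score computed from the scores of its words that appear in the sentiment file.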
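Parsing one line of each format looks roughly like this. This is a minimal sketch: the sentiment line is the first example above, and the tweet line is a hypothetical, shortened object (the real tweets carry many more fields, and the `text` value here is made up).

```python
import json

# One line of the tab-delimited sentiment file, as in the example above.
sentiment_line = "abandon\t-2\n"
term, score = sentiment_line.split("\t")
scores = {term: int(score)}  # int() tolerates the trailing newline
print(scores)  # {'abandon': -2}

# Hypothetical, shortened tweet line; only "text" is needed for scoring.
tweet_line = '{"created_at": "Wed May 24 15:51:00 +0000 2017", "id": 867407593760796672, "text": "i love this"}'
text = json.loads(tweet_line).get("text", "")  # "" if the line has no "text" field
print(text)  # i love this
```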

Here is the code:

import sys
import json

def read_sentiment(sent_file): # parse the sentiment file and return a {word: sentiment} dictionary
  scores = {} # initialize an empty dictionary
  for line in sent_file:
    term, score  = line.split("\t")  # The file is tab-delimited. "\t" means "tab character"
    scores[term] = int(score)  # Convert the score to an integer.
  return scores # Return the dictionary itself

def cmp_tweet_sentiment(tweet, scores):
  for word in tweet.split():
    return sum(scores.get(word, 0))

def traverse_tweets(tweet_file, scores): #calculate scores for all tweets
  tweets = (json.loads(line).get("text", '') for line in tweet_file)
  return [cmp_tweet_sentiment(tweet, scores) for tweet in tweets]

sent_file = open("AFINN-111.txt")
tweet_file = open("problem_1_submission.txt")
scores = read_sentiment(sent_file)
print traverse_tweets(tweet_file, scores)
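A possible fix, offered as a sketch rather than a definitive answer: pass `sum()` a generator expression over all words and return only once, after the whole tweet has been scored. The version below is written for Python 3 (`print()` is a function there, unlike the Python 2 `print` statement in the question), and it uses hypothetical in-memory stand-ins for `AFINN-111.txt` and `problem_1_submission.txt` so it runs on its own.

```python
import io
import json


def read_sentiment(sent_file):
    """Parse tab-delimited 'term<TAB>score' lines into a {term: score} dict."""
    scores = {}
    for line in sent_file:
        term, score = line.split("\t")
        scores[term] = int(score)
    return scores


def cmp_tweet_sentiment(tweet, scores):
    """Sum the scores of every word in the tweet; unknown words count as 0."""
    # The generator expression gives sum() an iterable, and the single
    # return runs only after every word has been looked up.
    return sum(scores.get(word, 0) for word in tweet.split())


def traverse_tweets(tweet_file, scores):
    """Return one sentiment score per JSON line in tweet_file."""
    tweets = (json.loads(line).get("text", "") for line in tweet_file)
    return [cmp_tweet_sentiment(tweet, scores) for tweet in tweets]


# Hypothetical in-memory stand-ins for the two input files.
sent_file = io.StringIO("love\t3\nsad\t-3\n")
tweet_file = io.StringIO(
    '{"text": "i love love this"}\n'
    '{"text": "so sad today"}\n'
    '{"delete": {}}\n'
)

scores = read_sentiment(sent_file)
print(traverse_tweets(tweet_file, scores))  # [6, -3, 0]
```

With real files, the same functions can be called on `open("AFINN-111.txt")` and `open("problem_1_submission.txt")`. Note that words are matched exactly, so "Love" or "love," will not match the key "love" unless the text is lower-cased and stripped of punctuation first.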
