This is my sentimentMapper.py:
#!/usr/bin/env python
import tweepy
import json
from tweepy import Stream
from tweepy import OAuthHandler

access_token = ''
access_token_secret = ''
consumer_key = ''
consumer_secret = ''

class StdOutListener(tweepy.StreamListener):
    def on_data(self, data):
        # Parsing
        decoded = json.loads(data)
        # open a file to store the status objects
        file = open('stream.json', 'w')
        # write json to file
        json.dump(decoded, file, sort_keys=True, indent=4)
        # show progress
        print("Writing tweets to file, CTRL+C to terminate the program")
        return True

    def on_error(self, status):
        print(status)

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
twitterStream = Stream(auth, StdOutListener())
twitterStream.filter(track=["xx"])
The code above was written to collect Twitter data as JSON.
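For comparison, here is a minimal sketch of how each tweet could instead be appended as one compact JSON object per line, so that the file can later be read line by line. `append_tweet` is a hypothetical helper, not part of tweepy, and the idea that the downstream script wants one object per line is an assumption, not a confirmed fix.

```python
import json


def append_tweet(path, raw):
    """Append one tweet to `path` as a single JSON line (hypothetical helper)."""
    decoded = json.loads(raw)
    # "a" keeps earlier tweets; "w" would truncate the file on every call.
    with open(path, "a") as handle:
        # No indent argument, so each object stays on one line.
        handle.write(json.dumps(decoded, sort_keys=True) + "\n")
```

Inside `on_data`, this would be called as `append_tweet('stream.json', data)`.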
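I am using a Python script to run sentiment analysis, following this article: http://www.alex-hanna.com/tworkshops/lesson-6-basic-sentiment-analysis/
I am not sure why I am getting the "Extra data" errors. I am using Python 2.6 and Cloudera.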
Error:
Expecting property name: line 1 column 1 (char 1)
Extra data: line 1 column 14 - line 1 column 21 (char 14 - 21)
Extra data: line 1 column 13 - line 1 column 20 (char 13 - 20)
Extra data: line 1 column 12 - line 1 column 47 (char 12 - 47)
Extra data: line 1 column 10 - line 1 column 13 (char 10 - 13)
Extra data: line 1 column 10 - line 1 column 15 (char 10 - 15)
Extra data: line 1 column 9 - line 1 column 14 (char 9 - 14)
Extra data: line 1 column 6 - line 1 column 9 (char 6 - 9)
Expecting property name: line 1 column 1 (char 1)
Extra data: line 1 column 13 - line 1 column 50 (char 13 - 50)
Extra data: line 1 column 14 - line 1 column 70 (char 14 - 70)
Extra data: line 1 column 9 - line 1 column 12 (char 9 - 12)
Extra data: line 1 column 2 - line 1 column 3 (char 2 - 3)
Traceback (most recent call last):
  File "sentimentMapper.py", line 81, in <module>
    main()
  File "sentimentMapper.py", line 35, in main
    if 'text' in data:
TypeError: argument of type 'int' is not iterable
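A minimal sketch that produces the same kinds of errors, assuming the sentiment script calls `json.loads` on each input line separately (that script is not shown here, so this is a guess). The sample lines and `parse_line` are illustrative only, and the exact error wording differs between Python versions.

```python
import json

# A few lines as they might appear in a file written with indent=4.
pretty_lines = ['{', '    "contributors": null,', '            60', '}']


def parse_line(line):
    """Parse a single line as JSON; print the error and return None on failure."""
    try:
        return json.loads(line)
    except ValueError as err:  # json decode errors are ValueError subclasses
        print(err)
        return None


for line in pretty_lines:
    data = parse_line(line)
    if data is None:
        continue
    try:
        # A bare number such as 60 is valid JSON and parses to an int,
        # so the membership test below raises TypeError.
        if 'text' in data:
            print(data['text'])
    except TypeError as err:
        print("TypeError:", err)
```

Under that assumption, a line holding only `{` fails to parse, a line like `"contributors": null,` parses a string and then reports extra data, and a line holding only a number reaches the `'text' in data` check as an `int`.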
The data looks like this:
{
    "contributors": null,
    "coordinates": null,
    "created_at": "Mon Dec 11 15:09:15 +0000 2017",
    "entities": {
        "hashtags": [],
        "symbols": [],
        "urls": [
            {
                "display_url": "",
                "expanded_url": "",
                "indices": [
                    37,
                    60
                ],
                "url": ""
            }
        ],