Extra data: error when parsing tweets in JSON format

Time: 2017-12-11 14:21:14

Tags: python json cloudera tweepy

This is my sentimentMapper.py:

#!/usr/bin/env python
import tweepy
import json
from tweepy import Stream
from tweepy import OAuthHandler

access_token = ''
access_token_secret = '' 
consumer_key = ''
consumer_secret = ''


class StdOutListener(tweepy.StreamListener):

    def on_data(self, data):

        # Parse the raw JSON string received from the stream
        decoded = json.loads(data)
        # Open the file that stores the status object (mode 'w' truncates it on every call)
        with open('stream.json', 'w') as f:
            # Write the JSON to the file, pretty-printed across multiple lines
            json.dump(decoded, f, sort_keys=True, indent=4)
        # Show progress
        print("Writing tweets to file, CTRL+C to terminate the program")
        return True

    def on_error(self, status):
        print (status)

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
twitterStream = Stream(auth, StdOutListener())
twitterStream.filter(track=["xx"])
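Two details in on_data() likely matter for whatever reads stream.json afterwards: the file is reopened with mode 'w' for every tweet, so each call erases the previous one, and indent=4 spreads a single tweet over dozens of lines, so a line-by-line reader never sees a complete JSON object on one line. A minimal sketch, assuming the downstream script expects one JSON object per line (the JSON Lines convention); append_tweet is a hypothetical helper that on_data() could call as append_tweet('stream.json', data):

```python
import json


def append_tweet(path, raw_data):
    """Append one tweet to `path` as a single line of JSON.

    `raw_data` is the raw string a streaming callback such as on_data()
    receives. It is parsed and re-serialised WITHOUT indent, so the whole
    object stays on one line; json.dumps escapes any newline inside the
    tweet text as \\n, so it cannot split the line.
    """
    decoded = json.loads(raw_data)
    # 'a' keeps earlier tweets; 'w' would truncate the file on every call.
    with open(path, 'a') as f:
        f.write(json.dumps(decoded, sort_keys=True) + '\n')
    return decoded
```

Each line of the resulting file can then be passed to json.loads() on its own.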

The code above collects Twitter data and saves it as JSON.

I then run a Python script for sentiment analysis, based on this article: http://www.alex-hanna.com/tworkshops/lesson-6-basic-sentiment-analysis/

I'm not sure why I get the "Extra data" errors. I'm using Python 2.6 on Cloudera.

Errors:

Expecting property name: line 1 column 1 (char 1)
Extra data: line 1 column 14 - line 1 column 21 (char 14 - 21)
Extra data: line 1 column 13 - line 1 column 20 (char 13 - 20)
Extra data: line 1 column 12 - line 1 column 47 (char 12 - 47)
Extra data: line 1 column 10 - line 1 column 13 (char 10 - 13)
Extra data: line 1 column 10 - line 1 column 15 (char 10 - 15)
Extra data: line 1 column 9 - line 1 column 14 (char 9 - 14)
Extra data: line 1 column 6 - line 1 column 9 (char 6 - 9)
Expecting property name: line 1 column 1 (char 1)
Extra data: line 1 column 13 - line 1 column 50 (char 13 - 50)
Extra data: line 1 column 14 - line 1 column 70 (char 14 - 70)
Extra data: line 1 column 9 - line 1 column 12 (char 9 - 12)
Extra data: line 1 column 2 - line 1 column 3 (char 2 - 3)

Traceback (most recent call last):
  File "sentimentMapper.py", line 81, in <module>
    main()
  File "sentimentMapper.py", line 35, in main
    if 'text' in data:
TypeError: argument of type 'int' is not iterable
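These messages are consistent with a mapper that calls json.loads() on each input line while stream.json holds indent=4 output. That is an assumption, since the mapper's source isn't shown; only its line 35 (`if 'text' in data:`) appears in the traceback. The sketch below, with a hypothetical helper parse_each_line, reproduces the pattern on a small pretty-printed fragment: a lone `{` fails with "Expecting property name", a line such as `37,` fails with "Extra data", and the last array element `60` decodes to a plain int, which is what makes `'text' in data` raise TypeError. Exact message wording differs between Python 2.6 and Python 3, and the usage loop below assumes Python 3's print().

```python
import json

# A fragment shaped like what json.dump(..., indent=4) writes.
sample = '{\n    "indices": [\n        37,\n        60\n    ]\n}\n'


def parse_each_line(text):
    """Mimic a mapper that calls json.loads() on every input line.

    Returns (stripped_line, decoded_value_or_None, error_message_or_None).
    """
    results = []
    for line in text.splitlines():
        try:
            results.append((line.strip(), json.loads(line), None))
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            results.append((line.strip(), None, str(exc)))
    return results


for line, value, error in parse_each_line(sample):
    print(repr(line), '->', error if error else type(value).__name__)
```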

Here is what the data looks like:

{
    "contributors": null,
    "coordinates": null,
    "created_at": "Mon Dec 11 15:09:15 +0000 2017",
    "entities": {
        "hashtags": [],
        "symbols": [],
        "urls": [
            {
                "display_url": "",
                "expanded_url":  "",
                "indices": [
                    37,
                    60
                ],
                "url": ""
            }
        ],
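On the reading side, a mapper that expects one tweet per line can skip anything that is not a complete JSON object with a `text` field instead of crashing. This is a sketch under the assumption that stream.json is written one object per line. iter_tweet_texts and run_mapper are hypothetical names, and the sentiment scoring from the linked article is left out. The code targets Python 3; under Python 2.6 the text would need encoding to UTF-8 before writing.

```python
import json


def iter_tweet_texts(lines):
    """Yield the 'text' field of each tweet, one JSON object per input line.

    Lines that are blank, are not a complete JSON document, or decode to
    something other than an object (e.g. the bare int 60 from a
    pretty-printed array) are skipped instead of crashing the mapper.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            data = json.loads(line)
        except ValueError:
            continue  # not a complete JSON document on this line
        if isinstance(data, dict) and 'text' in data:
            yield data['text']


def run_mapper(in_stream, out_stream):
    """Write one tweet text per output line, flattening embedded newlines."""
    for text in iter_tweet_texts(in_stream):
        # Placeholder: the sentiment scoring step would go here.
        out_stream.write(text.replace('\n', ' ') + '\n')
```

Calling run_mapper(sys.stdin, sys.stdout) from the mapper's main() would replace a bare per-line json.loads() loop. Against the current pretty-printed file it avoids the crash but yields no tweet text, so the writer still has to emit one object per line.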
