How to decode a text file

Time: 2016-09-03 02:06:10

Tags: python twitter encoding tweepy

I have this code here, and it works flawlessly.

# encoding=utf8
#Import the necessary methods from tweepy library
import sys
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener

reload(sys)  
sys.setdefaultencoding('utf8')

# Variables that contain the user credentials to access the Twitter API
access_token = ""
access_token_secret = ""
consumer_key = ""
consumer_secret = ""

#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):

    def on_data(self, data):
        #save data
        with open('debate_data.txt', 'a') as tf:
            tf.write((data).decode('unicode-escape').encode('utf-8'))

        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':

    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)

    # This line filters the Twitter stream to capture tweets matching the tracked keywords
    stream.filter(track=['Bernier', 'Rosselló', 'Rossello', 'Bernabe', 'Lúgaro', 'Lugaro', 'María de Lourdes', 'Maria de Lourdes', 'Cidre'])

However, when I run this other piece of code, I get the wrong answer.

import json
import io

#save the tweets to this path
tweets_data_path = 'debate_data.txt'

tweets_data = []
with io.open(tweets_data_path, 'r') as tweets_file:
    for line in tweets_file:
        try:
            tweet = json.loads(line)
            tweets_data.append(tweet)
        except:
            continue

print len(tweets_data)

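The file contains 42,188 tweets, but when I run the code I only get 291. I think it is an encoding/decoding problem, but I can't figure out what exactly. Any help would be greatly appreciated.

I ran the example below without any encoding/decoding, and it worked fine.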

http://adilmoujahid.com/posts/2014/07/twitter-analytics/

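2 answers:

Answer 0 (score: 2)

You only get 291 because json.loads() raises an error on the other lines, and the bare except silently skips each of them.

I suggest you print the error, like this: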

except Exception as err:
    print err
    continue

Now you will know the cause of the error and can fix it.
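The reading loop from the question could be adapted along these lines to report why lines are rejected instead of dropping them silently. This is only a minimal sketch: it uses Python 3 syntax (the question's code is Python 2), and the function name `load_tweets`, the error tally, and the limit of five printed messages are illustrative choices, not part of the original code.

```python
import io
import json
from collections import Counter


def load_tweets(path):
    """Parse one JSON document per line; return (tweets, error_counts)."""
    tweets = []
    errors = Counter()
    with io.open(path, 'r', encoding='utf-8') as tweets_file:
        for line_number, line in enumerate(tweets_file, start=1):
            line = line.strip()
            if not line:
                # Blank lines carry no tweet, so they are not counted as errors
                continue
            try:
                tweets.append(json.loads(line))
            except ValueError as err:
                # json.loads raises ValueError (JSONDecodeError on Python 3)
                errors[type(err).__name__] += 1
                if sum(errors.values()) <= 5:
                    # Show only the first few failures to keep output readable
                    print('line %d: %s' % (line_number, err))
    return tweets, errors
```

Calling `tweets, errors = load_tweets('debate_data.txt')` and comparing `len(tweets)` with `sum(errors.values())` should show how many lines fail and with which message.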

Are you sure the data inside debate_data.txt is valid JSON?
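One plausible reason the file is no longer valid line-by-line JSON is the `decode('unicode-escape')` call in `on_data`: it interprets JSON escape sequences such as `\n` and `\"`, turning them into literal newlines and quotes. The sketch below reproduces the effect on a made-up payload. The `raw` string and the helper `count_parseable` are invented for illustration, and the code uses Python 3 syntax, whereas the question runs on Python 2.

```python
import json

# Invented payload shaped like one line received from the stream:
# the text contains an escaped newline and escaped quotes.
raw = '{"id": 1, "text": "first line\\nsaid \\"hi\\""}'

# Interpret the backslash escapes, which is what decode('unicode-escape') does
mangled = raw.encode('ascii').decode('unicode_escape')


def count_parseable(lines):
    """Return how many of the given lines json.loads accepts."""
    ok = 0
    for line in lines:
        try:
            json.loads(line)
            ok += 1
        except ValueError:
            pass
    return ok


# The original payload is a single parseable line
print(count_parseable(raw.splitlines()))
# After unescaping, the payload is split across lines and none of them parses
print(len(mangled.splitlines()), count_parseable(mangled.splitlines()))
```

If this is indeed the cause, writing `data` to the file unchanged and letting `json.loads` do the unescaping at read time would likely keep one tweet per line. That would also be consistent with the linked example working without any encoding/decoding.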

Answer 1 (score: 2)

As agnewee said, I also suggest printing the error instead of silently skipping the line.