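Downloading historical tweets from more than a month ago

Asked: 2015-12-22 19:27:14

Tags: python date twitter tweepy

I am trying to download tweets from a specific date range that goes back several months. I can only download about one week of tweets and cannot go back any further.

Code: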

import sys

import pandas as pd
import tweepy

ckey = 'key'
csecret = 'key'
atoken = 'key'
asecret = 'key'

def toDataFrame(tweets):

    DataSet = pd.DataFrame()

    DataSet['tweetID'] = [tweet.id for tweet in tweets]
    DataSet['tweetText'] = [tweet.text.encode('utf-8') for tweet in tweets]
    DataSet['tweetRetweetCt'] = [tweet.retweet_count for tweet in tweets]
    DataSet['tweetFavoriteCt'] = [tweet.favorite_count for tweet in tweets]
    DataSet['tweetSource'] = [tweet.source for tweet in tweets]
    DataSet['tweetCreated'] = [tweet.created_at for tweet in tweets]
    DataSet['userID'] = [tweet.user.id for tweet in tweets]
    DataSet['userScreen'] = [tweet.user.screen_name for tweet in tweets]
    DataSet['userName'] = [tweet.user.name for tweet in tweets]
    DataSet['userCreateDt'] = [tweet.user.created_at for tweet in tweets]
    DataSet['userDesc'] = [tweet.user.description for tweet in tweets]
    DataSet['userFollowerCt'] = [tweet.user.followers_count for tweet in tweets]
    DataSet['userFriendsCt'] = [tweet.user.friends_count for tweet in tweets]
    DataSet['userLocation'] = [tweet.user.location for tweet in tweets]
    DataSet['userTimezone'] = [tweet.user.time_zone for tweet in tweets]
    DataSet['Coordinates'] = [tweet.coordinates for tweet in tweets]
    DataSet['GeoEnabled'] = [tweet.user.geo_enabled for tweet in tweets]
    DataSet['Language'] = [tweet.user.lang for tweet in tweets]
    tweets_place= []
    #users_retweeted = []
    for tweet in tweets:
        if tweet.place:
            tweets_place.append(tweet.place.full_name)
        else:
            tweets_place.append('null')
    DataSet['TweetPlace'] = [i for i in tweets_place]
    #DataSet['UserWhoRetweeted'] = [i for i in users_retweeted]

    return DataSet

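OAUTH_KEYS = {'consumer_key': ckey, 'consumer_secret': csecret,
              'access_token_key': atoken, 'access_token_secret': asecret}
auth = tweepy.OAuthHandler(OAUTH_KEYS['consumer_key'], OAUTH_KEYS['consumer_secret'])
# Attach the user access token so requests are signed on behalf of the account.
auth.set_access_token(OAUTH_KEYS['access_token_key'], OAUTH_KEYS['access_token_secret'])
#auth = tweepy.AppAuthHandler('key', 'key')

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)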
if (not api):
    print ("Can't Authenticate")
    sys.exit(-1)
else:
    # I am trying to download from Dec 1st to Dec 7th but I am not able to

    cursor = tweepy.Cursor(api.search,
                           q='#chennairains OR #chennaihelp OR #chennaifloods',
                           since='2015-12-20', until='2015-12-21',
                           lang='en', count=100)
    results=[]
    for item in cursor.items():
        results.append(item)

    DataSet = toDataFrame(results)
    DataSet.to_csv('output.csv',index=False)

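The program downloads data from within the past week just fine, but it cannot download anything older than a week. I tried following several posts here, but most of them never received an answer. Any suggestions are appreciated.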

1 Answer:

Answer 0: (score: 3)

Twitter limits the amount of data returned from its REST API, and Tweepy's API class uses Twitter's REST API.

From https://dev.twitter.com/overview/general/things-every-developer-should-know:

  

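There are pagination limits

Rest API Limit

Clients may access a theoretical maximum of 3,200 statuses via the page and count parameters for the user_timeline REST API methods. Other timeline methods have a theoretical maximum of 800 statuses. Requests for more than the limit will result in a reply with a status code of 200 and an empty result in the format requested. Twitter still maintains a database of all the tweets sent by a user. However, to ensure performance of the site, this artificial limit is temporarily in place.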
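To make the quoted caps concrete, here is a minimal sketch of the arithmetic behind them. The caps of 3,200 and 800 come from the quote above. The page size of 200 is an assumption for illustration and is not stated in the quote, and `pages_needed` is a hypothetical helper, not part of Tweepy.

```python
import math

USER_TIMELINE_CAP = 3200   # from the quoted documentation
OTHER_TIMELINE_CAP = 800   # from the quoted documentation
ASSUMED_PAGE_SIZE = 200    # assumption: page size is not given in the quote


def pages_needed(total_statuses, per_page):
    """Return how many requests it takes to page through total_statuses items."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return math.ceil(total_statuses / per_page)


# Number of pages after which a timeline would be expected to come back empty.
print(pages_needed(USER_TIMELINE_CAP, ASSUMED_PAGE_SIZE))
print(pages_needed(OTHER_TIMELINE_CAP, ASSUMED_PAGE_SIZE))
```

Under these assumptions, paging past that point would yield the empty 200 responses the quote describes rather than an error, so a loop should stop on an empty page instead of waiting for an exception.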

If you want to look back further than that, paid services such as Gnip and DataSift can provide this data.
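Before running a long search, it may help to check whether the requested dates can be reached at all. The sketch below assumes a window of about seven days, which is taken from the behaviour described in the question rather than from the quoted documentation. `reachable_range` and `ASSUMED_WINDOW_DAYS` are hypothetical names used only for illustration.

```python
from datetime import date, timedelta

ASSUMED_WINDOW_DAYS = 7  # assumption: based on the one-week behaviour in the question


def reachable_range(since, until, today, window_days=ASSUMED_WINDOW_DAYS):
    """Return the part of [since, until] inside the last window_days, or None."""
    earliest = today - timedelta(days=window_days)
    start = max(since, earliest)
    if start > until:
        return None  # the whole requested range is older than the window
    return start, until


today = date(2015, 12, 22)

# The range the question wants (Dec 1 to Dec 7) lies entirely outside the window.
print(reachable_range(date(2015, 12, 1), date(2015, 12, 7), today))

# The range the question's code actually uses is still reachable.
print(reachable_range(date(2015, 12, 20), date(2015, 12, 21), today))
```

When the helper returns None, no choice of `since` and `until` would be expected to produce results from the search call, which is when an archive provider like the ones named above becomes the realistic option.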