Question

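I am trying to download Twitter data for the Chicago area, focusing specifically on crime-related tweets. I also need these tweets to be geotagged with coordinates. I want to collect a large amount of data for analysis, but the REST API is rate-limited, which restricts me to a fairly low number of tweets. Based on a similar question, Avoid twitter api limitation with Tweepy, I have been trying to put together a workaround, but so far without much luck. Can anyone help me with this? I am new to all of this, so any help would be much appreciated. Ideally, I would also like the results in a pandas DataFrame. I have been using the following tutorial as the basis for my code: http://www.karambelkar.info/2015/01/how-to-use-twitters-search-rest-api-most-effectively./ I have copied the code below: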

import sys
import jsonpickle
import tweepy

# Application-only auth: fill in your consumer key and consumer secret
auth = tweepy.AppAuthHandler('', '')
# Tweepy 3.x: sleep automatically whenever the rate limit is hit
# (wait_on_rate_limit_notify was removed in Tweepy 4.0)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
if not api:
    print("Can't Authenticate")
    sys.exit(-1)

searchQuery = 'shooting OR stabbing OR violence OR assault OR attack OR homicide OR punched OR mugging OR murder'
geocode = "41.8781,-87.6298,15km"  # latitude,longitude,radius around central Chicago

maxTweets = 1000000   # upper bound on the number of tweets to download
tweetsPerQry = 100    # the Search API returns at most 100 tweets per request
fName = 'tweets.txt'  # output: one JSON-encoded tweet per line
sinceId = None        # set to a tweet ID to fetch only tweets newer than it
max_id = -1           # -1 means "start from the most recent tweet"
tweetCount = 0
print("Downloading max {0} tweets".format(maxTweets))
with open(fName, 'w') as f:
    while tweetCount < maxTweets:
        try:
            if max_id <= 0:
                # First request: no upper bound on tweet IDs yet
                if not sinceId:
                    new_tweets = api.search(q=searchQuery, geocode=geocode, count=tweetsPerQry)
                else:
                    new_tweets = api.search(q=searchQuery, geocode=geocode, count=tweetsPerQry, since_id=sinceId)
            else:
                # Later requests: page backwards, only tweets older than the oldest one seen so far
                if not sinceId:
                    new_tweets = api.search(q=searchQuery, geocode=geocode, count=tweetsPerQry, max_id=str(max_id - 1))
                else:
                    new_tweets = api.search(q=searchQuery, geocode=geocode, count=tweetsPerQry, max_id=str(max_id - 1), since_id=sinceId)
            if not new_tweets:
                print("No more tweets found")
                break
            for tweet in new_tweets:
                f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n')
            tweetCount += len(new_tweets)
            print("Downloaded {0} tweets".format(tweetCount))
            max_id = new_tweets[-1].id  # oldest tweet in this batch
        except tweepy.TweepError as e:
            # Tweepy 4.x renamed this exception to tweepy.TweepyException
            print("some error : " + str(e))
            break
print("Downloaded {0} tweets, Saved to {1}".format(tweetCount, fName))
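For the pandas part of the question, here is a minimal sketch, assuming the loop above has written one tweet JSON object per line to tweets.txt. The helper name `load_geotagged` and the chosen columns are hypothetical. It keeps only tweets whose `coordinates` field (a GeoJSON point, `[longitude, latitude]`) is set, since the `geocode` search parameter limits results by location but does not guarantee that each tweet carries exact coordinates.

```python
import json

import pandas as pd


def load_geotagged(lines):
    """Build a DataFrame from JSON-lines tweets, keeping only tweets with exact coordinates."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        coords = tweet.get("coordinates")
        # "coordinates" is null or a GeoJSON point: {"type": "Point", "coordinates": [lon, lat]}
        if not coords:
            continue
        lon, lat = coords["coordinates"]
        rows.append({
            "id": tweet["id"],
            "created_at": tweet.get("created_at"),
            # extended-mode tweets use "full_text"; compatibility mode uses "text"
            "text": tweet.get("full_text", tweet.get("text")),
            "lon": lon,
            "lat": lat,
        })
    return pd.DataFrame(rows, columns=["id", "created_at", "text", "lon", "lat"])
```

Usage: `with open('tweets.txt') as f: df = load_geotagged(f)`. Tweets that are tagged only with a `place` bounding box are dropped by this filter, which may discard a sizeable share of the results; relax the check if place-level location is good enough for your analysis.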

Answer 1

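After running into the same problem, I wrote a way to detect when the API rate limit is about to be hit. This Python code uses tweepy and prints the number of API requests made and the number of allowed requests remaining. You can add your own code to delay/sleep/wait before or after the limit is reached, or use tweepy's wait_on_rate_limit (more details here).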
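The idea can be sketched as follows. This is a hedged reconstruction, not the answerer's code: it assumes the dictionary returned by Tweepy's `api.rate_limit_status()`, which mirrors the `GET application/rate_limit_status` response (`resources` → endpoint family → endpoint → `limit`, `remaining`, `reset`). The function name `describe_rate_limits` is hypothetical.

```python
def describe_rate_limits(status, endpoints=("/search/tweets", "/application/rate_limit_status")):
    """Return one usage line per endpoint from a rate_limit_status() response dict."""
    messages = []
    for endpoint in endpoints:
        family = endpoint.split("/")[1]  # "/search/tweets" -> "search"
        info = status["resources"][family][endpoint]
        used = info["limit"] - info["remaining"]
        messages.append(
            "Twitter API: {0} requests used, {1} remaining, for API queries to {2}".format(
                used, info["remaining"], endpoint))
    return messages
```

With a live client you would call `for line in describe_rate_limits(api.rate_limit_status()): print(line)`. Note that querying the status endpoint itself counts against the `/application/rate_limit_status` limit.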

Example output:

Twitter API: 3 requests used, 177 remaining, for API queries to /search/tweets

Twitter API: 3 requests used, 177 remaining, for API queries to /application/rate_limit_status


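Also note that using wait_on_rate_limit "will stop the exceptions. Tweepy will sleep for however long is needed for the rate limit to replenish." (Aaron Hill, July 2014). There is a Stack Overflow page with more comments on this.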
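If you would rather control the waiting yourself instead of relying on wait_on_rate_limit, a minimal sketch is to sleep until the window's `reset` time (a Unix epoch in seconds) once `remaining` reaches zero. The function name `wait_for_reset` and the 5-second safety margin are assumptions; `info` is one endpoint entry from the rate-limit status dictionary, e.g. the one for `/search/tweets`.

```python
import time


def wait_for_reset(info, now=None, margin=5, sleep=time.sleep):
    """Sleep until the rate-limit window resets if no requests remain; return the delay."""
    if info["remaining"] > 0:
        return 0  # requests still available, no need to wait
    if now is None:
        now = time.time()
    # reset is an epoch timestamp; add a small margin to absorb clock skew
    delay = max(0, info["reset"] - now) + margin
    sleep(delay)
    return delay
```

Calling this just before each search request keeps the loop from ever triggering a rate-limit error. Injecting `sleep` as a parameter is only there so the function can be exercised without actually pausing.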

Tweepy API rate limit workaround
