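I am collecting replies to tweets from the Twitter API to build a dataset, using the tweepy library in Python. The problem is that I keep getting rate-limit messages ("Rate limit reached. Sleeping for: ..."), which slows me down, and I need to collect as much data as possible in the shortest time.
I read that Twitter's rate limit is, I think, 15 requests per 15 minutes or something like that, but in my case I can only collect one or two tweets before it stops again. Sometimes it stops for 15 minutes and then for another 15 minutes without giving me any time in between. I don't know what is causing this. Is it my code?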
# Import the necessary package to process data in JSON format
try:
    import json
except ImportError:
    import simplejson as json
# Import the tweepy library
import tweepy
import sys
# Variables that contain the user credentials to access the Twitter API
ACCESS_TOKEN = '-'
ACCESS_SECRET = '-'
CONSUMER_KEY = '-'
CONSUMER_SECRET = '-'
# Setup tweepy to authenticate with Twitter credentials:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
# Create the API object to connect to Twitter with your credentials
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, compression=True)
file2 = open('replies.csv','w', encoding='utf-8-sig')
replies=[]
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
for full_tweets in tweepy.Cursor(api.search,q='#عربي',timeout=999999,tweet_mode='extended').items():
    if (not full_tweets.retweeted) and ('RT @' not in full_tweets.full_text):
        for tweet in tweepy.Cursor(api.search,q='to:'+full_tweets.user.screen_name,result_type='recent',timeout=999999,tweet_mode='extended').items(1000):
            if hasattr(tweet, 'in_reply_to_status_id_str'):
                if (tweet.in_reply_to_status_id_str==full_tweets.id_str):
                    replies.append(tweet.full_text)
        print(full_tweets._json)
        file2.write("{ 'id' : "+ full_tweets.id_str + "," +"'Replies' : ")
        for elements in replies:
            file2.write(elements.strip('\n')+" , ")
        file2.write("}\n")
        replies.clear()
file2.close()
$ python code.py > file.csv
Rate limit reached. Sleeping for: 262
Rate limit reached. Sleeping for: 853
Answer 0 (score: 0)
Just add this line to your Python script to avoid sleeping:
sleep_on_rate_limit=False
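Note that `sleep_on_rate_limit` appears to be the option name used by the python-twitter library. The tweepy constructor in the question takes `wait_on_rate_limit` instead, so with tweepy the equivalent change would presumably be `wait_on_rate_limit=False`. Either way, turning off the sleep does not raise the quota: requests made after the limit is hit fail with a rate-limit error instead of waiting. The stalls after one or two tweets are more likely explained by the nested search, which runs a fresh `to:` query of up to 1000 items for every matching tweet. The sketch below is only a rough estimate of that cost. The limit of 180 search requests per 15-minute window (user authentication) and the default page size of 15 tweets are assumptions not stated in the post, and the helper names are hypothetical.

```python
import math

# Assumed values, not taken from the post: the standard v1.1 search endpoint
# is taken to allow 180 requests per 15-minute window with user
# authentication, and a search page to hold 15 tweets by default
# (up to 100 when a larger page size is requested).
WINDOW_REQUESTS = 180
DEFAULT_PAGE_SIZE = 15
MAX_PAGE_SIZE = 100


def requests_needed(items, page_size):
    """Upper bound on the search requests needed to page through `items` tweets."""
    return math.ceil(items / page_size)


def inner_searches_per_window(items_per_search, page_size, window=WINDOW_REQUESTS):
    """How many full inner reply searches fit into one rate-limit window."""
    return window // requests_needed(items_per_search, page_size)


if __name__ == "__main__":
    # The inner loop in the question asks for .items(1000) per matching tweet.
    for page_size in (DEFAULT_PAGE_SIZE, MAX_PAGE_SIZE):
        print(
            page_size,
            requests_needed(1000, page_size),
            inner_searches_per_window(1000, page_size),
        )
```

Under these assumptions, one inner search can cost up to 67 requests at the default page size, so only about two of them fit in a window, which would match the "one or two tweets" behaviour described in the question. Requesting larger pages (tweepy's `api.search` accepts a `count` argument) or lowering the `.items(1000)` cap should stretch the same budget considerably. These figures are upper bounds: a user with few recent replies uses fewer requests, and the outer hashtag search also draws from the same budget.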