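I'm trying to scrape all tweets with the #nationaldoughnutday hashtag, but I can't because of the rate limit.
Using the code below, I put the crawl inside a while loop so that when the rate limit resets, I can resume scraping from the date of the last scraped tweet (the `until` date).
However, I keep getting the following error over and over, and my crawler doesn't seem to resume after sleeping for a long time: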
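A common alternative to resuming by date is resuming by tweet ID. The v1.1 search endpoint accepts a `max_id` parameter, and passing the last seen ID minus one continues exactly below the last crawled tweet. `until` is day-granular and exclusive ("tweets created before the given date"), so restarting with the day before the last tweet's date skips the rest of that day and the whole previous day. The sketch below is pure Python: `crawl` and `fetch_page` are hypothetical names, and `fetch_page(max_id)` stands in for one search call such as `api.search(q=..., count=100, max_id=max_id)`, so the paging logic can be checked without network access.

```python
def crawl(fetch_page, max_id=None):
    """Yield tweets newest-first, only those with id <= max_id (if given).

    fetch_page(max_id) is a hypothetical callable returning a list of dicts
    with an integer 'id', newest first; an empty list means nothing is left.
    """
    while True:
        page = fetch_page(max_id)
        if not page:
            return
        for tweet in page:
            yield tweet
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = page[-1]['id'] - 1


# Fake data for illustration: ten tweets with ids 10..1, served 3 per page.
TWEETS = [{'id': i} for i in range(10, 0, -1)]


def fake_fetch(max_id):
    older = [t for t in TWEETS if max_id is None or t['id'] <= max_id]
    return older[:3]
```

To resume after a dropped connection, remember the last ID you processed and start a new crawl with `max_id=last_id - 1`:

```python
seen = []
for t in crawl(fake_fetch):
    seen.append(t['id'])
    if len(seen) == 4:
        break  # simulate the connection dropping here
resumed = [t['id'] for t in crawl(fake_fetch, max_id=seen[-1] - 1)]
```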
TweepError Failed to send request: ('Connection aborted.', error (10054, 'An existing connection was forcibly closed by the remote host'))
Sleeping...
TweepError Failed to send request: ('Connection aborted.', error (10054, 'An existing connection was forcibly closed by the remote host'))
Sleeping...
TweepError Failed to send request: ('Connection aborted.', error (10054, 'An existing connection was forcibly closed by the remote host'))
I tried removing the inner try/except block, but that didn't help either.
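Error 10054 is a TCP connection reset ("forcibly closed by the remote host"), not an HTTP rate-limit response, so `wait_on_rate_limit` does not cover it; tweepy 3.x wraps it in a `TweepError` and raises immediately. Such resets are usually transient, so retrying after a short, growing delay tends to work better than a fixed 17-minute sleep. A minimal sketch, assuming the failure surfaces as a Python exception; `call_with_backoff` and its parameters are illustrative, not part of tweepy, and `sleep` is injectable only so the behaviour can be tested.

```python
import time


def call_with_backoff(fn, retries=5, base_delay=1.0, max_delay=60.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff.

    The delay doubles after each failure (1s, 2s, 4s, ...) up to max_delay.
    Once all retries are used up, the last exception is re-raised.
    """
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # give up and let the caller see the error
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

With tweepy this might be used as `call_with_backoff(lambda: api.search(q=query, count=100, max_id=max_id), retry_on=(tweepy.TweepError,))`, keeping in mind that `TweepError` also covers non-transient errors such as bad credentials.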
from __future__ import print_function  # print() behaves the same on Python 2 and 3
import datetime
import time

import tweepy

# consumer_key, consumer_secret, access_token and access_token_secret are defined elsewhere
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
query = '#nationaldoughnutday'
until_date = '2019-07-01'  # search expects YYYY-MM-DD; assumes '01-07-2019' meant 1 July 2019
while True:
    try:  # outer try/except
        tweets = tweepy.Cursor(api.search, q=query + ' -filter:retweets', count=100, lang='en',
                               tweet_mode='extended', until=until_date).items()
        for tweet in tweets:
            try:  # inner try/except
                print("tweet:", tweet.created_at)
                # so that if I reconnect with the cursor, I start from the day before the last crawled tweet
                until_date = str(tweet.created_at.date() - datetime.timedelta(days=1))
            except tweepy.TweepError as e:
                print('Inner TweepError', e)
                time.sleep(17 * 60)
                break
        else:
            break  # cursor exhausted; a for loop handles StopIteration itself
    except tweepy.TweepError as e:
        print('Outer TweepError', e)
        print("sleeping ....")
        time.sleep(17 * 60)
        continue
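The inner try/except never fires because the network request happens when the `for` statement asks the cursor for its next item, and that call is outside the inner `try`; only the loop body is protected, so any `TweepError` from the cursor lands in the outer handler. The pure-Python sketch below uses a hypothetical generator, `flaky_items`, in place of `Cursor(...).items()` to show where the exception is caught.

```python
def flaky_items():
    """Hypothetical stand-in for Cursor(...).items(): fails on the 3rd item."""
    yield 1
    yield 2
    raise IOError("connection reset")


def consume():
    """Return (items seen, which handler caught the error)."""
    seen = []
    try:  # outer
        for item in flaky_items():  # next() runs here, outside the inner try
            try:  # inner: only protects the loop body
                seen.append(item)
            except IOError:
                return seen, "inner"
    except IOError:
        return seen, "outer"
    return seen, "none"
```

Calling `consume()` gives `([1, 2], "outer")`: both items are processed, and the error raised while fetching the third is caught by the outer handler.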
Thanks in advance!
Answer (score: 0)
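Try adding `wait_on_rate_limit=True`.
It won't solve the underlying problem, since nothing removes the Twitter API's rate limit, but it does stop the rate-limit error from being raised.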
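For reference, the v1.1 API reports the current rate-limit window in response headers, including `x-rate-limit-remaining` and `x-rate-limit-reset` (a Unix timestamp in seconds). With `wait_on_rate_limit=True`, tweepy 3.x roughly sleeps until that reset time (plus a few seconds) instead of raising. A minimal sketch of that calculation; `seconds_until_reset` is a hypothetical helper and the 5-second `margin` is an assumed buffer against clock skew.

```python
import time


def seconds_until_reset(headers, now=None, margin=5):
    """Seconds to sleep before the next request, from v1.1 rate-limit headers.

    headers: mapping with 'x-rate-limit-remaining' and 'x-rate-limit-reset'
    (epoch seconds). Returns 0 while requests remain in the current window.
    """
    if now is None:
        now = time.time()
    remaining = int(headers.get('x-rate-limit-remaining', 1))
    if remaining > 0:
        return 0
    reset = int(headers['x-rate-limit-reset'])
    # Never return a negative wait if the reset time has already passed.
    return max(reset - now, 0) + margin
```

Note that this only helps with rate-limit responses; a connection reset like the 10054 error above never reaches this point because no response headers arrive.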