I am new to the Twitter API and Tweepy, and I am confused about the concept of rate limits. I am using the Streaming API and want to collect sample tweets without using any filters such as hashtags or location. Some sources say the sample stream should not be rate limited, since it only returns 1% of all tweets, while others say otherwise. I keep getting error 420, and I would like to know whether there is a way to avoid it, or at least make it happen less often. Thank you very much for your help.

My code:
import json
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
from textblob import TextBlob
from elasticsearch import Elasticsearch
from datetime import datetime

# import twitter keys and tokens
from config import *

# create instance of elasticsearch
es = Elasticsearch()
indexName = "test_new_fields"

consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''


class TweetStreamListener(StreamListener):
    hashtags = []

    # on success
    def on_data(self, data):
        # decode json
        dict_data = json.loads(data)  # data is a json string
        # print(data)  # to print the twitter json string
        print(dict_data)

        # pass tweet into TextBlob
        tweet = TextBlob(dict_data["text"])

        # determine if sentiment is positive, negative, or neutral
        if tweet.sentiment.polarity < 0:
            sentiment = "negative"
        elif tweet.sentiment.polarity == 0:
            sentiment = "neutral"
        else:
            sentiment = "positive"

        # output polarity sentiment and tweet text
        print(str(tweet.sentiment.polarity) + " " + sentiment + " " + dict_data["text"])

        try:
            # check if there are any hashtags
            if len(dict_data["entities"]["hashtags"]) != 0:
                hashtags = dict_data["entities"]["hashtags"]
            # if no hashtags, use an empty list
            else:
                hashtags = []
        except KeyError:
            hashtags = []

        es.indices.put_settings(index=indexName, body={"index.blocks.write": False})

        # add text and sentiment info to elasticsearch
        es.index(index=indexName,
                 doc_type="test-type",
                 body={"author": dict_data["user"]["screen_name"],
                       "date": dict_data["created_at"],  # unfortunately this gets stored as a string
                       "location": dict_data["user"]["location"],  # user location
                       "followers": dict_data["user"]["followers_count"],
                       "friends": dict_data["user"]["friends_count"],
                       "time_zone": dict_data["user"]["time_zone"],
                       "lang": dict_data["user"]["lang"],
                       # "timestamp": float(dict_data["timestamp_ms"]),  # double not recognised as date
                       "timestamp": dict_data["timestamp_ms"],
                       "datetime": datetime.now(),
                       "message": dict_data["text"],
                       "hashtags": hashtags,
                       "polarity": tweet.sentiment.polarity,
                       "subjectivity": tweet.sentiment.subjectivity,
                       # handle geo data
                       # "coordinates": dict_data["coordinates"],
                       "sentiment": sentiment})
        return True

    # on failure
    def on_error(self, error):
        print("error: " + str(error))


if __name__ == '__main__':
    # create instance of the tweepy tweet stream listener
    listener = TweetStreamListener()

    # set twitter keys/tokens
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    while True:
        try:
            # create instance of the tweepy stream
            stream = Stream(auth, listener)
            # sample tweets from the twitter stream
            stream.sample()
        except KeyError:
            pass
Answer 0 (score: 1)
Okay, I found a solution to this problem: changing the method from on_data to on_status got rid of everything that was causing error 420.
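For context on why the 420 appears at all: HTTP 420 was the streaming API's rate-limit status, returned when a client (re)connected too often, and Twitter's documented remedy was to back off exponentially, starting at one minute and doubling the wait on each consecutive 420. The `while True` loop in the question retries without any pause, which tends to trigger exactly this. A minimal, Tweepy-independent sketch of that backoff logic (the function names and the use of `ConnectionError` as the retry signal are my own, for illustration only):

```python
import time


def backoff_seconds(attempt):
    """Wait time before reconnect attempt number `attempt` (0-based)."""
    # Twitter's guidance for HTTP 420: start at one minute and
    # double the wait on each consecutive rate-limit error.
    return 60 * (2 ** attempt)


def reconnect_with_backoff(connect, max_attempts=5, sleep=time.sleep):
    # `connect` is any zero-argument callable that raises ConnectionError
    # while the stream is still rate limited (a hypothetical stand-in for
    # e.g. a stream.sample() call).
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            sleep(backoff_seconds(attempt))
    raise RuntimeError("still rate limited after {} attempts".format(max_attempts))
```

The `sleep` parameter is injectable so the wait can be stubbed out in tests; in real use the defaults apply. In Tweepy specifically, returning False from on_error when the status code is 420 disconnects the stream, which gives the loop above a clean point at which to wait before reconnecting.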