我在下面的代码中检索了Twitter Streaming数据并创建了一个JSON文件。我想要得到的是在1000条推文之后停止数据收集。我该如何设置代码?
#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
# Other libs
import json
#Variables that contains the user credentials to access Twitter API
access_token = "XXX"
access_token_secret = "XXX"
consumer_key = "XXX"
consumer_secret = "XXX"
#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
def on_data(self, data):
try:
tweet = json.loads(data)
with open('your_data.json', 'a') as my_file:
json.dump(tweet, my_file)
except BaseException:
print('Error')
pass
def on_error(self, status):
print ("Error " + str(status))
if status == 420:
print("Rate Limited")
return False
if __name__ == '__main__':
#This handles Twitter authetification and the connection to Twitter Streaming API
l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, l)
stream.filter(track=['Euro2016', 'FRA', 'POR'], languages=['en'])
答案 0 :(得分:1)
这是一个可能的解决方案:
tweet_number
在定义了类变量StdOutListener
之后,我使用 init ()方法初始化一个新的tweet_number
对象,其中包含您要收集的最大推文数。每次调用on_data(data)
方法时,tweet_number>=max_tweets
都会增加1,导致程序在sys
P.S。您需要导入ISO 8601
才能使代码生效。
答案 1 :(得分:0)
这是我要使用的2.7代码 - 对不起,我也不知道3.0 ...我想你想要的是我的第二行。 .items(1000)部分......?
stackoverflow在我的代码中弄乱了我的缩进。我也在使用tweepy。
results = []
for tweet in tweepy.Cursor(api.search, q='%INSERT_SEARCH_VARIABLE HERE').items(1000): #THE 1000 IS WHERE YOU SAY SEARCH FOR 1000 TWEETS.
results.append(tweet)
print type(results)
print len(results)
def toDataFrame(tweets):
DataSet = pd.DataFrame()
DataSet['tweetID'] = [tweet.id for tweet in tweets]
DataSet['tweetText'] = [tweet.text for tweet in tweets]
DataSet['tweetRetweetCt'] = [tweet.retweet_count for tweet
in tweets]
DataSet['tweetFavoriteCt'] = [tweet.favorite_count for tweet
in tweets]
DataSet['tweetSource'] = [tweet.source for tweet in tweets]
DataSet['tweetCreated'] = [tweet.created_at for tweet in tweets]
DataSet['userID'] = [tweet.user.id for tweet in tweets]
DataSet['userScreen'] = [tweet.user.screen_name for tweet
in tweets]
DataSet['userName'] = [tweet.user.name for tweet in tweets]
DataSet['userCreateDt'] = [tweet.user.created_at for tweet
in tweets]
DataSet['userDesc'] = [tweet.user.description for tweet in tweets]
DataSet['userFollowerCt'] = [tweet.user.followers_count for tweet
in tweets]
DataSet['userFriendsCt'] = [tweet.user.friends_count for tweet
in tweets]
DataSet['userLocation'] = [tweet.user.location for tweet in tweets]
DataSet['userTimezone'] = [tweet.user.time_zone for tweet
in tweets]
return DataSet
#Pass the tweets list to the above function to create a DataFrame
tweet_frame = toDataFrame(results)
tweet_frame[0:999]