How can I use Tweepy to make multiple calls to the Twitter API so that I get more than 200 tweets per user?

Time: 2020-02-09 06:54:55

Tags: python pandas tweepy

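I have some Python code here that retrieves the maximum of 200 tweets from the Twitter account of each U.S. Democratic candidate. However, I set it to exclude replies and retweets, so it actually returns far fewer. I know that you can make multiple calls within a 15-minute window (180 of them, specifically), and since each call can return up to 200 tweets, that would give me more tweets. My question is how to make multiple calls while still returning the data in the pandas DataFrame format I currently have set up. Thanks!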

import datetime as dt
import os
import numpy as np
import pandas as pd
import tweepy as tw

#define developer's permissions
consumer_key = 'xxxxxxxx'
consumer_secret = 'xxxxxxxx'
access_token = 'xxxxxx'
access_token_secret = 'xxxxxxx'

#access twitter's API
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)
#function collects tweets from one user's timeline and returns them as a DataFrame
def get_tweets(handle):
    try:
        tweets = api.user_timeline(screen_name=handle,
                                   count=200,
                                   exclude_replies=True,
                                   include_rts=False,
                                   tweet_mode="extended")
    except Exception as e:
        #report the failure and return an empty frame so pd.concat below still works
        print(handle, "failed: {}".format(e))
        return pd.DataFrame()
    print(handle, "Number of tweets extracted: {}\n".format(len(tweets)))
    df = pd.DataFrame(data=[tweet.user.screen_name for tweet in tweets], columns=['handle'])
    df['tweets'] = np.array([tweet.full_text for tweet in tweets])
    df['date'] = np.array([tweet.created_at for tweet in tweets])
    df['len'] = np.array([len(tweet.full_text) for tweet in tweets])
    df['like_count'] = np.array([tweet.favorite_count for tweet in tweets])
    df['rt_count'] = np.array([tweet.retweet_count for tweet in tweets])
    return df

#list of all the candidate twitter handles
handles = ['@JoeBiden', '@ewarren', '@BernieSanders', '@MikeBloomberg', '@PeteButtigieg', '@AndrewYang', '@AmyKlobuchar']
df = pd.DataFrame()
#loop through the different candidate twitter handles and collect each candidate's tweets
for handle in handles:
    df_new = get_tweets(handle)
    df = pd.concat((df, df_new))

Output:

@JoeBiden Number of tweets extracted: 200.

@ewarren Number of tweets extracted: 200.

@BernieSanders Number of tweets extracted: 200.

@MikeBloomberg Number of tweets extracted: 200.

@PeteButtigieg Number of tweets extracted: 200.

@AndrewYang Number of tweets extracted: 200.

@AmyKlobuchar Number of tweets extracted: 200.
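One way to keep the DataFrame layout when a user's tweets arrive over several calls is to turn each batch into a frame with the same columns and concatenate them all once at the end. The sketch below assumes each tweet object exposes the attributes used in the code above (`user.screen_name`, `full_text`, `created_at`, `favorite_count`, `retweet_count`), as Tweepy status objects do with `tweet_mode="extended"`. The helper names `tweets_to_df` and `fake_status` are hypothetical, and `SimpleNamespace` objects stand in for real statuses so it runs without the API.

```python
from datetime import datetime
from types import SimpleNamespace

import pandas as pd

COLUMNS = ['handle', 'tweets', 'date', 'len', 'like_count', 'rt_count']


def tweets_to_df(tweets):
    """Hypothetical helper: turn status-like objects into the question's columns."""
    rows = [
        {
            'handle': t.user.screen_name,
            'tweets': t.full_text,
            'date': t.created_at,
            'len': len(t.full_text),
            'like_count': t.favorite_count,
            'rt_count': t.retweet_count,
        }
        for t in tweets
    ]
    # Passing columns= keeps the same layout even when a batch is empty
    return pd.DataFrame(rows, columns=COLUMNS)


def fake_status(name, text, likes, rts):
    # Stand-in for a Tweepy Status object, with only the attributes used above
    return SimpleNamespace(user=SimpleNamespace(screen_name=name),
                           full_text=text,
                           created_at=datetime(2020, 2, 9),
                           favorite_count=likes,
                           retweet_count=rts)


# Two batches, as if returned by two separate API calls
page1 = [fake_status('JoeBiden', 'hello', 10, 2)]
page2 = [fake_status('ewarren', 'hi there', 5, 1),
         fake_status('ewarren', 'bye', 0, 0)]

# Build one frame per call, then concatenate once
frames = [tweets_to_df(p) for p in (page1, page2)]
df = pd.concat(frames, ignore_index=True)
print(df[['handle', 'len', 'like_count']])
```

Collecting the per-call frames in a list and calling `pd.concat` once avoids re-copying the growing frame on every iteration, which is what `pd.concat((df, df_new))` inside the loop does.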

2 answers:

Answer 0 (score: 0)

First, you should regenerate your credentials now.

You can use Cursor to iterate through the paginated results, or you can pass the since_id and/or max_id parameters to API.user_timeline.
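Tweepy's `Cursor` (for example `tw.Cursor(api.user_timeline, screen_name=handle, count=200, tweet_mode="extended").items()`) does this paging for you. As a minimal sketch of the `max_id` route done by hand: request a page, then ask for tweets older than the oldest one seen, and repeat until a page comes back empty. `max_id` is inclusive in Twitter's API, hence the `- 1`. The names `paginate_max_id`, `fake_fetch` and `TIMELINE` are hypothetical, and a fake in-memory timeline replaces the real endpoint so the code runs offline. Note that Twitter documents this endpoint as returning at most the 3,200 most recent tweets of a user, so no amount of paging goes further back than that.

```python
from types import SimpleNamespace


def paginate_max_id(fetch_page, max_pages):
    """Collect results from several calls by walking backwards with max_id.

    fetch_page(max_id) is a hypothetical callable that returns a list of
    objects with an integer .id, newest first, like user_timeline does.
    """
    collected = []
    max_id = None  # None means "start from the newest tweet"
    for _ in range(max_pages):
        page = fetch_page(max_id)
        if not page:
            break  # nothing older is left
        collected.extend(page)
        # max_id is inclusive, so ask only for tweets older than the oldest seen
        max_id = min(t.id for t in page) - 1
    return collected


# Offline stand-in for the API: 450 tweets, newest first, 200 per call
TIMELINE = [SimpleNamespace(id=i) for i in range(450, 0, -1)]


def fake_fetch(max_id, count=200):
    eligible = [t for t in TIMELINE if max_id is None or t.id <= max_id]
    return eligible[:count]


tweets = paginate_max_id(fake_fetch, max_pages=10)
print(len(tweets))
```

Against the real API, `fetch_page` could wrap `api.user_timeline(screen_name=handle, count=200, tweet_mode="extended", ...)`, passing `max_id` only when it is not `None`; the combined list can then go through the same DataFrame-building code used for a single call.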

See also the documentation for the GET statuses/user_timeline endpoint.

Answer 1 (score: 0)

The Twitter API documentation explains why you are getting fewer results:

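exclude_replies - "This parameter will prevent replies from appearing in the returned timeline. Using exclude_replies with the count parameter will mean you will receive up-to count tweets - this is because the count parameter retrieves that many Tweets before filtering out retweets and replies."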
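To see why the filtered call comes back short, the sketch below simulates the documented order (take `count` tweets first, then drop replies and retweets) on a made-up timeline. The proportions of replies and retweets, `TIMELINE`, `user_timeline_sim`, `is_original` and `collect_originals` are all hypothetical; no real API is called. It also shows one possible workaround, an assumption rather than something stated in the answer: request the unfiltered timeline, page backwards with `max_id`, and filter locally, so that an empty page reliably means the timeline is exhausted.

```python
from types import SimpleNamespace

# Hypothetical newest-first timeline of 1000 tweets with illustrative
# proportions: every 3rd id is a reply, every 5th id is a retweet.
TIMELINE = [
    SimpleNamespace(id=i,
                    in_reply_to_status_id=(i + 1 if i % 3 == 0 else None),
                    is_retweet=(i % 5 == 0))
    for i in range(1000, 0, -1)
]


def is_original(t):
    # Keep tweets that are neither replies nor retweets
    return t.in_reply_to_status_id is None and not t.is_retweet


def user_timeline_sim(max_id=None, count=200, filter_server_side=True):
    """Mimic the documented order: take `count` tweets, THEN drop replies/RTs."""
    window = [t for t in TIMELINE if max_id is None or t.id <= max_id][:count]
    if filter_server_side:
        return [t for t in window if is_original(t)]
    return window


# One call with the filters on: count=200 is applied before filtering
first_call = user_timeline_sim(count=200)
print(len(first_call))


def collect_originals(target, max_calls=50):
    """Page over the unfiltered timeline with max_id and filter locally."""
    kept, max_id = [], None
    for _ in range(max_calls):
        page = user_timeline_sim(max_id=max_id, count=200,
                                 filter_server_side=False)
        if not page:
            break  # an empty unfiltered page means the timeline is exhausted
        kept.extend(t for t in page if is_original(t))
        if len(kept) >= target:
            break
        max_id = min(t.id for t in page) - 1
    return kept[:target]


originals = collect_originals(target=300)
print(len(originals))
```

With filtering left on the server, a page can come back empty simply because every tweet in that window was a reply or retweet, so "empty page" would no longer be a safe stopping signal. With real Tweepy objects, a reply typically has a non-`None` `in_reply_to_status_id` and a retweet carries a `retweeted_status` attribute, which could replace the simulated fields here.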