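For a given set of users, I am trying to collect more tweets than Twitter's limit of 200 tweets per request.
However, my code only populates the DataFrame when a user has fewer than 200 tweets; for users with more than 200 tweets, it fails to append their values to the DataFrame.
Full code IN: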
import tweepy
import pandas as pd
import numpy as np
from datetime import timedelta

handles = ['@MrML16419203', '@d00tn00t']

consumerKey, consumerSecret, accessToken, accessTokenSecret = 'x', 'x', 'x', 'x'
authenticate = tweepy.OAuthHandler(consumerKey, consumerSecret)
authenticate.set_access_token(accessToken, accessTokenSecret)
api_twitter = tweepy.API(authenticate, wait_on_rate_limit=True)

total_tweets = []


def get_tweets(handle):
    batch_count_for_tweet_downloads = 200
    try:
        alltweets = []
        tweets = api_twitter.user_timeline(screen_name=handle,
                                           count=batch_count_for_tweet_downloads,
                                           exclude_replies=True,
                                           include_rts=False,
                                           lang="en",
                                           tweet_mode="extended")
        alltweets.extend(tweets)
        oldest = alltweets[-1].id - 1
        oldest_datetime = pd.to_datetime(str(pd.to_datetime(oldest))[:-10]).strftime("%Y-%m-%d %H:%M:%S")
        print(f"Getting Tweets For " + handle + ", After: " + oldest_datetime)
        while len(tweets) > 0:
            tweets = api_twitter.user_timeline(screen_name=handle, count=batch_count_for_tweet_downloads, max_id=oldest)
            alltweets.extend(tweets)
            if len(alltweets) > 0:
                oldest = alltweets[-1].id - 1
            else:
                pass
            print("Count: " + f"...{len(alltweets)} " + handle + " Tweets Downloaded")
        print('---Total Downloaded: ' + str(len(alltweets)) + ' for ' + handle + '---')
        df = pd.DataFrame(data=[tweets.user.screen_name for tweets in alltweets], columns=['Handle'])
        df['Tweets'] = np.array([tweets.full_text for tweets in alltweets])
        df['Date'] = np.array([tweets.created_at - timedelta(hours=4) for tweets in alltweets])
        df['Len'] = np.array([len(tweets.full_text) for tweets in alltweets])
        df['Like_count'] = np.array([tweets.favorite_count for tweets in alltweets])
        df['RT_count'] = np.array([tweets.retweet_count for tweets in alltweets])
        total_tweets.extend(alltweets)
        print("----------Total Tweets Extracted: {}".format(df.shape[0]) + "----------")
    except:
        pass
    return df


df = pd.DataFrame()
for handle in handles:
    df_new = get_tweets(handle)
    df = pd.concat((df, df_new))
print(df)
OUT:
           Handle   Tweets                 Date  Len  Like_count  RT_count
0    MrML16419203   132716  2020-09-02 02:18:28  6.0         0.0       0.0
1    MrML16419203   432881  2020-09-02 02:04:23  6.0         0.0       0.0
2    MrML16419203   973625  2020-09-02 02:04:09  6.0         0.0       0.0
3    MrML16419203  1234567  2020-09-02 01:55:10  7.0         0.0       0.0
4    MrML16419203   225865  2020-09-02 01:27:11  6.0         0.0       0.0
..            ...      ...                  ...  ...         ...       ...
536      d00tn00t      NaN                  NaT  NaN         NaN       NaN
537      d00tn00t      NaN                  NaT  NaN         NaN       NaN
538      d00tn00t      NaN                  NaT  NaN         NaN       NaN
539      d00tn00t      NaN                  NaT  NaN         NaN       NaN
540      d00tn00t      NaN                  NaT  NaN         NaN       NaN
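As you can see, any user with more than 200 tweets still comes back with NaN and NaT values, even though my console shows the while loop downloading those data points.
I have tried several solutions (such as cursors), but none of them worked, and I get a length mismatch error when I try to extract only the tweets of users with more than 200 tweets. This is because the returned DataFrame is empty (apart from the "Handle" column), which can also be seen when exporting to CSV.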
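To narrow this down, here is a minimal offline sketch of what I suspect might be happening, with no Twitter API calls involved. It rests on an unconfirmed assumption: that some of the downloaded tweet objects lack `full_text` (for example, because the call inside the while loop does not pass `tweet_mode="extended"`), so building the 'Tweets' column raises an exception that the bare `except` swallows. The helpers `fake_tweet` and `build_frame` and the handles `user_a` and `user_b` are hypothetical stand-ins, not tweepy functions or real accounts.

```python
from types import SimpleNamespace

import pandas as pd


def fake_tweet(handle, body, extended=True):
    """Hypothetical stand-in for a tweet object (not a tweepy class)."""
    fields = {"user": SimpleNamespace(screen_name=handle)}
    # Assumption: only "extended" tweets carry full_text; the others carry text.
    fields["full_text" if extended else "text"] = body
    return SimpleNamespace(**fields)


def build_frame(alltweets):
    """Mirror the column-by-column construction and bare except from my code."""
    try:
        df = pd.DataFrame(data=[t.user.screen_name for t in alltweets], columns=["Handle"])
        # Raises AttributeError as soon as one tweet has no full_text,
        # so neither of the next two columns is ever created.
        df["Tweets"] = [t.full_text for t in alltweets]
        df["Len"] = [len(t.full_text) for t in alltweets]
    except:  # deliberately bare, as in my code: the error is silently dropped
        pass
    return df


# First user: every tweet is "extended", so all columns are filled.
small = build_frame([fake_tweet("user_a", "hello")])

# Second user: the first batch is extended, a later batch is not.
large = build_frame([
    fake_tweet("user_b", "first batch"),
    fake_tweet("user_b", "later batch", extended=False),
])

combined = pd.concat((small, large))
print(large.columns.tolist())  # ['Handle']
print(combined)
```

If this is what is going on, the second user's frame comes back with only 'Handle' filled in, and `pd.concat` pads the remaining columns with NaN, which would match the pattern in my output. I have not verified that this is the actual cause in my real code.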
Any help would be greatly appreciated. Thank you.