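I am trying to scrape all of one user's tweets, but the hashtags are missing from the data I download. For example, this tweet should have 5 hashtags, but the data I downloaded looks like this: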
b'RT @gcosma1: Fantastic opportunity! PhD Studentship: Energy Prediction in Buildings using Artificial Intelligence\nthe_url #\xe2\x80\xa6'
Does anyone know why this happens? It has been bothering me for a long time and I cannot find a solution. Here is my code:
import tweepy
import csv
import json

consumer_key = 'XXX'
consumer_secret = 'XXX'
access_token = 'XXX'
access_token_secret = 'XXX'


def get_all_tweets(screen_name):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)

    all_the_tweets = []
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)
    all_the_tweets.extend(new_tweets)
    oldest_tweet = all_the_tweets[-1].id - 1

    t_no = 201
    while len(all_the_tweets) != t_no:
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest_tweet, tweet_mode="extended")
        t_no = len(all_the_tweets)
        all_the_tweets.extend(new_tweets)
        oldest_tweet = all_the_tweets[-1].id - 1
        print('...%s tweets have been downloaded so far' % len(all_the_tweets))

    # transforming the tweets into a 2D array that will be used to populate the csv
    outtweets = [[tweet.id_str, tweet.created_at,
                  tweet.text.encode('utf8')] for tweet in all_the_tweets]

    # writing to the csv file
    with open(screen_name + '_tweets.csv', 'w', encoding='utf8') as f:
        writer = csv.writer(f)
        writer.writerow(['id', 'created_at', 'text'])
        writer.writerows(outtweets)


if __name__ == '__main__':
    get_all_tweets(input("Enter the twitter handle of the person whose tweets you want to download:- "))
Answer 0 (score: 0)
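This seems to happen only with retweets. The text of the original tweet appears to contain all the hashtags. If you look at the original tweet, you will see that it is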
Fantastic opportunity! PhD Studentship: Energy Prediction in Buildings using Artificial Intelligence\nthe_url #DeepLearning #MachineLearning #AI #DataScience #PhD the_url2
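The trailing bytes in the downloaded text are a hint that the retweet was cut off. As a quick check that uses only the byte string quoted in the question, decoding it shows that the text ends in a single ellipsis character right after the first `#`. The variable names here are just for illustration.

```python
# The byte string exactly as quoted in the question.
raw = b'RT @gcosma1: Fantastic opportunity! PhD Studentship: Energy Prediction in Buildings using Artificial Intelligence\nthe_url #\xe2\x80\xa6'

# Decode the UTF-8 bytes back into a normal string.
text = raw.decode('utf8')

# \xe2\x80\xa6 is the UTF-8 encoding of U+2026 (horizontal ellipsis).
ends_with_ellipsis = text.endswith('\u2026')

# Collect whitespace-separated tokens that start with '#'.
hashtags = [word for word in text.split() if word.startswith('#')]

print(ends_with_ellipsis)
print(len(hashtags))
```

Only one `#` token survives, and it is followed directly by the ellipsis, which is consistent with the text having been truncated before the hashtags.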
So you can do the following:
# `api` is the authenticated tweepy.API object created in the question's code
new_tweets = api.user_timeline(screen_name='gcosma1', count=200, tweet_mode="extended")
tweet_text = []
for tweet in new_tweets:
    # Check if it is a retweet. If yes, add the original tweet
    if hasattr(tweet, 'retweeted_status'):
        tweet_text.append(tweet.retweeted_status.full_text)
    else:
        tweet_text.append(tweet.full_text)
print(tweet_text)
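To fold this into the CSV export from the question, something along these lines might work. This is only a sketch: `full_text_of` and `rows_for_csv` are hypothetical helper names, and the `SimpleNamespace` objects are stand-ins for tweepy status objects so the example runs without API credentials. It assumes every tweet was fetched with tweet_mode="extended", so that `full_text` is present.

```python
import csv
import io
from types import SimpleNamespace


def full_text_of(tweet):
    """Return the untruncated text of a tweet fetched in extended mode."""
    # Retweets carry the original tweet in `retweeted_status`;
    # use its full_text instead of the shortened "RT @user: ..." text.
    original = getattr(tweet, 'retweeted_status', None)
    if original is not None:
        return original.full_text
    return tweet.full_text


def rows_for_csv(tweets):
    """Build [id, created_at, text] rows, keeping the text as str."""
    return [[t.id_str, t.created_at, full_text_of(t)] for t in tweets]


# Stand-in objects for illustration only (not real tweepy objects).
plain = SimpleNamespace(id_str='1', created_at='2019-01-01',
                        full_text='Hello #AI')
retweet = SimpleNamespace(
    id_str='2', created_at='2019-01-02',
    full_text='RT @someone: Hello #\u2026',
    retweeted_status=SimpleNamespace(full_text='Hello #DeepLearning #AI'),
)

rows = rows_for_csv([plain, retweet])

# Write to an in-memory buffer here; a real script would open a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['id', 'created_at', 'text'])
writer.writerows(rows)
print(buffer.getvalue())
```

Note that the text is left as a str rather than passed through .encode('utf8'). In Python 3, csv.writer writes a bytes value as its repr, which is where the b'...' wrapper in the question's output comes from; the encoding='utf8' argument to open() already takes care of encoding the file.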