Question

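I am trying to scrape all the tweets of a user, but the data I download is missing the hashtags. For example, this tweet should have 5 hashtags, but the data I downloaded looks like this: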

b'RT @gcosma1: Fantastic opportunity! PhD Studentship: Energy Prediction in Buildings using Artificial Intelligence\nthe_url #\xe2\x80\xa6'

Does anyone know why this happens? It has bothered me for a long time and I cannot find a solution. Here is my code:

import tweepy
import csv
import json

consumer_key = 'XXX'
consumer_secret = 'XXX'
access_token = 'XXX'
access_token_secret = 'XXX'

def get_all_tweets(screen_name):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)

    all_the_tweets = []
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)
    all_the_tweets.extend(new_tweets)
    oldest_tweet = all_the_tweets[-1].id - 1

    t_no = 201
    while len(all_the_tweets) != t_no:
        new_tweets = api.user_timeline(screen_name=screen_name,count=200, max_id=oldest_tweet, tweet_mode="extended")
        t_no = len(all_the_tweets)
        all_the_tweets.extend(new_tweets)
        oldest_tweet = all_the_tweets[-1].id - 1
        print ('...%s tweets have been downloaded so far' % len(all_the_tweets))

    # transforming the tweets into a 2D array that will be used to populate the csv
    outtweets = [[tweet.id_str, tweet.created_at,
    tweet.text.encode('utf8')] for tweet in all_the_tweets]
    # writing to the csv file

    with open(screen_name + '_tweets.csv', 'w', encoding='utf8') as f:
        writer = csv.writer(f)
        writer.writerow(['id', 'created_at', 'text'])
        writer.writerows(outtweets)

if __name__ == '__main__':
    get_all_tweets(input("Enter the twitter handle of the person whose tweets you want to download:- "))

Answer 1

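This seems to happen only with retweets. The text of the original tweet appears to contain all the hashtags. If you look at the original tweet, you will see that it is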

Fantastic opportunity! PhD Studentship: Energy Prediction in Buildings using Artificial Intelligence\nthe_url #DeepLearning #MachineLearning #AI #DataScience #PhD the_url2

So you can do the following

# `api` is the authenticated tweepy.API object created in the question's code
new_tweets = api.user_timeline(screen_name='gcosma1', count=200, tweet_mode="extended")

tweet_text = []
for tweet in new_tweets:
    # Check whether it is a retweet. If so, use the original tweet's full text
    if hasattr(tweet, 'retweeted_status'):
        tweet_text.append(tweet.retweeted_status.full_text)
    else:
        tweet_text.append(tweet.full_text)

print(tweet_text)
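
To carry the same check over to the CSV export from the question, something like the sketch below might work. The helper names full_text_of and write_rows are made up for illustration, and the SimpleNamespace objects are only stand-ins for the statuses that api.user_timeline(..., tweet_mode="extended") would return, so the snippet runs without credentials. It also writes the text as str rather than calling .encode('utf8'), since in Python 3 csv.writer stores the repr of a bytes value, which is where the b'...' in the question's output comes from.

```python
import csv
import io
from types import SimpleNamespace


def full_text_of(tweet):
    """Return the untruncated text of a status fetched with tweet_mode="extended"."""
    # A retweet carries the original status in `retweeted_status`; the
    # retweet's own text is cut off, the original's full_text is not.
    original = getattr(tweet, "retweeted_status", None)
    if original is not None:
        return original.full_text
    return tweet.full_text


def write_rows(tweets, f):
    """Write id, created_at and full text to an already opened text file."""
    writer = csv.writer(f)
    writer.writerow(["id", "created_at", "text"])
    for tweet in tweets:
        # No .encode('utf8'): the file object handles the encoding, and a
        # bytes value would be written as its b'...' repr.
        writer.writerow([tweet.id_str, tweet.created_at, full_text_of(tweet)])


# Stand-in objects for illustration only (hypothetical data, not real tweets).
plain = SimpleNamespace(id_str="1", created_at="2019-01-01", full_text="Hello #AI")
retweet = SimpleNamespace(
    id_str="2",
    created_at="2019-01-02",
    full_text="RT @someone: Long text #\u2026",
    retweeted_status=SimpleNamespace(full_text="Long text #DeepLearning #AI"),
)

buffer = io.StringIO()
write_rows([plain, retweet], buffer)
print(buffer.getvalue())
```

With real data, the buffer would be replaced by the file opened in the question, ideally with newline='' as the csv module documentation recommends, and every call to user_timeline would need tweet_mode="extended" so that full_text is available.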
