Question

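I'm trying to pull tweets from a list of accounts using tweepy. I can get the tweets, but I'm getting a large number of duplicate tweets from a single account. In some cases I pulled 400 tweets and roughly half of them were duplicates.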

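I've checked these accounts on Twitter and confirmed that they aren't just posting the same content over and over. I've also confirmed that they don't have a hundred-plus retweets that could be affecting this. When I look at the actual tweet objects for the duplicates, everything is exactly the same: the tweet IDs are identical, the creation times are identical, the retweet counts don't differ, and the @mentions and hashtags are the same. I can't see any difference. I thought it might be something in my loop, but everything I've tried produces the same result.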

Any ideas? I don't want to just deduplicate, because then I'd end up with far fewer tweets from some accounts.

import tweepy

# api is assumed to be an authenticated tweepy.API instance (setup not shown)

# A list of the accounts I want tweets from
friendslist = ["SomeAccount", "SomeOtherAccount"] 

# Where I store the tweet objects
friendstweets = []

# Loop that cycles through my list of accounts to add tweets to friendstweets
for f in friendslist:
    num_needed = 400 # The number of tweets I want from each account
    temp_list = []
    last_id = -1 # id of last tweet seen
    while len(temp_list) < num_needed:
        try:
            new_tweets = api.user_timeline(screen_name=f, count=400, include_rts=True)
        except tweepy.TweepError as e:
            print("Error", e)
            break
        except StopIteration:
            break
        else:
            if not new_tweets:
                print("Could not find any more tweets!")
                break
        friendstweets.extend(new_tweets) 
        temp_list.extend(new_tweets)
        last_id = new_tweets[-1].id
    print('Friend '+f+' complete.')

Answer 1

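Your problem is this line: while len(temp_list) < num_needed:. Every pass through the loop sends exactly the same request (last_id is recorded but never passed back to the API), so for each user you keep fetching the same tweets until you have more than 400 in total.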

To fix it, I suggest removing the while loop and changing the number of tweets requested from 400 to num_needed:

new_tweets = api.user_timeline(screen_name = f, count = num_needed, include_rts = True)

Hopefully it works as expected.
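If you want to keep the loop (for example to collect more tweets than a single request returns), the usual approach is to page backwards with max_id. The sketch below assumes Twitter API v1.1 semantics for user_timeline: results come newest first, count is capped at 200 per request, and max_id is inclusive. The helper fetch_timeline and the fake timeline used to exercise it are hypothetical illustrations, not part of tweepy. Each request asks only for tweets older than the oldest one already collected, so no page is fetched twice.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Sequence


def fetch_timeline(fetch_page: Callable[..., Sequence[Any]],
                   num_needed: int, page_size: int = 200) -> List[Any]:
    """Collect up to num_needed tweets, paging backwards with max_id.

    fetch_page is assumed to behave like user_timeline: it accepts count=
    and an optional max_id= keyword and returns objects with an .id
    attribute, newest first, with max_id treated as inclusive.
    """
    tweets: List[Any] = []
    max_id = None  # no upper bound on the first request
    while len(tweets) < num_needed:
        kwargs = {"count": min(page_size, num_needed - len(tweets))}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = fetch_page(**kwargs)
        if not page:
            break  # nothing older is available
        tweets.extend(page)
        # max_id is inclusive, so step one below the oldest ID already seen
        max_id = page[-1].id - 1
    return tweets[:num_needed]


# Hypothetical stand-in for one account's timeline: IDs 1000 down to 1
FAKE_TIMELINE = [SimpleNamespace(id=i) for i in range(1000, 0, -1)]


def fake_user_timeline(count=20, max_id=None):
    """Mimics user_timeline paging: newest first, max_id inclusive."""
    eligible = [t for t in FAKE_TIMELINE if max_id is None or t.id <= max_id]
    return eligible[:count]


tweets = fetch_timeline(fake_user_timeline, 400)
ids = [t.id for t in tweets]
print(len(ids), len(set(ids)))  # 400 400
```

With a real client, functools.partial(api.user_timeline, screen_name=f, include_rts=True) could be passed as fetch_page, since user_timeline accepts count and max_id keyword arguments. Alternatively, tweepy.Cursor(api.user_timeline, screen_name=f, count=200, include_rts=True).items(num_needed) performs the same max_id paging internally.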

Getting duplicate tweets from a user timeline with tweepy