Question

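When returning search results through the API, Twitter only returns 100 tweets per "page". It provides max_id and since_id in the returned search_metadata, which can be used as parameters to fetch earlier/later tweets.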
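A minimal sketch of how the two parameters relate to a page of results, assuming each status is a dict with an integer 'id' (as in Twython's decoded JSON). `next_page_params` is a hypothetical helper, not part of Twython; it derives the values from the statuses themselves rather than relying on the exact fields in search_metadata.

```python
def next_page_params(statuses):
    """Return paging parameters derived from one page of search results.

    'max_id' requests the next page of *older* tweets: it is the smallest
    id on this page minus 1, because max_id is inclusive.
    'since_id' requests *newer* tweets: since_id is exclusive, so the
    largest id on this page can be passed unchanged.
    """
    if not statuses:
        return {}  # empty page: nothing left to page through
    ids = [tweet['id'] for tweet in statuses]
    return {'max_id': min(ids) - 1, 'since_id': max(ids)}


page = [{'id': 105}, {'id': 104}, {'id': 101}]
print(next_page_params(page))  # {'max_id': 100, 'since_id': 105}
```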

The Twython 3.1.2 documentation suggests that this pattern is the "old way" of searching:

results = twitter.search(q="xbox", count=423, max_id=421482533256044543)
for tweet in results['statuses']:
    pass  # ... do something with tweet

And this is the "new way":

results = twitter.cursor(twitter.search, q='xbox', count=375)
for tweet in results:
    pass  # ... do something with tweet

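When I use the latter, it seems to iterate over the same search results endlessly. I'm trying to write them to a CSV file, but it writes lots of duplicates.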
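One way to keep the CSV free of duplicates, whatever produces the tweets, is to remember the ids already written. A sketch assuming tweet dicts with 'id' and 'text' keys; `write_unique_tweets` is a hypothetical helper. On its own it does not stop a generator that never ends.

```python
import csv
import io


def write_unique_tweets(tweets, fileobj, seen=None):
    """Write each tweet to CSV once, skipping ids that were already written.

    `tweets` can be any iterable of tweet dicts (for example the generator
    returned by twitter.cursor). `seen` can be shared across calls so that
    several runs into the same file stay duplicate-free.
    Returns the number of rows written.
    """
    if seen is None:
        seen = set()
    writer = csv.writer(fileobj)
    written = 0
    for tweet in tweets:
        if tweet['id'] in seen:
            continue  # duplicate from an overlapping or repeated page
        seen.add(tweet['id'])
        writer.writerow([tweet['id'], tweet['text']])
        written += 1
    return written


buf = io.StringIO()
sample = [{'id': 1, 'text': 'a'}, {'id': 2, 'text': 'b'}, {'id': 1, 'text': 'a'}]
print(write_unique_tweets(sample, buf))  # 2
```

For a real file, open it with `open('tweets.csv', 'a', newline='')`, as the csv module documentation recommends, and pass the same `seen` set on every call.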

What is the proper way to search for a large number of tweets with Twython and iterate over a set of unique results?

Edit: Another problem is that when I iterate with the generator (for tweet in results:), it loops over and over without ever stopping. Ah, it's a bug: https://github.com/ryanmcgrath/twython/issues/300
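Until the bug is fixed, a guard around the generator can at least make the loop terminate. A sketch assuming tweets are dicts with an 'id' key; `take_until_repeat` is hypothetical, and stopping at the first repeated id is a heuristic that suits a cursor replaying the same page, not a source whose pages may legitimately overlap.

```python
import itertools


def take_until_repeat(tweets, limit):
    """Yield tweets until an id repeats or `limit` tweets have been yielded.

    Intended as a guard around a generator that may cycle over the same
    results forever, as in the issue linked above.
    """
    seen = set()
    for tweet in itertools.islice(tweets, limit):
        if tweet['id'] in seen:
            return  # the generator has started replaying results: stop
        seen.add(tweet['id'])
        yield tweet


# Simulate a cursor that replays the same page endlessly.
looping = itertools.cycle([{'id': 3}, {'id': 2}, {'id': 1}])
print([t['id'] for t in take_until_repeat(looping, limit=1000)])  # [3, 2, 1]
```

With Twython this would wrap the cursor, e.g. `take_until_repeat(twitter.cursor(twitter.search, q='xbox', count=100), limit=5000)`.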

Answer 1

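I had the same problem, but it seems you should iterate over the user's timeline in batches using the max_id parameter. According to Terence's answer the batch size should be 100 (though in practice 200 is the maximum count for user_timeline); just set max_id to the ID of the last tweet in the previously returned batch minus 1 (because max_id is inclusive). Here is the code: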

'''
Get all tweets from a given user.
Batch size of 200 is the max for user_timeline.
'''
from twython import Twython, TwythonError

# Requires authentication as of Twitter API v1.1: put your Twitter keys here
APP_KEY = 'YOUR_APP_KEY'
APP_SECRET = 'YOUR_APP_SECRET'
OAUTH_TOKEN = 'YOUR_OAUTH_TOKEN'
OAUTH_TOKEN_SECRET = 'YOUR_OAUTH_TOKEN_SECRET'
twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

tweets = []
params = {'screen_name': 'eugenebann', 'count': 200}
while True:
    try:
        user_timeline = twitter.get_user_timeline(**params)
    except TwythonError as e:
        print(e)
        break  # stop rather than repeat the failing request forever
    print(len(user_timeline))
    # Count could be less than 200, see:
    # https://dev.twitter.com/discussions/7513
    # so only an empty batch means there are no older tweets left.
    if not user_timeline:
        break
    for tweet in user_timeline:
        # Add whatever you want from the tweet, here we just add the text
        tweets.append(tweet['text'])
    # max_id is inclusive, so continue from just below the oldest tweet seen
    params['max_id'] = user_timeline[-1]['id'] - 1
# Number of tweets retrieved (user_timeline only reaches back about 3,200 tweets)
print(len(tweets))
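The same max_id technique should carry over to the search endpoint the question asks about, with 100 as the page size. A minimal sketch, assuming `search` accepts q, count and max_id keyword arguments and returns a dict with a 'statuses' list, as twitter.search does in the question; `search_all` and `fake_search` are hypothetical names, and the fake only stands in for the API so the loop can run offline.

```python
def search_all(search, q, max_tweets, per_page=100):
    """Collect up to `max_tweets` search results by paging with max_id."""
    tweets = []
    params = {'q': q, 'count': per_page}
    while len(tweets) < max_tweets:
        statuses = search(**params)['statuses']
        if not statuses:
            break  # no older results left
        tweets.extend(statuses)
        # max_id is inclusive, so continue just below the oldest tweet seen
        params['max_id'] = min(t['id'] for t in statuses) - 1
    return tweets[:max_tweets]


def fake_search(q, count, max_id=None):
    """Offline stand-in: 250 matching tweets with ids 1..250, newest first."""
    matching = [i for i in range(250, 0, -1) if max_id is None or i <= max_id]
    return {'statuses': [{'id': i, 'text': q} for i in matching[:count]]}


result = search_all(fake_search, 'xbox', max_tweets=1000)
print(len(result), len({t['id'] for t in result}))  # 250 250
```

Against the real API the call would be `search_all(twitter.search, 'xbox', 1000)`; because each request is anchored on an id rather than a page number, consecutive pages should not overlap.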

Answer 2

According to the official Twitter API documentation:

count (optional)

The number of tweets to return per page, up to a maximum of 100.
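Given that cap, the number of calls needed for N tweets is a ceiling division. A small sketch; `requests_needed` is a hypothetical helper, and the result is a lower bound, since a page can come back with fewer tweets than count.

```python
import math


def requests_needed(total_tweets, per_page=100):
    """Minimum number of calls to fetch `total_tweets` results when each
    call returns at most `per_page` tweets."""
    return math.ceil(total_tweets / per_page)


# With count capped at 100, the count=423 in the question's first example
# needs at least 5 calls; user_timeline allows per_page=200.
print(requests_needed(423))  # 5
```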

Answer 3

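You need to call the Python method repeatedly. However, there is no guarantee that these will be the next N tweets, and if tweets are coming in quickly you may miss some.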
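One way to see how repeated calls can return overlapping results is page-number (offset) paging: when new tweets arrive between calls, "page 2" shifts and repeats tweets already seen. Anchoring each request on max_id, as in Answer 1, avoids that shift, although tweets newer than the first page still need a since_id request. A toy simulation with plain integer ids (a hypothetical model, not API calls):

```python
def page_by_offset(timeline, page, size):
    """Page numbers count from the newest tweet, so they shift when
    new tweets arrive."""
    start = page * size
    return timeline[start:start + size]


def page_by_max_id(timeline, max_id, size):
    """max_id anchors the page to a tweet id, so new arrivals don't shift it."""
    older = [t for t in timeline if max_id is None or t <= max_id]
    return older[:size]


timeline = [6, 5, 4, 3, 2, 1]          # newest first; tweets are just ids
first = page_by_offset(timeline, 0, 3)  # [6, 5, 4]
anchor = min(first) - 1                 # 3
timeline = [8, 7] + timeline            # two new tweets arrive
second_offset = page_by_offset(timeline, 1, 3)       # [5, 4, 3]: repeats 5 and 4
second_max_id = page_by_max_id(timeline, anchor, 3)  # [3, 2, 1]: no overlap
print(second_offset, second_max_id)
```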

If you want every tweet posted during a period of time, you can use the streaming API: https://dev.twitter.com/docs/streaming-apis and combine it with the oauth2 module.
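Whatever streaming client you use, the per-message handling looks roughly the same: skip control messages, skip duplicates, and stop after enough tweets. A sketch of that logic as a plain class; `TweetCollector` is hypothetical, and it assumes decoded messages are dicts where tweets carry 'id' and 'text' while notices such as 'limit' or 'delete' do not.

```python
class TweetCollector:
    """Collect up to `limit` distinct tweet texts from streaming messages.

    `handle(data)` is meant to be called once per decoded message and
    returns True once enough tweets have been collected, so the caller
    can disconnect.
    """

    def __init__(self, limit):
        self.limit = limit
        self.tweets = []
        self._seen = set()

    def handle(self, data):
        # Control messages (e.g. 'limit' or 'delete' notices) carry no 'text'.
        if 'text' in data and data.get('id') not in self._seen:
            self._seen.add(data.get('id'))
            self.tweets.append(data['text'])
        return len(self.tweets) >= self.limit


collector = TweetCollector(limit=2)
messages = [{'id': 1, 'text': 'a'}, {'limit': {'track': 5}},
            {'id': 1, 'text': 'a'}, {'id': 2, 'text': 'b'}]
for message in messages:
    if collector.handle(message):
        break
print(collector.tweets)  # ['a', 'b']
```

Twython also ships a `TwythonStreamer` class; a subclass could call `handle` from its `on_success(self, data)` callback and `self.disconnect()` when it returns True, with the stream started via `stream.statuses.filter(track='xbox')`. Check that wiring against your Twython version.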

How can I consume tweets from Twitter's streaming api and store them in mongodb

python-twitter streaming api support/example

Disclaimer: I haven't actually tried this.

Answer 4

As a solution to the problem of Twython returning only 100 tweets per search query, here is a link showing how to do it the "old way":

Twython search API with next_results
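The next_results approach works by reusing the query string Twitter puts in search_metadata for the following page. A sketch that turns it into keyword arguments for another search call, assuming next_results is a string like '?max_id=...&q=...&count=...' that is absent when there are no further pages; `next_results_params` is a hypothetical helper.

```python
from urllib.parse import parse_qs


def next_results_params(search_metadata):
    """Turn search_metadata['next_results'] into keyword arguments,
    or return None when there is no next page."""
    next_results = search_metadata.get('next_results')
    if not next_results:
        return None
    parsed = parse_qs(next_results.lstrip('?'))
    # parse_qs maps each key to a list of values; each key appears once here.
    return {key: values[0] for key, values in parsed.items()}


meta = {'next_results': '?max_id=249279667666817023&q=%23freebandnames&count=4'}
print(next_results_params(meta))
# {'max_id': '249279667666817023', 'q': '#freebandnames', 'count': '4'}
```

In a loop, `params = next_results_params(results['search_metadata'])` followed by `results = twitter.search(**params)` would continue until it returns None.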
