Question

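I am using the Twython Twitter API wrapper to extract tweets, but I only receive 100 tweets. I want to extract tweets from 10 Dec 2013 to 10 Mar 2014. I passed count = 1000 to the search function.

The limit I am hitting is 100 tweets. Is there any way to get the tweets for the given time period without running into this limit?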

 from twython import Twython
 import csv
 from dateutil import parser
 from dateutil.parser import parse as parse_date
 from datetime import datetime
 import pytz

 utc=pytz.UTC

 APP_KEY = 'xxxxxxxxxxx'    
 APP_SECRET = 'xxxxxxxxxxx'
 OAUTH_TOKEN = 'xxxxxxxx'  # Access Token here
 OAUTH_TOKEN_SECRET = 'xxxxxxxxxxx'  

 t = Twython(app_key=APP_KEY, app_secret=APP_SECRET,
             oauth_token=OAUTH_TOKEN, oauth_token_secret=OAUTH_TOKEN_SECRET)

 search = t.search(q='AAPL', count="1000", since='2013-12-10')
 tweets = search['statuses']

 for tweet in tweets:
     pass  # do something with each tweet

Answer 1

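There is a limitation on accessing tweets through the Search API. Take a look at the documentation.

The Search API usually only serves tweets from the past week.

Since you are trying to retrieve tweets from the past 3 to 4 months, you are not getting the older tweets.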
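To make the window concrete, here is a minimal sketch that works out the earliest date a search could realistically cover. The 7-day figure is an assumption taken from "the past week" above, and `reachable_since` is a hypothetical helper, not part of Twython.

```python
from datetime import date, timedelta

# Assumption: the Search API reaches back roughly 7 days ("the past week").
SEARCH_WINDOW_DAYS = 7


def reachable_since(requested_since, today, window_days=SEARCH_WINDOW_DAYS):
    """Return the earliest date the search can be expected to cover."""
    earliest = today - timedelta(days=window_days)
    # A request older than the window is effectively clamped to the window.
    return max(requested_since, earliest)


requested = date(2013, 12, 10)
effective = reachable_since(requested, today=date(2014, 3, 10))
if effective != requested:
    print("Tweets before", effective.isoformat(), "are out of reach for search")
```

If the effective date differs from the requested one, the older part of the range has to come from somewhere other than search.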

Answer 2

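With Twython the Search API is limited, but I had success using just get_user_timeline.

I solved a similar problem where I wanted to grab the last X tweets from a user.

If you read the documentation, the trick that worked for me was to keep track of the ID of the last tweet I read and pass it as max_id in the next request, so that the next request picks up from that tweet.

In your case, you would just need to modify the while loop to stop on some condition on 'created_at'. Something like this might work: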

# Grab the first 200 tweets
last_id = 0
full_timeline = 200
result = t.get_user_timeline(screen_name='NAME', count = full_timeline)

for tweet in result:
    print(tweet['text'], tweet['created_at'])
    last_id = tweet['id']

# Update full_timeline to see how many tweets were actually received.
# It will be less than 200 once we have read all of the user's tweets.
full_timeline = len(result)

# 199 because result[1:] is used below to trim the duplicate caused by max_id
while full_timeline >= 199:
    result = t.get_user_timeline(screen_name='NAME', count=200, max_id=last_id)

    # max_id is inclusive, so the response repeats the last tweet we read; trim it
    result = result[1:]
    for tweet in result:
        print(tweet['text'], tweet['created_at'])
        last_id = tweet['id']

    # Update full_timeline to keep the loop going if there are leftover tweets
    full_timeline = len(result)
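
For the 'created_at' stop condition mentioned above, a rough sketch could look like the following. It is not the code I actually ran. `collect_until` and `fetch_page` are hypothetical names, and the date format string assumes the usual v1.1 style ('Mon Mar 10 12:00:00 +0000 2014'). Instead of trimming with result[1:], this variant requests max_id minus one so the boundary tweet is not returned again.

```python
from datetime import datetime, timezone

# Assumed format of the 'created_at' field, e.g. 'Mon Mar 10 12:00:00 +0000 2014'
CREATED_AT_FORMAT = '%a %b %d %H:%M:%S %z %Y'


def collect_until(fetch_page, cutoff):
    """Page backwards with max_id and stop at the first tweet older than cutoff.

    fetch_page is a hypothetical callable: it takes max_id (or None for the
    first request) and returns a list of tweet dicts, newest first.
    """
    collected = []
    max_id = None
    while True:
        page = fetch_page(max_id)
        if not page:
            break  # nothing left to read
        for tweet in page:
            created = datetime.strptime(tweet['created_at'], CREATED_AT_FORMAT)
            if created < cutoff:
                return collected  # reached tweets older than the cutoff
            collected.append(tweet)
        # Ask only for tweets strictly older than the last one read
        max_id = page[-1]['id'] - 1
    return collected


# Timezone-aware cutoff, since the parsed 'created_at' values carry an offset
cutoff = datetime(2013, 12, 10, tzinfo=timezone.utc)
```

With the Twython client t from above, fetch_page could be a small function that calls t.get_user_timeline(screen_name='NAME', count=200), adding max_id only when it is not None, and the result would be collect_until(fetch_page, cutoff).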
