Question

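I am trying to collect data from Twitter for a project over a period of about a month. There are more than 10,000 tweets with this hashtag in that period, but I only seem to get the tweets from the current day. Yesterday I got 68 and today I got 80; in both cases every tweet was timestamped with the current day.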


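I am certain there should be more than 80 tweets. I have heard that Twitter's rate limit caps a request at 1500 tweets, but does it also restrict results to a single day? The direct search call gives:

api = tweepy.API(auth)
igsjc_tweets = api.search(q="#igsjc", since='2014-12-31', count=100000)

ipdb> len(igsjc_tweets)
80

Note that I have also tried the Cursor approach, and that also gives me only 80 tweets. Any tips or suggestions on how to get the full data would be much appreciated.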

Answer 1

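Here is the official tweepy tutorial on Cursor. Note: you need to iterate over the Cursor, as shown below. Also, there is a maximum count that you can pass to .items(), so it is probably a good idea to pull the data in smaller pieces (one at a time or similar) and to sleep between calls. HTH!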

igsjc_tweets_jan = [tweet for tweet in tweepy.Cursor(
                    api.search, q="#igsjc", since='2016-01-01', until='2016-01-31').items(1000)]
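
The chunk-and-sleep advice above could be sketched roughly as follows. The helper names `date_windows` and `fetch_in_windows`, the 7-day default window, and the `search_fn` callable are assumptions made for illustration and are not part of tweepy; `search_fn` stands in for whatever wraps the Cursor call.

```python
import time
from datetime import date, timedelta


def date_windows(start, end, days=7):
    """Yield (since, until) ISO date strings covering [start, end) in chunks."""
    step = timedelta(days=days)
    current = start
    while current < end:
        upper = min(current + step, end)
        yield current.isoformat(), upper.isoformat()
        current = upper


def fetch_in_windows(search_fn, start, end, days=7, pause=0.0):
    """Call search_fn(since, until) once per window, sleeping between calls.

    search_fn is a hypothetical callable supplied by the caller; with tweepy
    it might wrap tweepy.Cursor(api.search, q=..., since=..., until=...).items(n).
    """
    results = []
    for since, until in date_windows(start, end, days):
        results.extend(search_fn(since, until))
        time.sleep(pause)  # spread requests out to stay under the rate limit
    return results
```

With the `api` object from the question, a call might look like `fetch_in_windows(lambda s, u: tweepy.Cursor(api.search, q="#igsjc", since=s, until=u).items(1000), date(2016, 1, 1), date(2016, 2, 1), pause=5)`. Passing the search as a callable keeps the windowing logic testable without network access.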

Answer 2

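First of all, tweepy cannot retrieve data that is too old through its search API. I don't know the exact limit, but it probably reaches back only a short while.

In any case, you can use this code to fetch tweets. I ran it to get the tweets from the past few days, and it worked for me.

Note that you can refine it and add geocode information; I left a commented-out example for you.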

flag = True      # keep looping while the previous pass returned tweets
last_id = None   # upper bound for the next pass; None starts from the newest tweet
while flag:
    flag = False
    for status in tweepy.Cursor(api.search,
                                # geocode example:
                                #q='geocode:"37.781157,-122.398720,1mi" since:'+since+' until:'+until+' include:retweets',
                                q="#igsjc",
                                since='2015-12-31',
                                max_id=last_id,
                                result_type='recent',
                                include_entities=True,
                                monitor_rate_limit=False,
                                wait_on_rate_limit=False).items(300):
        tweet = status._json  # raw JSON dict for this tweet
        print(tweet)

        flag = True  # there is still more data to collect
        last_id = status.id - 1  # max_id is inclusive, so continue below the last tweet seen
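
The loop above pages backwards through results with `max_id`. Twitter documents `max_id` as inclusive, so a paging loop generally wants to continue from one below the oldest id it has seen. Here is a small self-contained simulation of that idea; `fake_search`, `collect_all`, and the integer ids are made up for illustration and only mimic the "id less than or equal to `max_id`" behaviour, not the real API.

```python
def fake_search(all_ids, max_id=None, count=3):
    """Stand-in for a search call: newest first, ids <= max_id, at most count."""
    eligible = [i for i in sorted(all_ids, reverse=True)
                if max_id is None or i <= max_id]
    return eligible[:count]


def collect_all(all_ids, count=3):
    """Page backwards through all_ids using an inclusive max_id bound."""
    collected = []
    last_id = None
    while True:
        page = fake_search(all_ids, max_id=last_id, count=count)
        if not page:
            break  # nothing older is left
        collected.extend(page)
        last_id = page[-1] - 1  # step below the oldest id on this page
    return collected
```

If `last_id` were set to `page[-1]` without subtracting one, every later page would start with the tweet already collected and the loop would never see an empty page.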

Good luck.

Twitter API limits tweets to one day, tweepy
