Getting all tweets through the Twitter API, not just recent ones (using twitter4j - Java)

Time: 2013-12-08 04:35:59

Tags: api twitter twitter4j

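I built an application with twitter4j that, when I enter a keyword, pulls in a bunch of tweets, takes the geolocation from each tweet (or falls back to the profile location), and then maps them with ammaps. The problem is that I only get a small portion of the tweets. Is there some kind of limit here? I have a database collecting the tweet data, so it will build up a fair amount soon enough, but I'm curious why I only get tweets from the last 12 hours or so.

For example, if I search for my own username, I only get one tweet, the one I sent today.

Thanks for any info!

EDIT: I know Twitter doesn't allow public access to the firehose. This is more about why I can only find recent tweets.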

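2 Answers:

Answer 0: (score: 3)

You need to keep redoing the query, resetting the maxId each time, until you get nothing back. You can also use setSince and setUntil to bound the date range.

An example: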

// sdf: a SimpleDateFormat, presumably "yyyy-MM-dd" (the format since/until expect)
Query query = new Query(); // set the search terms with query.setQuery(...)
query.setCount(DEFAULT_QUERY_COUNT); // tweets per page
query.setLang("en");
// set the bounding dates
query.setSince(sdf.format(startDate));
query.setUntil(sdf.format(endDate));

// searchWithRetry is my function that deals with rate limits
QueryResult result = searchWithRetry(twitter, query);

while (!result.getTweets().isEmpty()) {
    List<Status> tweets = result.getTweets();
    System.out.println("# Tweets:\t" + tweets.size());

    // track the lowest tweet id seen on this page
    long minId = Long.MAX_VALUE;
    for (Status tweet : tweets) {
        // do stuff here
        if (tweet.getId() < minId) {
            minId = tweet.getId();
        }
    }

    // ask for the next page: only tweets older than the oldest one seen so far
    query.setMaxId(minId - 1);
    result = searchWithRetry(twitter, query);
}
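
The searchWithRetry helper is not shown above. As a rough idea of what such a helper might look like, here is a minimal, library-independent sketch. Everything in it is hypothetical: RetrySketch, RateLimitedException and callWithRetry are made-up names, and the wait time is passed in by the caller. With twitter4j you would presumably catch TwitterException instead, check exceededRateLimitation(), and take the wait time from the rate limit status, but that wiring is an assumption and is left out here.

```java
import java.util.concurrent.Callable;

final class RetrySketch {

    /** Hypothetical stand-in for the exception a client throws when it is rate limited. */
    static final class RateLimitedException extends Exception {
        final long retryAfterMillis;

        RateLimitedException(long retryAfterMillis) {
            super("rate limited; retry after " + retryAfterMillis + " ms");
            this.retryAfterMillis = retryAfterMillis;
        }
    }

    /**
     * Runs the call, sleeping and retrying whenever it reports a rate limit.
     * Gives up and rethrows after maxAttempts attempts; other exceptions propagate at once.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (RateLimitedException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts, let the caller decide
                }
                Thread.sleep(e.retryAfterMillis); // wait for the limit window to reset
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fake search that is rate limited on its first two calls.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RateLimitedException(10);
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " calls"); // prints "ok after 3 calls"
    }
}
```

In the paging loop, each search call would be wrapped this way, so hitting a rate limit pauses the loop instead of ending it early.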

Answer 1: (score: 1)

It really depends on which API you are using: the Streaming API or the Search API. The Search API has an optional parameter, result_type, which can take the following values:

  * mixed: include both popular and real-time results in the response.
  * recent: return only the most recent results in the response.
  * popular: return only the most popular results in the response.

The default is mixed.

As far as I understand, you are using recent, which is why you are getting only the most recent tweets. Another issue is the low volume of tweets that carry geographic information. Because very few users add geographic information to their profile, you are getting very few tweets.
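
To make the result_type parameter concrete, here is a small sketch that only builds the request URL for a search. It assumes the v1.1 Search API endpoint and its q, result_type and count parameters. SearchUrlSketch and buildSearchUrl are hypothetical names, and authentication is left out entirely, so this shows where the parameter goes and is not a working client.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class SearchUrlSketch {

    // Assumed endpoint: the v1.1 Search API.
    static final String ENDPOINT = "https://api.twitter.com/1.1/search/tweets.json";

    /** Builds a search URL; resultType is one of "mixed", "recent" or "popular". */
    static String buildSearchUrl(String keyword, String resultType, int count) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", keyword);
        params.put("result_type", resultType);
        params.put("count", String.valueOf(count));

        StringBuilder url = new StringBuilder(ENDPOINT);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator).append(p.getKey()).append('=').append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("java twitter4j", "recent", 100));
    }
}
```

Switching the second argument to "popular" or "mixed" is the only change needed to ask for a different kind of result set.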