User lookup via the Twitter API from R results in a 403 error

Posted: 2014-06-27 09:30:19

Tags: r twitter

Using the Twitter API with the twitteR package, I am trying to retrieve user objects for a long list of screen names (between 50,000 and 100,000).

I keep getting the following error:

Error in twInterfaceObj$doAPICall(paste("users", "lookup", sep = "/"),  : 
  client error: (403) Forbidden

The error code supposedly hints at exceeding "update limits". But the rate limit on user lookups is 180 calls per 15-minute window, and lookups are performed in batches of 100 screen names, so up to 18,000 users per window should not be a problem. However, even reducing the number to 6,000 per 15-minute window (the limit for application-only authenticated requests) results in the same error.
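To check what the API itself reports, the rate-limit endpoint can be queried; a minimal sketch, assuming an already authenticated twitteR session (the exact resource labels in the result may vary):

getCurRateLimitInfo("users")
# returns a data frame with the limit, remaining calls and reset
# time for each resource in the "users" family; the /users/lookup
# row shows how many batched lookup calls are left in the current
# 15-minute window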

Here is an MWE (but you will need your own API keys):

library(plyr)
# install the latest versions from github:
# devtools::install_github("twitteR", username="geoffjentry")
# devtools::install_github("hadley/httr")
library(twitteR)
library(httr)    

source("TwitterKeys.R") # Your own API-Keys
setup_twitter_oauth(consumerKey, consumerSecret, accessToken, accessSecret)

# The following is just to generate a large enough list of user names:
searchTerms <- c("worldcup", "economy", "climate", "wimbledon", 
                 "apple", "android", "news", "politics")

# This might take a while
sample <- llply(searchTerms, function(term) {
  tweets <- twListToDF(searchTwitter(term, n=3200))
  users <- unique(tweets$screenName)
  return(users)
})

userNames <- unique(unlist(sample))

# This function is supposed to perform the lookups in batches 
# and mind the rate limit:
getUserObjects <- function(users) {
  groups <- split(users, ceiling(seq_along(users)/6000))
  userObjects <- ldply(groups, function(group) {
    objects <- lookupUsers(group)
    out <- twListToDF(objects)
    print("Waiting for 15 Minutes...")
    Sys.sleep(900)
    return(out)
  })
  return(userObjects)
}

# Putting it into action:
userObjects <- getUserObjects(userNames)

Manually looking up smaller subsets, e.g. via lookupUsers(userNames[1:3000]), sometimes works; however, as soon as I try to automate the process, the error is thrown.

Does anyone have an idea what might be causing this?

2 answers:

Answer 0 (score: 0)

According to this answer (I hit the rate limit for twitteR even from the first request), it is not just the total number of users that is limited, but also the number of calls per 15-minute interval. At 100 users per call, looking up 6,000 users means making 60 calls, which exceeds the 15 calls you are allowed. Try putting the program to sleep and letting it make the calls again after 15 minutes.
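A minimal sketch of that pacing strategy, reusing userNames from the question and the 15-calls-per-window budget this answer assumes:

# Split into batches of 100 (the users/lookup maximum) and pause
# for 15 minutes after every 15 calls.
batches <- split(userNames, ceiling(seq_along(userNames) / 100))
results <- list()
for (i in seq_along(batches)) {
  results[[i]] <- twListToDF(lookupUsers(batches[[i]]))
  if (i %% 15 == 0) {
    message("Call budget for this window used, sleeping for 15 minutes...")
    Sys.sleep(900)
  }
}
userObjects <- do.call(rbind, results)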

Answer 1 (score: 0)

I know this question is old, but I recently ran into this problem and could not find any answer that adequately addressed it.

BOTTOM LINE UP FRONT:

Adding a tryCatch() error-handling wrapper and splitting the failing call into two smaller calls of 50 IDs each solved the problem (see the code below).

LONG STORY

In my case, I noticed the API seemed to fail at the same point (around the 4,100th ID). After adding some error handling, I was able to identify roughly 8 sections of my ID list that would not go through. However, the same IDs worked fine in the Twitter API Console. I went through the code on GitHub but could not find a reason why it should break there. Experimentation showed that splitting the failing call into two halves worked perfectly. Here is a working code example:

N <- NROW(Data)      # How many IDs are left to look up
count <- 1           # Index of the ID we are currently at
Len <- N             # Total length, so we never index out of range
Stop <- 0            # Upper index of the current batch
j <- 0               # Number of API calls made so far

while (N > 0 && j <= 180) {

  tryCatch({

    # Set the stop value so that hitting the end of the list
    # does not produce an out-of-range index
    Stop <<- min(count + 99, Len)

    # Keep track of how many calls we have made
    j <<- j + 1
    User_Data <- lookupUsers(Data$user_id_str[count:Stop], includeNA = TRUE)

    # ... CODE THAT STORES DATA AS NEEDED

    # Update for the next iteration
    N <<- N - 100
    count <<- count + 100
    message(paste("Users searched:", (count - 1), "/", Len))

  },

  error = function(e) {

    message("Twitter sent back a 403 error, trying again with half as many IDs")
    Stop <<- min(count + 49, Len)

    j <<- j + 1
    # FIRST half of the failing batch
    User_Data <- lookupUsers(Data$user_id_str[count:Stop], includeNA = TRUE)

    # ... CODE THAT STORES DATA AS NEEDED
    N <<- N - 50
    count <<- count + 50
    message(paste("Users searched:", Stop, "/", Len))

    Stop <<- min(count + 49, Len)

    j <<- j + 1
    # SECOND half of the failing batch
    User_Data <- lookupUsers(Data$user_id_str[count:Stop], includeNA = TRUE)

    # ... CODE THAT STORES DATA AS NEEDED
    N <<- N - 50
    count <<- count + 50
    message(paste("Users searched:", Stop, "/", Len))
  })

}
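The same split-on-failure idea can also be written as a recursive helper that halves a failing batch until it goes through. This is a minimal sketch under the same assumptions (Data$user_id_str as the ID vector, an authenticated session), not the code above; lookup_with_split is a hypothetical name, and it does not pace calls against the rate limit, so the sleeping logic from the other answer would still be needed for large lists:

# Hypothetical helper: look up a block of IDs, and on an error
# (such as the 403) split the block in half and retry each half.
lookup_with_split <- function(ids) {
  tryCatch(
    twListToDF(lookupUsers(ids)),
    error = function(e) {
      if (length(ids) <= 1) stop(e)  # give up on a single bad ID
      half <- ceiling(length(ids) / 2)
      rbind(lookup_with_split(ids[1:half]),
            lookup_with_split(ids[(half + 1):length(ids)]))
    }
  )
}

# Usage: process the full list in batches of 100 as before
batches <- split(Data$user_id_str, ceiling(seq_along(Data$user_id_str) / 100))
User_Data <- do.call(rbind, lapply(batches, lookup_with_split))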