I am trying to run lookup_users on a dataset of 8 million Twitter user_ids. However, Twitter's rate limit does not let me fetch data for more than 90,000 user IDs per 15-minute window.
I am using the following code, but it does not work well:
data <- vector("list", length(dataset$user_id)) # my dataset has a column called user_id
for (i in seq_along(dataset$user_id)) { # loop over the IDs, not the columns of the data frame
  data[[i]] <- lookup_users(dataset$user_id[i])
  Sys.sleep(15 * 60) # sleep 900 seconds, i.e. 15 minutes
}
I found this answer by Neil Charles here: https://github.com/ropensci/rtweet/issues/118
"I do lazy lookups in chunks just under the 90,000 limit. My solution below may not be the most elegant, but it works for me.
lookup_many_users <- function(users, twitter_token, retry_limit = 5){
  require(rtweet)
  breaks <- seq(1, length(users), 89999)
  if(breaks[length(breaks)] != length(users)){
    breaks <- c(breaks, length(users))
  }
  user_details <- NULL
  for(i in 1:(length(breaks) - 1)){
    attempt <- 0
    while(is.null(user_details) && attempt <= retry_limit){
      attempt <- attempt + 1
      try({
        user_details <- lookup_users(users[breaks[i]:breaks[i + 1]], token = twitter_token)
        Sys.sleep(15 * 60) # wait 15 minutes for rate limit to reset before proceeding
      })
    }
    if(is.null(user_details)){
      stop("failed to get users")
    }
    if(i == 1){
      all_user_details <- user_details
    } else {
      all_user_details <- rbind(all_user_details, user_details)
    }
    user_details <- NULL
  }
  return(all_user_details)
}
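A minimal usage sketch, assuming `dataset$user_id` holds the 8 million IDs and that you have already created an rtweet token (the `create_token()` argument values below are placeholders, not real credentials):

```r
library(rtweet)

# Hypothetical token setup with placeholder credentials.
twitter_token <- create_token(
  app             = "my_app",
  consumer_key    = "XXXX",
  consumer_secret = "XXXX",
  access_token    = "XXXX",
  access_secret   = "XXXX"
)

all_user_details <- lookup_many_users(dataset$user_id, twitter_token)

# Consecutive chunks share a boundary index (users[breaks[i + 1]] is sent
# in both chunk i and chunk i + 1), so drop any duplicated rows afterwards:
all_user_details <- all_user_details[!duplicated(all_user_details$user_id), ]
```

Note that at 90,000 IDs per 15-minute window, 8 million IDs will take roughly 90 windows, i.e. on the order of a day of runtime, so run this in a session that can stay alive that long.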