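I want to use R to look up the profiles of a Twitter user's followers (more than 100,000 of them). twitteR is a great package, but it runs into problems with large follower counts, because you need to implement a sleep routine to avoid exceeding the rate limit. I am a relative novice and would like to know how to loop over the follower ID object, passing in follower IDs in batches of 100 (since that is the maximum the Twitter API can handle in one request).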
Edit: code added
library(twitteR)
library(plyr)
maxTwitterIds = 100
sleeptime = 500 # seconds
user <- getUser("[username]")
followers <- user$getFollowerIDs()
ids_matrix = matrix(followers, nrow = maxTwitterIds, ncol = length(followers) / maxTwitterIds)
# Note: for smaller lists of followers it is possible to use the command "lookupUsers(followers)" at this point
getTwitterInfoForListIds = function(id_list) {
# Look up one batch of IDs, then pull the fields of interest from each user object
foll <- lookupUsers(id_list)
return(list(
names = sapply(foll, name),
sn = sapply(foll, screenName),
id = sapply(foll, id),
verified = sapply(foll, verified),
created = sapply(foll, created),
statuses = sapply(foll, statusesCount),
follower = sapply(foll, followersCount),
friends = sapply(foll, friendsCount),
favorites = sapply(foll, favoritesCount),
location = sapply(foll, location),
url = sapply(foll, url),
description = sapply(foll, description),
last_status = sapply(foll, lastStatus)))
}
alldata = alply(ids_matrix, 2, function(id_set) {
info = getTwitterInfoForListIds(id_set)
Sys.sleep(sleeptime)
return(info)
})
Answer 0 (score: 1)
This can also be done with the newer rtweet package.
Based on the example here: https://github.com/mkearney/rtweet
# Get followers
# Retrieve a list of the accounts following a user.
## get user IDs of accounts following CNN
cnn_flw <- get_followers("cnn", n = 75000)
# lookup data on those accounts
cnn_flw_data <- lookup_users(cnn_flw$user_id)
# Or if you really want ALL of their followers:
# how many total follows does cnn have?
cnn <- lookup_users("cnn")
# get them all (this would take a little over 5 days)
cnn_flw <- get_followers( "cnn", n = cnn$followers_count,
retryonratelimit = TRUE )
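To get a feel for how long such a download takes, here is a rough sketch of the arithmetic. It assumes a limit of 75,000 follower IDs per 15-minute window (the figure matching n = 75000 above); estimate_days is a hypothetical helper written for this illustration, not part of rtweet.

```r
# Rough estimate of the time needed to page through n follower IDs.
# Assumption: 75,000 IDs can be fetched per 15-minute rate-limit window.
estimate_days <- function(n_followers,
                          ids_per_window = 75000,
                          window_minutes = 15) {
  windows <- ceiling(n_followers / ids_per_window)
  # The first window needs no waiting, so only (windows - 1) waits are counted
  waits <- max(windows - 1, 0)
  waits * window_minutes / (60 * 24)
}

# An account with 100,000 followers needs two windows, so one 15-minute wait
small_account <- estimate_days(100000)

# A hypothetical account with 40 million followers
large_account <- estimate_days(40e6)
```

Under these assumptions the second call comes out at a little over five and a half days, which is in line with the comment in the example. It only covers fetching the IDs; looking up the profiles afterwards is rate-limited separately.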
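Answer 1 (score: 0)
First, I should say that I have not used the twitteR package, so I can only give you some pseudocode that shows the structure of how to do this. It should get you started.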
library(plyr)
# Some constants
maxTwitterIds = 100
sleeptime = 1 # sec
# Get the id's of the twitter followers of person X
ids = getTwitterFollowers("x") # I'll use ids = 1:1000
ids_matrix = matrix(ids, nrow = maxTwitterIds,
ncol = length(ids) / maxTwitterIds)
getTwitterInfoForListIds = function(id_list) {
return(lapply(id_list, getTwitterInfo))
}
# Find the information you need from each id
alldata = alply(ids_matrix, 2, function(id_set) {
info = getTwitterInfoForListIds(id_set)
Sys.sleep(sleeptime)
return(info)
})
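The data structure you get back from this may need some polishing (it is a nested list), but without knowing what you want to extract from the Twitter accounts, it is hard to say more.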
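The matrix approach only fits cleanly when the number of IDs is an exact multiple of 100. A minimal sketch of an alternative, using split() so that the last batch may be shorter, is shown below. Here fake_lookup is a hypothetical stand-in for a real lookup call (for example twitteR's lookupUsers), and make_batches and lookup_in_batches are names invented for this illustration.

```r
# Hypothetical stand-in for a real profile lookup; it returns one data-frame
# row per ID so that the sketch runs without API access.
fake_lookup <- function(id_batch) {
  data.frame(id = id_batch,
             screen_name = paste0("user_", id_batch),
             stringsAsFactors = FALSE)
}

# Split a vector of IDs into consecutive batches of at most batch_size.
make_batches <- function(ids, batch_size = 100) {
  split(ids, ceiling(seq_along(ids) / batch_size))
}

# Look up every batch, pause after each request, and bind the rows together.
lookup_in_batches <- function(ids, lookup_fun, batch_size = 100, sleeptime = 0) {
  batches <- make_batches(ids, batch_size)
  results <- lapply(batches, function(id_batch) {
    info <- lookup_fun(id_batch)
    Sys.sleep(sleeptime)
    info
  })
  do.call(rbind, results)
}

ids <- 1:250                      # placeholder IDs, not real Twitter IDs
batches <- make_batches(ids)      # batches of 100, 100 and 50
profiles <- lookup_in_batches(ids, fake_lookup)
```

Returning a data frame per batch and binding the rows avoids the nested list mentioned above. With a real lookup function, sleeptime would need to be set high enough to stay under the rate limit.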