Finding Twitter followers in R

Posted: 2012-02-08 11:48:18

Tags: r twitter

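I would like to use R to look up the profiles of a user's Twitter followers (the user has more than 100,000 followers). Although twitteR is a great package, it runs into problems with large numbers of followers, because a sleep routine has to be implemented to avoid exceeding the rate limit. I'm a relative novice and would like to know how to loop over the follower ID object, passing the follower IDs in batches of 100 (since that is the maximum the Twitter API can handle in one call).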

Edit: added code

library(twitteR)
library(plyr)

maxTwitterIds = 100
sleeptime = 500 # sec

user <- getUser("[username]")
followers <- user$getFollowerIDs()
# note: for smaller lists of followers it is possible to call lookupUsers(followers) directly at this point

# split the IDs into batches of at most maxTwitterIds (the last batch may be shorter)
id_batches <- split(followers, ceiling(seq_along(followers) / maxTwitterIds))

getTwitterInfoForListIds = function(id_list) {
    # look up one batch of users (at most 100 IDs per call)
    foll <- lookupUsers(id_list)
    # convert the list of user objects to a data frame with fields such as
    # name, screenName, id, verified, created, statusesCount, followersCount,
    # friendsCount, favoritesCount, location, url and description
    return(twListToDF(foll))
}

# look up each batch, sleeping between batches, and stack the results
alldata = ldply(id_batches, function(id_set) {
    info = getTwitterInfoForListIds(id_set)
    Sys.sleep(sleeptime)
    return(info)
})
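For reference, here is a minimal, self-contained base-R sketch of the batch-and-sleep loop the question asks about. `lookup_batch()` and `lookup_in_batches()` are hypothetical names: `lookup_batch()` is only a stand-in for the real API call (for example twitteR's `lookupUsers()`), so the batching logic can be run without a Twitter connection. Treat it as an illustration of the structure, not as a drop-in implementation.

```r
# HYPOTHETICAL stand-in for the real API call (e.g. twitteR's lookupUsers());
# it just returns one row per ID so the batching logic can be run offline.
lookup_batch <- function(ids) {
  data.frame(id = ids, stringsAsFactors = FALSE)
}

# Look up `ids` in batches of at most `batch_size`, pausing `pause` seconds
# between consecutive calls so the API rate limit is not exceeded.
lookup_in_batches <- function(ids, batch_size = 100, pause = 0) {
  # batch number for each ID: 1 for the first 100, 2 for the next 100, ...
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- lookup_batch(batches[[i]])
    # no need to sleep after the final batch
    if (i < length(batches)) Sys.sleep(pause)
  }
  # stack the per-batch data frames into one
  do.call(rbind, results)
}

ids <- as.character(1:250)
res <- lookup_in_batches(ids, batch_size = 100, pause = 0)
nrow(res)  # 250 rows, fetched in 3 calls (100 + 100 + 50)
```

Using `split()` rather than `matrix()` means the last batch can simply be shorter, so the follower count does not have to be a multiple of 100.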

2 answers:

Answer 0 (score: 1)

This can also be done with the newer rtweet package.

Based on the example here: https://github.com/mkearney/rtweet

# Get followers 

# Retrieve a list of the accounts following a user.

## get user IDs of accounts following CNN 
cnn_flw <- get_followers("cnn", n = 75000)

# lookup data on those accounts 
cnn_flw_data <- lookup_users(cnn_flw$user_id) 

# Or if you really want ALL of their followers:
# how many followers does cnn have in total?
cnn <- lookup_users("cnn")
# get them all (this would take a little over 5 days) 
cnn_flw <- get_followers("cnn", n = cnn$followers_count,
                         retryonratelimit = TRUE)
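To get a feel for why fetching every follower can take days, here is a rough back-of-the-envelope estimate in R. The limits are ASSUMPTIONS consistent with the `n = 75000` used above (up to 5000 IDs per request, at most 15 requests per 15-minute window); check the current API documentation before relying on them. `estimate_follower_download()` is a hypothetical helper, not part of rtweet.

```r
# Rough time estimate for downloading a full follower list.
# ASSUMED limits (check the current API docs): each follower-ID request
# returns up to 5000 IDs and at most 15 requests are allowed per 15-minute window.
estimate_follower_download <- function(n_followers,
                                       ids_per_request = 5000,
                                       requests_per_window = 15,
                                       window_minutes = 15) {
  requests <- ceiling(n_followers / ids_per_request)
  windows  <- ceiling(requests / requests_per_window)
  # the first window starts immediately, so only (windows - 1) full waits are needed
  wait_minutes <- (windows - 1) * window_minutes
  list(requests = requests, windows = windows, wait_hours = wait_minutes / 60)
}

estimate_follower_download(75000)  # 15 requests, 1 window, no waiting
estimate_follower_download(1e6)    # 200 requests, 14 windows, 3.25 hours of waiting
```

Under these assumptions the cost is roughly one minute of waiting per 5000 followers, which is why very large accounts take days.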

Answer 1 (score: 0)

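Let me start by saying that I have not used the twitteR package, so I can only give you some pseudocode showing how to structure this. It should get you started.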

library(plyr)

# Some constants
maxTwitterIds = 100
sleeptime = 1 # sec

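# Get the id's of the twitter followers of person X
# (getTwitterFollowers("x") is a placeholder; for testing I'll use ids = 1:1000)
ids = 1:1000
# Split the ids into batches of at most maxTwitterIds; unlike matrix(),
# this also works when length(ids) is not a multiple of 100
id_batches = split(ids, ceiling(seq_along(ids) / maxTwitterIds))

getTwitterInfoForListIds = function(id_list) {
    # getTwitterInfo is a placeholder for the call that fetches one account
    return(lapply(id_list, getTwitterInfo))
}

# Find the information you need for each batch, pausing between batches
alldata = llply(id_batches, function(id_set) {
    info = getTwitterInfoForListIds(id_set)
    Sys.sleep(sleeptime)
    return(info)
})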

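The resulting data structure may need some polishing (it is a nested list), but without knowing what you want to extract from each Twitter account it is hard to say more.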
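As a hedged sketch of that polishing step: assuming each account's info comes back as a named list of scalar fields with the same names for every account, the nested list (batches of accounts) can be flattened into a data frame with base R. The records below are made-up placeholder values, not real Twitter data.

```r
# HYPOTHETICAL shape of `alldata`: a list of batches, each a list of
# per-account records (named lists of scalar fields).
alldata <- list(
  list(list(id = "1", name = "a", followers = 10),
       list(id = "2", name = "b", followers = 25)),
  list(list(id = "3", name = "c", followers = 7))
)

# Drop the batch level: one flat list with one record per account
records <- unlist(alldata, recursive = FALSE)

# Turn each record into a one-row data frame, then stack them
followers_df <- do.call(rbind, lapply(records, as.data.frame,
                                      stringsAsFactors = FALSE))
followers_df
```

`rbind()` requires every record to have the same fields; if some accounts are missing a field, fill it with `NA` before stacking.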