Question

I collected some Twitter data:

#connect to twitter API
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

#set radius and amount of requests
N=200  # tweets to request from each query
S=200  # radius in miles

lats=c(38.9,40.7)
lons=c(-77,-74)

roger=do.call(rbind,lapply(1:length(lats), function(i) searchTwitter('Roger+Federer',
    lang="en",n=N,resultType="recent",
    geocode=paste(lats[i],lons[i],paste0(S,"mi"),sep=","))))

After that I did:

rogerlat=sapply(roger, function(x) as.numeric(x$getLatitude()))
rogerlat=sapply(rogerlat, function(z) ifelse(length(z)==0,NA,z))  

rogerlon=sapply(roger, function(x) as.numeric(x$getLongitude()))
rogerlon=sapply(rogerlon, function(z) ifelse(length(z)==0,NA,z))  

data=as.data.frame(cbind(lat=rogerlat,lon=rogerlon))

Now I want to get all tweets that have lon and lat values:

data=filter(data, !is.na(lat),!is.na(lon))
lonlat=select(data,lon,lat)

But now I only get NA values... Any ideas about what is going wrong here?

Answer 1

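As Chris mentioned above, searchTwitter does not return the lat-long of a tweet. You can see this in the twitteR documentation, which tells us that it returns a status object.

Status Objects

Scrolling down to the status object, you can see that it contains 11 pieces of information, but lat-long is not one of them. All is not lost, though, because the user's screen name is returned.

If we look at the user object, we see that it at least includes a location.

So I can think of at least two possible solutions, depending on your use case.

Solution 1: Extract the user's location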

# Search for recent Trump tweets #
tweets <- searchTwitter('Trump', lang="en",n=N,resultType="recent",
              geocode='38.9,-77,50mi')

# If you want, convert tweets to a data frame #
tweets.df <- twListToDF(tweets)

# Look up the users #
users <- lookupUsers(tweets.df$screenName)

# Convert users to a dataframe, look at their location#
users_df <- twListToDF(users)

table(users_df[1:10, 'location'])

                                       ❤ Texas  ❤ ALT.SEATTLE.INTERNET.UR.FACE 
                   2                            1                            1 
               Japan             Land of the Free                  New Orleans 
                   1                            1                            1 
  Springfield OR USA                United States                          USA 
                   1                            1                            1 

# Note that these will be the users' self-reported locations,
# so potentially they are not that useful

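Solution 2: Multiple searches with a limited radius

Another solution is to run a series of repeated searches with a small radius, incrementing the latitude and longitude each time. That way you can be relatively certain that each returned user is close to the location you specified.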
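The repeated-search idea can be sketched as a grid of geocode strings covering the bounding box from the question. This is only an illustration: `make_geocode_grid` is a hypothetical helper, and the step size and radius are arbitrary values, not recommendations.

```r
# Hypothetical helper: build a grid of "lat,lon,radius" strings over a bounding box.
make_geocode_grid <- function(lat_range, lon_range, step, radius) {
  lats <- seq(lat_range[1], lat_range[2], by = step)
  lons <- seq(lon_range[1], lon_range[2], by = step)
  grid <- expand.grid(lat = lats, lon = lons)
  # Same "lat,lon,radius" layout that the geocode argument expects
  grid$geocode <- paste(grid$lat, grid$lon, radius, sep = ",")
  grid
}

# Illustrative values only: 0.5 degree steps, 25 mile radius per cell
grid <- make_geocode_grid(c(38.9, 40.7), c(-77, -74), step = 0.5, radius = "25mi")
head(grid)

# With an authenticated twitteR session, each cell could then be searched, and
# every tweet found in a cell treated as lying near that cell's centre:
# results <- lapply(grid$geocode, function(g)
#   searchTwitter('Roger+Federer', lang = "en", n = N, geocode = g))
```

Each tweet then inherits an approximate position from the cell it was found in. The smaller the radius, the tighter that approximation, at the cost of more requests.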

Answer 2

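Not necessarily an answer, but more of an observation that is too long for a comment:

First, you should look at the documentation for how to enter geocode data. Using twitteR: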

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

#set radius and amount of requests
N=200  # tweets to request from each query
S=200  # radius in miles

The geo data should be structured like this (lat, lon, radius):

geo <- '40,-75,200km'

Then use:

roger <- searchTwitter('Roger+Federer',lang="en",n=N,resultType="recent",geocode=geo)

Then I would use twListToDF to convert the result so it can be filtered:

roger <- twListToDF(roger)

This now gives you a data.frame with 16 columns and 200 observations (as set above).

You can then filter it using:

library(data.table)
setDT(roger) # convert to a data.table by reference
roger[latitude > 38.9 & latitude < 40.7 & longitude > -77 & longitude < -74]
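
For comparison, here is a rough base-R sketch of the same bounding-box filter. The data frame is a made-up stand-in for the twListToDF output, and the coordinate columns are assumed to arrive as character, which is why they are converted before comparing.

```r
# Simulated stand-in for twListToDF() output; all values are made up.
tweets_df <- data.frame(
  text      = c("a", "b", "c", "d"),
  latitude  = c("39.5", NA, "41.2", "40.0"),
  longitude = c("-75.2", NA, "-75.0", "-78.1"),
  stringsAsFactors = FALSE
)

# Convert to numeric first, in case the columns are character (assumption).
tweets_df$latitude  <- as.numeric(tweets_df$latitude)
tweets_df$longitude <- as.numeric(tweets_df$longitude)

# TRUE only for rows with both coordinates present and inside the box
in_box <- with(tweets_df,
  !is.na(latitude) & !is.na(longitude) &
  latitude > 38.9 & latitude < 40.7 &
  longitude > -77 & longitude < -74)

tweets_df[in_box, ]
```

Checking for NA explicitly keeps rows with missing coordinates from showing up as all-NA rows in the result.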

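That said (and this is why it is an observation rather than an answer), it looks as though twitteR does not return lat and lon: both are NA in all the data I got back. I assume this is to protect individual users' locations.

Even so, adjusting the radius does affect the number of results, so the code does have access to the geo data in some way.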

Answer 3

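Assuming some tweets were downloaded, and that some of them are geo-referenced while others have no geographic coordinates:

prod(dim(data)) > 1 & prod(dim(data)) != sum(is.na(data)) & any(is.na(data))
# TRUE

For simplicity, let's simulate data between your longitude/latitude points.

set.seed(123)
data <- data.frame(lon=runif(200, -77, -74), lat=runif(200, 38.9, 40.7))
data[sample(1:200, 10),] <- NA

Rows with longitude/latitude data can be selected by removing the 10 rows with missing data.

data2 <- data[-which(is.na(data[, 1])), c("lon", "lat")]
nrow(data) - nrow(data2)
# 10

The data2 assignment replaces the last two lines of your code. Note, however, that this only works if the missing geographic coordinates are stored as NA.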
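
If the missing coordinates are not already stored as NA, one option is to normalise them first. The sketch below assumes missing values might show up as empty vectors or empty strings. `to_num_or_na` is a hypothetical helper and the input values are made up. It also uses complete.cases rather than -which(...), because -which(...) selects zero rows when nothing is missing.

```r
# Made-up raw coordinates as they might come back per tweet:
# a value, an empty vector, or an empty string.
raw_lat <- list("39.5", character(0), "", "40.1")
raw_lon <- list("-75.2", character(0), "", "-76.3")

# Hypothetical helper: map anything that is not a usable number to NA.
to_num_or_na <- function(z) {
  if (length(z) == 0 || is.na(z[1]) || z[1] == "") return(NA_real_)
  as.numeric(z[1])
}

coords <- data.frame(
  lon = vapply(raw_lon, to_num_or_na, numeric(1)),
  lat = vapply(raw_lat, to_num_or_na, numeric(1))
)

# Keep only rows where both lon and lat are present
coords_ok <- coords[complete.cases(coords), ]
nrow(coords_ok)
```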

Cannot get latitude and longitude values of tweets
