Getting JSON data from a URL in R

Date: 2016-10-21 15:49:34

Tags: json r

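I'm using R and I want to fetch JSON information from a URL. I have about 5,000 user agents to send to this API (http://www.useragentstring.com/pages/api.php).

I use this code to build the URL and append the user agent: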

url_1<-paste(" \"http://www.useragentstring.com/?uas=",uaelenchi[11,1],"&getJSON=all\"",sep = '');
json_data2<-fromJSON(readLines(cat(url_1)))

But I get this error:

Error in readLines(cat(url_1)) : 'con' is not a connection

Any suggestions would be greatly appreciated! Thanks.

2 answers:

Answer 0 (score: 1)

I use rjson::fromJSON(file = paste(your_url)). If you post a reproducible example, I can check whether it works in your case.
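For reference, the error in the question comes from cat(), which only prints its arguments and returns NULL, so readLines() never receives a connection. The pasted URL also contains literal quote characters and unencoded spaces. The following is a minimal sketch of building the URL as a plain string; build_uas_url is a hypothetical helper name, and the sample user agent is only an illustration. The network call is left commented out so the sketch runs offline.

```r
# Hypothetical helper: build the request URL for one user-agent string.
# URLencode(reserved = TRUE) percent-encodes spaces, slashes, semicolons, etc.
build_uas_url <- function(ua) {
  paste0("http://www.useragentstring.com/?uas=",
         utils::URLencode(ua, reserved = TRUE),
         "&getJSON=all")
}

url_1 <- build_uas_url("Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0")

# cat() prints and returns NULL, which is why readLines(cat(url_1)) fails;
# pass the string itself (or url(url_1)) instead.
cat_result <- cat(url_1, "\n")
is.null(cat_result)

# Network call, commented out so the sketch runs offline:
# json_data2 <- rjson::fromJSON(file = url_1)
```

Passing the string directly to the file argument avoids the connection error; whether the API accepts every encoded string is not something this sketch verifies.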

Answer 1 (score: 0)

library(httr)
library(jsonlite)
library(purrr)

uas <- c("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:17.0) Gecko/20100101 Firefox/17.0", 
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:17.0) Gecko/20100101 Firefox/17.0", 
"Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0", 
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.6 Safari/537.11", 
"Mozilla/5.0 (X11; OpenBSD amd64; rv:28.0) Gecko/20100101 Firefox/28.0", 
"Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0", 
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.6 Safari/537.11", 
"Mozilla/5.0 (X11; OpenBSD amd64; rv:28.0) Gecko/20100101 Firefox/28.0", 
"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:14.0) Gecko/20120405 Firefox/14.0a1", 
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A", 
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36", 
"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:14.0) Gecko/20120405 Firefox/14.0a1", 
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A", 
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36")

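# Query the API for one user-agent string and return the parsed result as a data frame
parse_uas <- function(uas) {
  res <- GET("http://www.useragentstring.com/", query=list(uas=uas, getJSON="all"))
  stop_for_status(res)  # abort on HTTP errors
  content(res, as="text", encoding="UTF-8") %>% 
    fromJSON(flatten=TRUE) %>%  # the piped text is the only positional argument
    as.data.frame(stringsAsFactors=FALSE)
}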

map_df(uas, parse_uas)
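With roughly 5,000 requests, a single failed call would abort the whole map_df() run. As a rough sketch (not part of the original answer), the loop can be wrapped so that failures are skipped with a warning and an optional pause is inserted between calls. safe_map, pause, and the stub parser below are hypothetical names used only for illustration; the stub stands in for parse_uas() so the example runs without network access.

```r
# Hypothetical wrapper: apply a single-string parser `f` to each element of `x`,
# skipping elements for which `f` raises an error.
safe_map <- function(x, f, pause = 0) {
  out <- lapply(x, function(u) {
    if (pause > 0) Sys.sleep(pause)  # optional delay between requests
    tryCatch(f(u), error = function(e) {
      warning("failed for: ", u, " (", conditionMessage(e), ")", call. = FALSE)
      NULL
    })
  })
  out <- Filter(Negate(is.null), out)  # drop the failed calls
  do.call(rbind, out)                  # stack the remaining data frames
}

# Stub parser standing in for parse_uas(); it fails on one input
stub_parse <- function(u) {
  if (u == "bad") stop("boom")
  data.frame(agent = u, stringsAsFactors = FALSE)
}

demo <- suppressWarnings(safe_map(c("a", "bad", "b"), stub_parse))
```

In real use the call would look like safe_map(uas, parse_uas, pause = 0.5), assuming every successful call returns data frames with the same columns; the pause value is arbitrary.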

To save on API calls, you should add a caching layer to the parse_uas() function, which is easy to do with the memoise package:

library(memoise)

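# Uncached worker: query the API for one user-agent string
.parse_uas <- function(uas) {
  res <- GET("http://www.useragentstring.com/", query=list(uas=uas, getJSON="all"))
  stop_for_status(res)  # abort on HTTP errors
  content(res, as="text", encoding="UTF-8") %>% 
    fromJSON(flatten=TRUE) %>%  # the piped text is the only positional argument
    as.data.frame(stringsAsFactors=FALSE)
}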

parse_uas <- memoise(.parse_uas)
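
To illustrate what the cache buys you, here is a small offline sketch that assumes the memoise package is installed. fake_parse and calls are hypothetical stand-ins for .parse_uas() and a request counter; repeated inputs should reach the underlying function only once.

```r
library(memoise)

calls <- 0  # counts how often the underlying function actually runs

# Hypothetical stand-in for .parse_uas(): no network access, just bookkeeping
fake_parse <- function(uas) {
  calls <<- calls + 1
  data.frame(agent_string = uas, stringsAsFactors = FALSE)
}

cached_parse <- memoise(fake_parse)

uas_demo <- c("A", "B", "A", "A")  # duplicates, as in the uas vector above
res_demo <- do.call(rbind, lapply(uas_demo, cached_parse))

calls           # number of real invocations (one per distinct input)
nrow(res_demo)  # one row per element of uas_demo
```

The cache created this way lives in memory for the current R session only, so a fresh session starts with no stored results.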

Also, if you're on Linux, you could try this package (it doesn't compile well on macOS and, IIRC, doesn't compile at all on Windows), which does all the processing locally.