Scraping data from an "angular.callbacks" web response

Time: 2017-07-19 14:24:50

Tags: json r angular web-crawler

I want to use R to scrape news from a URL (http://www.foxnews.com/search-results/search?q="AlphaGo"&ss=fn&start=0). Here is my code:

library(stringr) # provides str_c()
url <- "http://api.foxnews.com/v1/content/search?q=%22AlphaGo%22&fields=date,description,title,url,image,type,taxonomy&section.path=fnc&start=0&callback=angular.callbacks._0&cb=2017719162"
html <- str_c(readLines(url, encoding = "UTF-8"), collapse = "")
content_fox <- RJSONIO::fromJSON(html)

However, the JSON cannot be read, and I get this error:

Error in file(con, "r") : cannot open the connection

I noticed that the JSON starts with angular.callbacks._0, and I think that may be the problem.

Any idea how to fix this?

1 Answer:

Answer 0 (score: 0)

Following the answer in "Parse JSONP with R", I adjusted my code by adding two new lines, and it works:

library(stringr) # provides str_c()
url <- "http://api.foxnews.com/v1/content/search?q=%22AlphaGo%22&fields=date,description,title,url,image,type,taxonomy&section.path=fnc&start=0&callback=angular.callbacks._0&cb=2017719162"
html <- str_c(readLines(url, encoding = "UTF-8"), collapse = "")
html <- sub('[^\\{]*', '', html) # remove function name and opening parenthesis
html <- sub('\\)$', '', html) # remove closing parenthesis
content_fox <- RJSONIO::fromJSON(html)
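
The two sub() calls strip the JSONP wrapper, which is the callback name plus the surrounding parentheses. The same idea can be tried offline on a made-up string. Below is a minimal sketch, assuming the response has the shape callbackName({...}). The helper name strip_jsonp and the sample payload are hypothetical and are not taken from the real API response.

```r
# Hypothetical helper (not part of the original answer): remove a JSONP
# wrapper such as angular.callbacks._0({...}) and return the bare JSON text.
# Input that does not look like JSONP is returned unchanged.
strip_jsonp <- function(txt) {
  # callback name, "(", payload (captured), ")", optional ";"
  pattern <- "(?s)^\\s*[A-Za-z_$][\\w$.]*\\s*\\((.*)\\)\\s*;?\\s*$"
  sub(pattern, "\\1", txt, perl = TRUE)
}

# Invented sample payload, used only for illustration.
sample_jsonp <- 'angular.callbacks._0({"docs":[{"title":"AlphaGo wins"}]})'
json_text <- strip_jsonp(sample_jsonp)

# Parse only if RJSONIO is installed, so the sketch still runs without it.
if (requireNamespace("RJSONIO", quietly = TRUE)) {
  parsed <- RJSONIO::fromJSON(json_text)
  str(parsed)
}
```

Matching the whole wrapper in one pattern means plain JSON passes through untouched, whereas the two separate sub() calls above assume the wrapper is always present.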