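I'm using JSON to scrape the content of many (about 1000) links. However, some of the links don't work as JSON, so they have no content to scrape, and my code stops as soon as it reaches one of them.
I tried using tryCatch to avoid the error, but it doesn't seem to work: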
library(jsonlite)
library(rvest)
lapply(links_jason[1:6], function(x) {
  tryCatch(
    {
      json_data <- read_html(x) %>%
        html_text() %>%
        jsonlite::fromJSON(.) %>%
        select(1)
    },
    error = function(cond) return(NULL),
    finally = print(x)
  )
})
Debug location is approximate because the source is not available
Links 1, 2, and 6 work fine. Links 3, 4, and 5 are the ones that need to be skipped.
> head(links_jason)
[1] "https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/68077&_format=hal_json"
[2] "https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/57833&_format=hal_json"
[3] "https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/56774&_format=hal_json"
[4] "https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/56748&_format=hal_json"
[5] "https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/56782&_format=hal_json"
[6] "https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/64341&_format=hal_json"
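I also tried using an if statement, with no success. Can anyone help? Thanks!
Answer 0 (score: 1)
Read the links directly with jsonlite and test the length of what comes back: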
library(jsonlite)
library(rvest)
library(magrittr)
links_jason <- c("https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/68077&_format=hal_json"
,"https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/57833&_format=hal_json"
, "https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/56774&_format=hal_json"
, "https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/56748&_format=hal_json"
, "https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/56782&_format=hal_json"
,"https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/64341&_format=hal_json")
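lapply(links_jason[1:6], function(x) {
  # read_json() fetches and parses the link directly
  json_data <- jsonlite::read_json(x)
  # only links that returned content are printed
  if (length(json_data) > 0) {
    print(x)
  }
})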
Or something like this:
library(jsonlite)
library(rvest)
library(magrittr)
links_jason <- c("https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/68077&_format=hal_json"
,"https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/57833&_format=hal_json"
, "https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/56774&_format=hal_json"
, "https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/56748&_format=hal_json"
, "https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/56782&_format=hal_json"
,"https://lasillavacia.com/silla_llena_api/get?path=/contenido-nodo/64341&_format=hal_json")
lapply(links_jason[1:6], function(x) {
  json_data <- jsonlite::read_json(x)
  if (length(json_data) == 0) {
    # nothing came back for this link, so mark it as missing
    json_data <- NA
  } else {
    print('doing something with json_data')
  }
})
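If some links raise an error rather than returning empty JSON, the length test can be combined with tryCatch. The following is only a rough sketch, not a tested solution for these particular URLs: `safe_fetch` and `fetch_all` are hypothetical helper names, and the `fetch` argument is an assumption added so the helpers can be tried without network access (it defaults to `jsonlite::fromJSON`, which accepts a URL).

```r
# Hypothetical helper: fetch one link, returning NULL instead of stopping
# when the request or the JSON parsing fails, or when the result is empty.
safe_fetch <- function(url, fetch = jsonlite::fromJSON) {
  result <- tryCatch(fetch(url), error = function(cond) NULL)
  if (length(result) == 0) NULL else result
}

# Hypothetical helper: fetch every link, then keep only those that worked.
# The returned list is named by URL, so the usable links are easy to see.
fetch_all <- function(urls, fetch = jsonlite::fromJSON) {
  results <- lapply(urls, safe_fetch, fetch = fetch)
  names(results) <- urls
  Filter(Negate(is.null), results)
}
```

With the vector from the question this would be called as `usable <- fetch_all(links_jason[1:6])`, after which `names(usable)` lists the links that returned content. Returning NULL for failures and filtering afterwards keeps the loop running over all 1000 links instead of stopping at the first bad one.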