Skipping elements in a loop using the tryCatch function

Time: 2017-12-17 18:46:36

Tags: r rvest

I have a list of websites that I want to scrape, e.g.

review_links <- c("https://www.filmtotaal.nl/recensie/12882", "https://www.filmtotaal.nl/r")

I want to run the following function on each link:

read_txt <- function(a_review_link){
  # Download and parse the page for the link passed in
  pg <- read_html(a_review_link)
  txt <- pg %>% html_nodes(xpath = '//div[@class="text"]//text()') %>%
    html_text %>% trimws %>%
    grep('^[a-zA-Z]+:|\\|$|^[0-9]*$', .,
         invert = TRUE, value = TRUE) %>%
    paste(collapse = ' ')
}

However, when I loop over the list with this function, as follows:

for(review_link in review_links){
  read_txt(review_link)
}

I get an error. So I am now trying to add some error handling. However, when I do this:

for(review_link in review_links){
  tryCatch(read_txt(review_link), error=function(e) return ("No valid URL"))
}

I do not get the output I expect (the second link should trigger the error). Any thoughts on what is going wrong here?

2 answers:

Answer 0 (score: 1)

I looked at the documentation for tryCatch, and this is what I came up with. This is my first time using tryCatch.

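library(rvest)

review_links <- c("https://www.filmtotaal.nl/recensie/12882", "https://www.filmtotaal.nl/r")
read_txt <- function(a_review_link) {
  # Wrap the download and the parsing in one expression, so an error in
  # either step is caught; the handler returns the condition object itself.
  tryCatch({
    pg <- read_html(a_review_link)
    pg %>%
      html_nodes(xpath = '//div[@class="text"]//text()') %>%
      html_text %>%
      trimws %>%
      grep('^[a-zA-Z]+:|\\|$|^[0-9]*$', ., invert = TRUE, value = TRUE) %>%
      paste(collapse = ' ')
  }, error = function(e) e)
}
# Nothing is auto-printed inside a for loop, so print each result explicitly
for (review_link in review_links) {
  print(read_txt(review_link))
}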
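
To see how the value of tryCatch behaves without depending on the website, here is a minimal offline sketch. It assumes a hypothetical stand-in, fake_read_txt, which is not part of rvest and simply raises an error for any link that does not contain "/recensie/". The point it illustrates is that tryCatch returns either the value of the expression or the value of the error handler, and that this value has to be assigned or printed explicitly inside a for loop.

```r
# Hypothetical stand-in for read_txt(): it fails for any link that does not
# contain "/recensie/", so the example runs without network access.
fake_read_txt <- function(a_review_link) {
  if (!grepl("/recensie/", a_review_link, fixed = TRUE)) {
    stop("cannot open URL: ", a_review_link)
  }
  paste("text of", a_review_link)
}

review_links <- c("https://www.filmtotaal.nl/recensie/12882",
                  "https://www.filmtotaal.nl/r")

results <- character(length(review_links))
for (i in seq_along(review_links)) {
  # The value of tryCatch() is the value of the expression, or, if an
  # error was signalled, the value returned by the error handler.
  results[i] <- tryCatch(
    fake_read_txt(review_links[i]),
    error = function(e) "No valid URL"
  )
  # Inside a for loop nothing is auto-printed, so print explicitly.
  print(results[i])
}
```

With this stand-in, the second element of results ends up as "No valid URL" while the first holds the placeholder text for the valid link.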

Answer 1 (score: 0)

This code runs correctly in my R session:

library(rvest)

review_links <- c("https://www.filmtotaal.nl/recensie/12882",
                  "https://www.filmtotaal.nl/recensie/12883")
read_txt <- function(a_review_link) {
  # Use the function argument, not the loop variable from the global environment
  pg <- read_html(a_review_link)
  pg %>% html_nodes(xpath = '//div[@class="text"]//text()') %>%
    html_text %>% trimws %>%
    grep('^[a-zA-Z]+:|\\|$|^[0-9]*$', ., invert = TRUE, value = TRUE) %>%
    paste(collapse = ' ')
}

# Preallocate a list and fill it with one result per link
lst <- vector(length(review_links), mode="list")
for(k in seq_along(review_links)) {
  lst[[k]] <- read_txt(review_links[k])
}

lst[[1]]
# [1] "Cast : Het Hongaarse lichtabsurdistische liefdesdrama On Body and Soul sleepte ...

lst[[2]]
# [1] "Cast : Janet heeft er hard voor geknokt, maar nu het gelukt is mag ze het ...
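
If some of the links are invalid, the same list-collecting loop can be combined with tryCatch so that failing elements are skipped. The following is only a sketch under stated assumptions: fake_read_txt is a hypothetical offline stand-in for read_txt (it fails for links without "/recensie/"), and returning NULL from the handler as a "failed" marker is just one possible convention.

```r
# Hypothetical stand-in for read_txt(), so no network access is needed.
fake_read_txt <- function(a_review_link) {
  if (!grepl("/recensie/", a_review_link, fixed = TRUE)) {
    stop("cannot open URL: ", a_review_link)
  }
  paste("text of", a_review_link)
}

review_links <- c("https://www.filmtotaal.nl/recensie/12882",
                  "https://www.filmtotaal.nl/r",
                  "https://www.filmtotaal.nl/recensie/12883")

lst <- list()
for (review_link in review_links) {
  # The handler returns NULL, which is used here as a marker for failure.
  txt <- tryCatch(fake_read_txt(review_link),
                  error = function(e) NULL)
  if (is.null(txt)) next        # skip this element and continue the loop
  lst[[review_link]] <- txt     # keep successful results, named by URL
}

names(lst)
```

Naming the list elements by URL keeps track of which links succeeded, since the positions in lst no longer line up with review_links once an element has been skipped.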