Skipping errors and collecting output from a loop

Date: 2018-03-07 12:32:10

Tags: r

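When I run the code below, I get this error: Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class "try-error" to a data.frame

I am using the try function to skip the links that do not work and carry on with the loop, but that is not what happens. Can someone help me?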

base_url <- c("https://www.sec.gov/Archives/edgar/data/1409916/000162828017002570/exhibit211nobilishealthcor.htm",
              "https://www.sec.gov/Archives/edgar/data/1300317/000119312507128181/dex211.htm",
              "https://www.sec.gov/Archives/edgar/data/1453814/000145381417000063/subsidiariesoftheregistran.htm",
              "https://www.sec.gov/Archives/edgar/data/25743/000138713117001111/ex21-1.htm",
              "https://www.sec.gov/Archives/edgar/data/880631/000119312517065534/d280058dex211.htm",
              "https://www.sec.gov/Archives/edgar/data/1058290/000105829017000008/ctshexhibit21112312016.htm",
              "https://www.sec.gov/Archives/edgar/data/1031927/000141588916005383/ex21-1.htm",
              "https://www.sec.gov/Archives/edgar/data/1358071/000135807118000008/cxoexhibit211.htm",
              "https://www.sec.gov/Archives/edgar/data/904979/000090497918000006/exhibit211q4fy17listofsubs.htm",
              "https://www.sec.gov/Archives/edgar/data/41296/000094420901500099/dex21.txt",
              "https://www.sec.gov/Archives/edgar/data/808461/000080846117000024/gciexhibit21-1123116.htm",
              "https://www.sec.gov/Archives/edgar/data/1101026/000107878213000519/f10k123112_ex21.htm",
              "https://www.sec.gov/Archives/edgar/data/932372/000141588915000759/ex21-1.htm"
              )

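library(rvest)  # html_nodes() and html_table(); also re-exports read_html()

df <- lapply(base_url, function(u) {
  try({
    html_obj <- read_html(u)
    draft_table <- html_nodes(html_obj, 'table')
    # characters 41-47 of the URL, i.e. the 7 characters after ".../edgar/data/"
    cik <- substr(u, start = 41, stop = 47)
    draft1 <- html_table(draft_table, fill = TRUE)
    final <- c(cik, draft1)
  }, silent = TRUE)
})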


library(reshape2)  # melt()
library(zoo)       # na.locf()

data <- melt(df)
data <- as.data.frame(data, row.names = NULL)
data <- data[, 1:2]
names(data) <- c("CIK", "Company")

data2 <- transform(data, CIK = na.locf(CIK))

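3 Answers:

Answer 0: (score: 0)

If something goes wrong, try does not skip the element for you. It returns an error object of class try-error in its place, and that object is what melt later fails to coerce.

So after the loop you can still add the following to drop the failed elements: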

check <- sapply(df, class) != "try-error"
df <- df[check]
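
The filter can be checked without touching the network. The following is a minimal sketch, not the asker's scraper: fake_scrape and the urls vector are hypothetical stand-ins, and it uses inherits() instead of comparing class() directly, on the assumption that this is safer when an element carries more than one class.

```r
# Hypothetical stand-in for the scraping step: fails on inputs containing "bad".
fake_scrape <- function(u) {
  if (grepl("bad", u)) stop("could not read ", u)
  list(cik = u, table = data.frame(name = "Example Co"))
}

urls <- c("ok-1", "bad-2", "ok-3")

# try() does not skip anything: a failure becomes an object of class "try-error".
res <- lapply(urls, function(u) try(fake_scrape(u), silent = TRUE))

# Flag the elements that are not errors, then keep only those.
ok <- !vapply(res, inherits, logical(1), what = "try-error")
res_ok <- res[ok]

length(res_ok)
```

The urls[!ok] vector then tells you which links failed, which may be worth logging before moving on.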

Or use tryCatch directly, so that a failed link yields NULL instead of an error object:

df <- lapply(base_url, function(u) {
  tryCatch({
    html_obj <- read_html(u)
    draft_table <- html_nodes(html_obj,'table')
    cik <- substr(u,start = 41,stop = 47)
    draft1 <- html_table(draft_table,fill = TRUE)
    final <- c(cik,draft1)
  }, error = function(x) NULL)
})
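
With the tryCatch version the failed links are still present in the list, as NULL entries. A small sketch of dropping them afterwards, again with a hypothetical fake_scrape in place of the real scraping code:

```r
# Hypothetical stand-in for the scraping step: fails on inputs containing "bad".
fake_scrape <- function(u) {
  if (grepl("bad", u)) stop("could not read ", u)
  list(cik = u, table = data.frame(name = "Example Co"))
}

urls <- c("ok-1", "bad-2", "ok-3")

# The error handler returns NULL, so lapply() keeps a NULL placeholder per failure.
res <- lapply(urls, function(u) {
  tryCatch(fake_scrape(u), error = function(e) NULL)
})

# Record which inputs failed, then drop the NULL placeholders.
failed <- urls[vapply(res, is.null, logical(1))]
res_ok <- Filter(Negate(is.null), res)
```

Whether to drop the NULL entries or keep them as placeholders depends on what the later steps expect; dropping them is only one option.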

Answer 1: (score: 0)

You could try something like this.

for (i in something) {
  res <- try(expression_to_get_data)
  if (inherits(res, "try-error")) {
    # error handling code, maybe just skip this iteration using
    next  # R's keyword for skipping to the next iteration (R has no `continue`)
  }
  # rest of iteration for case of no error
}
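
A runnable version of the same pattern, as a sketch only: fake_scrape and urls are hypothetical placeholders for the real scraping call and link list.

```r
# Hypothetical stand-in for the scraping step: fails on inputs containing "bad".
fake_scrape <- function(u) {
  if (grepl("bad", u)) stop("could not read ", u)
  list(cik = u, table = data.frame(name = "Example Co"))
}

urls <- c("ok-1", "bad-2", "ok-3")
results <- list()

for (u in urls) {
  res <- try(fake_scrape(u), silent = TRUE)
  if (inherits(res, "try-error")) {
    next  # skip this link and move on to the next one
  }
  # Only reached when the call succeeded; store the result under its URL.
  results[[u]] <- res
}

names(results)
```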

Source of solution

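Answer 2: (score: 0)

You can use purrr's safely function. For each URL it returns a list holding the result of the function below and the error message, if there was one, without exiting the loop.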

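library(tidyverse)
library(rvest)  # not attached by library(tidyverse); needed for read_html(), html_nodes(), html_table()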

checklinks <- function(url) {
  cik <- url %>% 
    str_extract("[:digit:]+")
  table <- read_html(url) %>% 
    html_nodes("table") %>%
    html_table() %>% 
    bind_rows() %>% 
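    # apply na_if() per character column; na_if() on a whole data frame fails in recent dplyr versions
    mutate(across(where(is.character), ~ na_if(.x, ""))) %>% 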
    filter(rowMeans(is.na(.)) < 1) %>% 
    mutate(cik = cik) %>% 
    select(cik, everything())
  return(table)
}

final <- base_url %>% 
  map(safely(checklinks)) %>% 
  transpose() %>% 
  .$result %>% 
  bind_rows()
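
To see what safely returns without hitting the network, here is a minimal sketch. fake_scrape and urls are hypothetical, and compact() plus do.call(rbind, ...) is just one way to combine the successful results, not necessarily what the answer intended.

```r
library(purrr)

# Hypothetical stand-in for checklinks(): fails on inputs containing "bad".
fake_scrape <- function(u) {
  if (grepl("bad", u)) stop("could not read ", u)
  data.frame(cik = u, company = "Example Co")
}

urls <- c("ok-1", "bad-2", "ok-3")

# safely() wraps the function so each call returns list(result = ..., error = ...).
out <- map(urls, safely(fake_scrape))

# transpose() turns the list of pairs into a pair of lists: $result and $error.
parts <- transpose(out)

# Inputs whose error slot is not NULL are the ones that failed.
failed <- urls[!map_lgl(parts$error, is.null)]

# compact() drops the NULL results left by the failures before combining.
final <- do.call(rbind, compact(parts$result))
```

Keeping parts$error around makes it possible to inspect why each link failed instead of silently discarding it.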