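When I run the code below, I get this error: Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class ""try-error"" to a data.frame
I am using the try function to skip the links that do not work and carry on with the loop, but that is not what happens. Can someone help me?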
base_url <- c("https://www.sec.gov/Archives/edgar/data/1409916/000162828017002570/exhibit211nobilishealthcor.htm",
"https://www.sec.gov/Archives/edgar/data/1300317/000119312507128181/dex211.htm",
"https://www.sec.gov/Archives/edgar/data/1453814/000145381417000063/subsidiariesoftheregistran.htm",
"https://www.sec.gov/Archives/edgar/data/25743/000138713117001111/ex21-1.htm",
"https://www.sec.gov/Archives/edgar/data/880631/000119312517065534/d280058dex211.htm",
"https://www.sec.gov/Archives/edgar/data/1058290/000105829017000008/ctshexhibit21112312016.htm",
"https://www.sec.gov/Archives/edgar/data/1031927/000141588916005383/ex21-1.htm",
"https://www.sec.gov/Archives/edgar/data/1358071/000135807118000008/cxoexhibit211.htm",
"https://www.sec.gov/Archives/edgar/data/904979/000090497918000006/exhibit211q4fy17listofsubs.htm",
"https://www.sec.gov/Archives/edgar/data/41296/000094420901500099/dex21.txt",
"https://www.sec.gov/Archives/edgar/data/808461/000080846117000024/gciexhibit21-1123116.htm",
"https://www.sec.gov/Archives/edgar/data/1101026/000107878213000519/f10k123112_ex21.htm",
"https://www.sec.gov/Archives/edgar/data/932372/000141588915000759/ex21-1.htm"
)
df <- lapply(base_url,function(u){
try({
html_obj <- read_html(u)
draft_table <- html_nodes(html_obj,'table')
cik <- substr(u,start = 41,stop = 47)
draft1 <- html_table(draft_table,fill = TRUE)
final <- c(cik,draft1)
},silent = TRUE)
})
require(reshape2)
data <- melt(df)
data <- as.data.frame(data,row.names = NULL)
data <- data[,1:2]
names(data) <- c("CIK","Company")
data2 <- transform(data, CIK = na.locf(CIK ))
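Answer 0 (score: 0)
try does not skip the failing element; it returns an object of class "try-error" in its place.
So afterwards you still have to filter those elements out, for example: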
check <- sapply(df, class) != "try-error"
df <- df[check]
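To see why the filtering step is needed, here is a small offline sketch. The function parse_number and the inputs are made up for illustration and only stand in for the scraping code; inherits() is used as a slightly more defensive way to detect the error objects.

```r
# Hypothetical stand-in for the scraping step: fails on non-numeric input.
parse_number <- function(x) {
  value <- suppressWarnings(as.numeric(x))
  if (is.na(value)) stop("not a number: ", x)
  value
}

inputs <- c("1", "oops", "3")

# try() lets lapply() finish, but the failed element is a "try-error" object.
res <- lapply(inputs, function(x) try(parse_number(x), silent = TRUE))

# Flag the elements that are not errors and keep only those.
ok <- !vapply(res, inherits, logical(1), what = "try-error")
res_ok <- res[ok]
```

Without the last two lines, the "try-error" object stays in the list, and any later step that expects data (such as melt) fails on it.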
Or use tryCatch directly:
df <- lapply(base_url, function(u) {
tryCatch({
html_obj <- read_html(u)
draft_table <- html_nodes(html_obj,'table')
cik <- substr(u,start = 41,stop = 47)
draft1 <- html_table(draft_table,fill = TRUE)
final <- c(cik,draft1)
}, error = function(x) NULL)
})
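With this version the failed URLs come back as NULL, and those usually still have to be dropped before the results are combined. A minimal offline sketch, assuming a made-up function risky_step in place of the scraping code:

```r
# Hypothetical stand-in for the scraping step: fails on negative input.
risky_step <- function(x) {
  if (x < 0) stop("negative input")
  data.frame(input = x, root = sqrt(x))
}

inputs <- c(4, -1, 9)

# tryCatch() turns each failure into NULL instead of an error object.
res <- lapply(inputs, function(x) {
  tryCatch(risky_step(x), error = function(e) NULL)
})

# Drop the NULLs, then stack the surviving data frames.
res <- Filter(Negate(is.null), res)
combined <- do.call(rbind, res)
```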
Answer 1 (score: 0)
You could try something like this.
for (i in something) {
  res <- try(expression_to_get_data)
  if (inherits(res, "try-error")) {
    # error handling code; to just skip this iteration, use
    next
  }
  # rest of the iteration for the no-error case
}
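A runnable version of this pattern on made-up data might look like the sketch below. The inputs list and the use of log() are assumptions chosen only so that one element fails; note that R's keyword for skipping an iteration is next.

```r
inputs <- list(10, "a", 30)   # made-up data; "a" makes log() fail
results <- list()

for (i in seq_along(inputs)) {
  res <- try(log(inputs[[i]]), silent = TRUE)
  if (inherits(res, "try-error")) {
    message("skipping element ", i)
    next   # jump straight to the next iteration
  }
  # Only reached when no error occurred.
  results[[length(results) + 1]] <- res
}
```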
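Answer 2 (score: 0)
You can use purrr's safely function. For each URL it creates a list that contains the result of the function below and the error message (if there is one), without exiting the loop.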
library(tidyverse)
library(rvest)  # read_html(), html_nodes(), html_table(); not attached by library(tidyverse)
checklinks <- function(url) {
cik <- url %>%
str_extract("[:digit:]+")
table <- read_html(url) %>%
html_nodes("table") %>%
html_table() %>%
bind_rows() %>%
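mutate(across(where(is.character), ~ na_if(.x, ""))) %>%  # recent dplyr versions do not accept na_if() on a whole data frame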
filter(rowMeans(is.na(.)) < 1) %>%
mutate(cik = cik) %>%
select(cik, everything())
return(table)
}
final <- base_url %>%
map(safely(checklinks)) %>%
transpose() %>%
.$result %>%
bind_rows()
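To show what safely returns without touching the network, here is a small sketch. The function half_root and its inputs are hypothetical stand-ins for checklinks and base_url; compact() is used here to drop the NULL results, whereas the code above relies on bind_rows() ignoring them.

```r
library(purrr)

# Hypothetical stand-in for checklinks(): fails on negative input.
half_root <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x) / 2
}

out <- map(c(16, -4, 36), safely(half_root))

# Each element is a list with `result` and `error`; exactly one is NULL.
failed <- map_lgl(out, ~ is.null(.x$result))

# transpose() regroups into one list of results and one list of errors.
parts <- transpose(out)
good <- compact(parts$result)   # drop the NULLs left by failed calls
```

The error objects remain available in parts$error, which can help when checking afterwards which URLs failed and why.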