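I have a loop that reads HTML table data from about 440 web pages. The markup is not identical on every page, so sometimes I need table node 1 and sometimes node 2. Right now I set the node number by hand in a list and feed that list into the loop. My problem is that the pages have started changing, and keeping the node-number list up to date has become a chore.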
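If the loop hits the wrong node number (i.e. 1 instead of 2, or vice versa), it throws an error and stops. When that happens, is there a way to have the loop replace the wrong node number with the correct one and then carry on as if nothing had happened?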
Here is the readHTMLTable part of the code in my loop, with an example URL:
library(XML)  # provides htmlParse, getNodeSet and readHTMLTable
url <- "http://espn.go.com/nba/player/gamelog/_/id/2991280/year/2013/"
html.page <- htmlParse(url)
tableNodes <- getNodeSet(html.page, "//table")
x <- as.numeric(Players$Nodes[s])
tbl = readHTMLTable(tableNodes[[x]], colClasses = c("character"),stringsAsFactors = FALSE)
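This is the error I get when the node number is wrong:
Error in readHTMLTable(tableNodes[[x]], colClasses = c("character"), stringsAsFactors = FALSE) :
  error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in tableNodes[[x]] : subscript out of bounds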
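The error means tableNodes[[x]] asked for a table index the page does not have, i.e. the page has fewer than x tables. One way to avoid triggering it at all, rather than catching it, is to compare the stored number with length(tableNodes) before subscripting. The sketch below is a hypothetical helper (pick_table_index is not part of the XML package), and it assumes that falling back to table 1 is acceptable when the stored number is missing or too large. It cannot detect an index that is in range but points at the wrong table.

```r
# Hypothetical helper, not part of the XML package: choose a table index that
# actually exists on the page. Assumption: table 1 is an acceptable fallback.
pick_table_index <- function(n_tables, preferred, fallback = 1L) {
  preferred <- suppressWarnings(as.integer(preferred))
  # Use the stored node number only if it is present and within range
  if (length(preferred) == 1 && !is.na(preferred) &&
      preferred >= 1L && preferred <= n_tables) {
    return(preferred)
  }
  # Otherwise fall back, provided the page has that many tables
  if (n_tables >= fallback) {
    return(as.integer(fallback))
  }
  NA_integer_  # no usable table on this page
}

# Inside the loop (assumes tableNodes and Players$Nodes[s] as in the question):
# x <- pick_table_index(length(tableNodes), Players$Nodes[s])
```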
Example code:
library(XML)
A <- c("dog", "cat")
Nodes <- as.data.frame(1:1)
# Nodes <- as.data.frame(1:2)  # <-- this works without errors
colnames(Nodes)[1] <- "Col1"
Nodes2 <- 2
url <-c("http://espn.go.com/nba/player/gamelog/_/id/6639/year/2013/","http://espn.go.com/nba/player/gamelog/_/id/6630/year/2013/")
for (i in 1:length(A))
{
html.page <- htmlParse(url[i])
tableNodes <- getNodeSet(html.page, "//table")
x <- as.numeric(Nodes$Col1[i])
df = readHTMLTable(tableNodes[[x]], colClasses = c("character"),stringsAsFactors = FALSE)
# tryCatch(df) here? ...no clue how
assign(paste0("", A[i]), df)
}
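One way to fill in the tryCatch placeholder is sketched below. It is a minimal sketch, not a tested drop-in: read_table_with_fallback is a hypothetical helper name, and the reader function is passed in so the same logic can be exercised without network access. It tries the stored node number first and, if that raises an error, tries each other table on the page in turn. It returns both the parsed table and the index that worked, so the loop can record the corrected node number as the question asks.

```r
# Hypothetical helper: try the stored node number first, then every other
# table index on the page, and return the first one the reader accepts.
read_table_with_fallback <- function(table_nodes, preferred, reader) {
  candidates <- unique(c(preferred, seq_along(table_nodes)))
  candidates <- candidates[!is.na(candidates)]
  for (k in candidates) {
    # tryCatch turns an error (e.g. "subscript out of bounds") into NULL
    result <- tryCatch(
      list(value = reader(table_nodes[[k]]), index = k),
      error = function(e) NULL
    )
    if (!is.null(result)) return(result)
  }
  stop("no readable table on this page")
}

# Usage inside the question's loop (assumes library(XML), A, url, Nodes):
# node_used <- integer(length(A))   # corrected node numbers, one per page
# for (i in seq_along(A)) {
#   tableNodes <- getNodeSet(htmlParse(url[i]), "//table")
#   res <- read_table_with_fallback(
#     tableNodes, as.numeric(Nodes$Col1[i]),
#     function(node) readHTMLTable(node, colClasses = "character",
#                                  stringsAsFactors = FALSE))
#   node_used[i] <- res$index
#   assign(A[i], res$value)
# }
```

The corrected numbers go into a separate vector because Nodes in the example has only one row. Note that if the stored index is in range but points at the wrong table, readHTMLTable may succeed and return the wrong data; the fallback only fires on errors.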
Answer (score: 3)
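If you get the subscript out of bounds error message, you should try a lower x. Below is a general demonstration of tryCatch, based on the demo code you posted in the original question (although I have replaced x with 2, since I don't know what Players and s are):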
> msg <- tryCatch(readHTMLTable(tableNodes[[2]], colClasses = c("character"),stringsAsFactors = FALSE), error = function(e)e)
> str(msg)
List of 2
$ message: chr "error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in tableNodes[[2]] : subscript"| __truncated__
$ call : language readHTMLTable(tableNodes[[2]], colClasses = c("character"), stringsAsFactors = FALSE)
- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
> msg$message
[1] "error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in tableNodes[[2]] : subscript out of bounds\n"
> grepl('subscript out of bounds', msg$message)
[1] TRUE
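Following the suggestion to try a lower x, the grepl check above can drive a retry loop. A sketch, assuming that only the out-of-bounds error should trigger a retry while any other error should still stop the loop; read_lowering_index is a hypothetical name, and reader stands in for the readHTMLTable call.

```r
# Hypothetical helper: start at x and step down one table at a time, but only
# when the failure is the "subscript out of bounds" error shown above.
read_lowering_index <- function(table_nodes, x, reader) {
  x <- as.integer(x)
  if (length(x) != 1 || is.na(x)) stop("x must be a single, non-missing number")
  while (x >= 1L) {
    msg <- tryCatch(reader(table_nodes[[x]]), error = function(e) e)
    if (!inherits(msg, "error")) {
      return(list(value = msg, index = x))  # success: keep the index that worked
    }
    if (!grepl("subscript out of bounds", conditionMessage(msg))) {
      stop(msg)  # a different problem: re-raise instead of hiding it
    }
    x <- x - 1L
  }
  stop("no table index from x down to 1 could be read")
}
```

With the question's tableNodes, calling read_lowering_index(tableNodes, 2, function(n) readHTMLTable(n, colClasses = "character", stringsAsFactors = FALSE)) should fall back to table 1 on a page that has only one table, and res$index tells you which number to store for that page.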