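I am extending a previous question about HTML parsing to cover null values. Suppose some of the variables I extract from the HTML are empty. Several variables may be empty, so I would like a systematic way to handle them (a loop or a function).
This question is really about assigning variables programmatically. Most of the information I have found recommends avoiding eval(parse(text = ...)), but I am not sure how to replace it in this case. I have the following HTML: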
html <-
'<!DOCTYPE html>
<html>
<body>
<div class="foo">
<div class="fooname">Name of 1st foo</div>
<div class="abc">ABC value only present here</div>
<span>1st span in 1st foo</span>
<span>2nd span in 1st foo</span>
</div>
<div class="foo">
<div class="fooname">Name of 2nd foo</div>
<span>Only 1 span in 2nd foo</span>
</div>
</body>
</html>'
Here is the parsing code:
library(XML)
html.parse <- htmlParse(html)
myFunc <- function(x){
  fooname <- xpathSApply(x, "./div[@class='fooname']", fun = xmlValue)
  abc <- xpathSApply(x, "./div[@class='abc']", fun = xmlValue)
  span <- xpathSApply(x, "./span", fun = xmlValue)
  df <- data.frame(fooname, abc, Span1 = span[1], Span2 = span[2])
  return(df)
}
result <- getNodeSet(html.parse, "//div[@class='foo']", fun = myFunc)
# Error in data.frame(fooname, abc, Span1 = span[1], Span2 = span[2]) :
# arguments imply differing number of rows: 1, 0
Here is my attempt at a fix.
myFunc <- function(x){
  fooname <- xpathSApply(x, "./div[@class='fooname']", fun = xmlValue)
  abc <- xpathSApply(x, "./div[@class='abc']", fun = xmlValue)
  span <- xpathSApply(x, "./span", fun = xmlValue)
  dfvars <- c("fooname", "abc", "span")
  #I think I have the same issue about assigning a variable in `apply`
  #functions, right?
  for(var in dfvars) {
    if(length(eval(parse(text = var))) == 0) {
      cat("No ", var, " value found for this group.\n")
      #Note the "list" class:
      cat("Class of ", var, " is: ", class(eval(parse(text = var))), "\n")
      cat("Placing an NA.\n")
      #This line gives an error:
      assign(eval(parse(text = var)), as.character(NA))
      cat("new value of ", var, " : ", eval(parse(text = var)), "\n")
      cat("New length of ", var, " : ", length(eval(parse(text = var))), "\n")
      cat("New class of ", var, " : ", class(eval(parse(text = var))), "\n")
    }
  }
  df <- data.frame(fooname, abc, Span1 = span[1], Span2 = span[2])
  return(df)
}
result <- getNodeSet(html.parse, "//div[@class='foo']", fun = myFunc)
# Error in assign(eval(parse(text = var)), as.character(NA)) :
# invalid first argument
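Note that the for loop here (or an apply function, if I went that way) sits at the second level of nesting. In my real project it would be at the third level; the outer level opens a series of pages. If possible I would like to avoid going to a third level, but I also want to keep things simple.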
Answer 0 (score: 1)
You can define your own wrapper around xpathSApply that tests for an empty list():
myXpathSApply <- function(x, ...){
  y <- xpathSApply(x, ...)
  # An XPath query with no match gives an empty list(); return NA in that case
  if(length(y) > 0){y}else{NA}
}
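As a quick check of why a length-1 NA works where an empty result does not, here is a small base-R sketch. It does not use the XML package: character(0) is only a stand-in for a query that matched nothing, and na_if_empty is a hypothetical helper name, not part of any library.

```r
# Hypothetical helper: the same length test as in myXpathSApply,
# applied to an arbitrary value instead of an XPath result.
na_if_empty <- function(y) if (length(y) > 0) y else NA

found <- "Name of 2nd foo"   # stand-in for a query that matched one node
empty <- character(0)        # stand-in for a query that matched nothing

# A zero-length column cannot be recycled to one row, so data.frame() fails.
attempt <- try(data.frame(fooname = found, abc = empty), silent = TRUE)
failed <- inherits(attempt, "try-error")

# After substitution every column has length 1 and the row can be built.
df <- data.frame(fooname = found, abc = na_if_empty(empty))
```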
Then use it in place of xpathSApply:
myFunc <- function(x){
  fooname <- myXpathSApply(x, "./div[@class='fooname']", fun = xmlValue)
  abc <- myXpathSApply(x, "./div[@class='abc']", fun = xmlValue)
  span <- myXpathSApply(x, "./span", fun = xmlValue)
  df <- data.frame(fooname, abc, Span1 = span[1], Span2 = span[2])
  return(df)
}
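On the original concern about eval(parse(text = ...)) and assign(): one possible alternative is to keep the extracted values in a named list and patch the empty ones with lapply, so that no variable is ever looked up or assigned by name. The sketch below is base R only and rests on assumptions: the list vals stands in for the three extraction results of the second foo node, and fill_empty is a name made up for illustration.

```r
# Stand-ins for the values extracted from the second "foo" node.
vals <- list(
  fooname = "Name of 2nd foo",
  abc     = list(),                      # no match: empty result
  span    = "Only 1 span in 2nd foo"
)

# Replace every zero-length element with a character NA.
# lapply keeps the element names, so the values stay addressable by name.
fill_empty <- function(lst) {
  lapply(lst, function(v) if (length(v) == 0) NA_character_ else v)
}

vals <- fill_empty(vals)

df <- data.frame(
  fooname = vals$fooname,
  abc     = vals$abc,
  Span1   = vals$span[1],
  Span2   = vals$span[2],   # indexing past the end of a vector yields NA
  stringsAsFactors = FALSE
)
```

With real data the list would presumably be built from the three xpathSApply calls inside myFunc, and the per-node data frames returned by getNodeSet could then be combined with do.call(rbind, result).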