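The problem is that some XML files are missing certain nodes in some records, for example the "year" node in the sample code below. xpathApply simply skips the records where the node is absent, but I want xmlValue to return NA for them so that the result keeps the original order and length. This does not appear to be the same issue as in this post.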
xml_string = c(
'<?xml version="1.0" encoding="UTF-8"?>',
'<movies>',
'<movie mins="126" lang="eng">',
'<title>Good Will Hunting</title>',
'<director>',
'<first_name>Gus</first_name>',
'<last_name>Van Sant</last_name>',
'</director>',
'<year>1998</year>',
'<genre>drama</genre>',
'</movie>',
'<movie mins="106" lang="spa">',
'<title>Y tu mama tambien</title>',
'<director>',
'<first_name>Alfonso</first_name>',
'<last_name>Cuaron</last_name>',
'</director>',
'<genre>drama</genre>',
'</movie>',
'<movie mins="106" lang="spa">',
'<title>ABC</title>',
'<director>',
'<first_name>Alfonso</first_name>',
'<last_name>Cuaron</last_name>',
'</director>',
'<year>2001</year>',
'<genre>drama</genre>',
'</movie>',
'</movies>')
library(XML)
movies_xml = xmlParse(xml_string, asText = TRUE)
unlist(xpathApply(movies_xml, "//year", xmlValue))
The result is:
[1] "1998" "2001"
How can I quickly get this instead:
"1998" NA "2001"
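Answer 0: (score: 2)
You can write a function that returns NA when a node is missing and collapses multiple matching nodes into a single string.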
xmlGetValue <- function(x, node){
a <- xpathSApply(x, node, xmlValue)
ifelse(length(a) == 0, NA,
ifelse(length(a) > 1, paste(a, collapse=", "), a))
}
xpathSApply(movies_xml, "//movie", xmlGetValue, "./year")
[1] "1998" NA "2001"
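The same per-parent lookup can be reused for several child nodes at once. The following is only a minimal sketch: the shortened stand-in document and the helper name first_or_na are assumptions made for illustration and are not part of the original answer. Unlike xmlGetValue, it keeps only the first match instead of collapsing multiple matches.

```r
library(XML)

# Small stand-in document (assumed for illustration): the second <movie> has no <year>
doc <- xmlParse(
  "<movies>
     <movie><title>A</title><year>1998</year></movie>
     <movie><title>B</title></movie>
     <movie><title>C</title><year>2001</year></movie>
   </movies>",
  asText = TRUE
)

# Hypothetical helper: value of the first match of `path` relative to `node`,
# or NA when nothing matches
first_or_na <- function(node, path) {
  hits <- xpathSApply(node, path, xmlValue)
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Evaluate each path once per <movie>, so every column has one entry per movie
movie_nodes <- getNodeSet(doc, "//movie")
result <- data.frame(
  title = sapply(movie_nodes, first_or_na, "./title"),
  year  = sapply(movie_nodes, first_or_na, "./year"),
  stringsAsFactors = FALSE
)
```

Because each path is evaluated relative to one movie node at a time, the columns stay aligned even when a child node is absent.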
Answer 1: (score: 1)
You can use the XPath boolean() function to test each parent node for the child:
xpathSApply(movies_xml, "//movies/movie", function(x) {
if (xpathSApply(x, "boolean(./year)")) {
xpathSApply(x, "./year", xmlValue)
} else {
NA
}
})
## [1] "1998" NA "2001"
For those using xml2, the equivalent is:
library(xml2)
doc <- read_xml(paste0(xml_string, collapse="\n"))
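movies <- xml_find_all(doc, "//movies/movie")
# xml_find_one() has been deprecated in favor of xml_find_first(), which
# returns a missing node (rather than raising an error) when nothing matches;
# xml_text() yields NA for a missing node, so tryCatch() is not needed
xml_text(xml_find_first(movies, "./year"))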
Answer 2: (score: 1)
Consider converting the movie nodes of the XML string to a data frame and creating a list from the year column:
movies_xml = xmlParse(xml_string, asText = TRUE)
xmldf <- xmlToDataFrame(nodes = getNodeSet(movies_xml, "//movie"))
# Select the column by name rather than by position (year is column 3 here)
yearlist <- c(xmldf["year"])
Output
$year
[1] "1998" NA "2001"