I am using XPath in R and have an XML structure like this:
library(XML)
xml1 <- xmlParse('
<L0>
<L1>
<ID>Get this ID</ID>
<L1N1>Ignore node 1</L1N1>
<L1N2>
<L2>
<L2N1>Get this node and all others in L2</L2N1>
</L2>
</L1N2>
<L1N3>Ignore node 3</L1N3>
</L1>
<L1>
<ID>Get this ID</ID>
<L1N1>Ignore node 1</L1N1>
<L1N2>
<L2>
<L2N1>Get this node and all others in L2</L2N1>
</L2>
</L1N2>
<L1N4>Ignore node 4</L1N4>
</L1>
<L1>
<ID>Ignore this ID</ID>
<L1N1>Ignore node 1</L1N1>
<L1N3>Ignore node 3</L1N3>
<L1N4>Ignore node 4</L1N4>
</L1>
</L0>
')
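I want to extract every L2 node together with one uncle node (e.g. ID), but not the other uncles. Each extracted result should be returned inside its grandparent node L1. This is the desired output: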
## [[1]]
## <L1>
## <ID>Get this ID</ID>
## <L1N2>
## <L2>
## <L2N1>Get this node and all others in L2</L2N1>
## </L2>
## </L1N2>
## </L1>
## [[2]]
## <L1>
## <ID>Get this ID</ID>
## <L1N2>
## <L2>
## <L2N1>Get this node and all others in L2</L2N1>
## </L2>
## </L1N2>
## </L1>
I can get the L1 nodes that have an L2 descendant:
getNodeSet(xml1, "//L1[descendant::L2]")
## [[1]]
## <L1>
## <ID>Get this ID</ID>
## <L1N1>Ignore node 1</L1N1> ## *Want to exclude this*
## <L1N2>
## <L2>
## <L2N1>Get this node and all others in L2</L2N1>
## </L2>
## </L1N2>
## <L1N3>Ignore node 3</L1N3> ## *Want to exclude this*
## </L1>
##
## [[2]]
## <L1>
## <ID>Get this ID</ID>
## <L1N1>Ignore node 1</L1N1> ## *Want to exclude this*
## <L1N2>
## <L2>
## <L2N1>Get this node and all others in L2</L2N1>
## </L2>
## </L1N2>
## <L1N4>Ignore node 4</L1N4> ## *Want to exclude this*
## </L1>
...but this includes the uncles I do not want. I can exclude those uncles and select only the children of L1 that I want:
getNodeSet(xml1, "//L1/*[self::ID | child::L2]")
## [[1]]
## <ID>Get this ID</ID>
##
## [[2]]
## <L1N2>
## <L2>
## <L2N1>Get this node and all others in L2</L2N1>
## </L2>
## </L1N2>
##
## [[3]]
## <ID>Get this ID</ID>
##
## [[4]]
## <L1N2>
## <L2>
## <L2N1>Get this node and all others in L2</L2N1>
## </L2>
## </L1N2>
##
## [[5]]
## <ID>Ignore this ID</ID>
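...but now ID and L2 are returned as separate nodes rather than together under L1, and the result also includes the ID element from the third L1 node, which has no L2.
Can XPath return the desired result? If not, is there something else in R I can use to achieve it?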
Answer 0 (score: 1)
This seems to do what you want (using your xml1):
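# Remove every child of `node` except ID and L1N2.
# Note: internal nodes are references, so this modifies xml1 in place.
trim <- function(node) {
  child.names <- names(node)
  to.remove <- child.names[!(child.names %in% c("ID", "L1N2"))]
  removeChildren(node, kids = to.remove)
}
# Select only the L1 nodes that contain an L2, then trim each one
lapply(xml1["//L1[descendant::L2]"], trim)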
# [[1]]
# <L1>
# <ID>Get this ID</ID>
# <L1N2>
# <L2>
# <L2N1>Get this node and all others in L2</L2N1>
# </L2>
# </L1N2>
# </L1>
#
# [[2]]
# <L1>
# <ID>Get this ID</ID>
# <L1N2>
# <L2>
# <L2N1>Get this node and all others in L2</L2N1>
# </L2>
# </L1N2>
# </L1>
Of course, you can use an anonymous function and put it all on one line:
lapply(xml1["//L1[descendant::L2]"],function(node) removeChildren(node,kids=names(node)[!(names(node)%in%c("ID","L1N2"))]))
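One thing to be aware of: XPath only selects nodes that already exist, and removeChildren works on the parsed tree itself, so the approach above strips the unwanted children from xml1 as well. If the original document needs to stay intact, a variant that trims a copy of each node might look like the sketch below. It assumes xmlClone copies the whole subtree and that child names are unique within each L1; trim_copy and the small stand-in document doc are made up for illustration and are not part of the XML package.

```r
library(XML)

# Small stand-in document with the same shape as xml1 in the question
# (hypothetical data, shortened for the example).
doc <- xmlParse('
<L0>
<L1><ID>A</ID><L1N1>x</L1N1><L1N2><L2><L2N1>keep</L2N1></L2></L1N2><L1N3>y</L1N3></L1>
<L1><ID>B</ID><L1N1>x</L1N1><L1N4>z</L1N4></L1>
</L0>')

# trim_copy() is a hypothetical helper: it copies the node first and
# removes the unwanted children from the copy, not from the document.
trim_copy <- function(node, keep = c("ID", "L1N2")) {
  copy <- xmlClone(node)                # work on a copy of the node
  drop <- setdiff(names(copy), keep)    # child names that should go
  if (length(drop) > 0) removeChildren(copy, kids = drop)
  copy
}

# Only L1 nodes that contain an L2 are selected, then trimmed as copies.
trimmed <- lapply(getNodeSet(doc, "//L1[descendant::L2]"), trim_copy)
```

With this version, a later query against doc (for example for L1N1) should still find the nodes that were dropped from the trimmed copies.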