Using xpathSApply to extract existing nodes but return NA for missing ones?

Time: 2018-01-25 08:23:34

Tags: r xml xpathsapply

I have the following XML:

parsed <- 
<div class="Matches">
<div class="Match">
<div class="MatchType">Singles Match</div>
<div class="MatchResults">
<a href="?id=2&amp;nr=11408&amp;name=Jason+Jordan">Jason Jordan</a> (w/<a href="?id=2&amp;nr=2250&amp;name=Seth+Rollins">Seth Rollins</a>) defeats <a href="?id=2&amp;nr=257&amp;name=Cesaro">Cesaro</a> (w/<a href="?id=2&amp;nr=2641&amp;name=Sheamus">Sheamus</a>) (13:15)</div>
</div>
<div class="Match">
<div class="MatchRecommended">[<span class="TextHighlight"><a href="?id=111&amp;nr=9099">Recommended, Meltzer: ***3/4, CAGEMATCH users: <span class=" Rating Color7">7.17</span></a></span>]</div>
<div class="MatchType">
<a href="?id=5&amp;nr=16">WWE Intercontinental Title</a> Match</div>
<div class="MatchResults">
<a href="?id=2&amp;nr=9967&amp;name=Roman+Reigns">Roman Reigns</a> (c) defeats <a href="?id=2&amp;nr=676&amp;name=Samoa+Joe">Samoa Joe</a> (24:50)            </div>

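I am trying to pull out the text of the div with class "MatchRecommended", and to get NA for every "Match" div that has no "MatchRecommended" child.

I think I have to use xpathSApply and xmlChildren to extract the relevant data, but with the code below I only get NAs: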

xpathSApply(parsed, "//*[(@class = 'Match')]", function(x) ifelse(is.null(xmlChildren(x)$a), NA, xmlAttrs(xmlChildren(x)$a, 'href')))
[1] NA NA NA NA NA NA NA

Ideally, the result would look like this:

[1] NA "Recommended, Meltzer: ***3/4, CAGEMATCH users: 7.17"

Any ideas on how to do this?

1 Answer:

Answer 0: (score: 0)

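I would get the Match nodes first, then query each node in that set with an XPath that starts with a leading "." so that it is evaluated relative to the current node. Without the leading ".", a path beginning with "//" searches the whole document rather than just that node.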

library(XML)
parsed <- xmlParse('<div...rest of your XML plus two missing div tags')
nodes <- getNodeSet(parsed, "//div[(@class = 'Match')]")
x <- lapply(nodes, xpathSApply, ".//div[(@class = 'MatchRecommended')]", xmlValue, trim=TRUE)
x

[[1]]
list()

[[2]]
[1] "[Recommended, Meltzer: ***3/4, CAGEMATCH users: 7.17]"

There are a few ways to replace that empty list with NA.

sapply(x, function(y) ifelse(length(y)==0, NA, y))
[1] NA  "[Recommended, Meltzer: ***3/4, CAGEMATCH users: 7.17]"
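
For a version that can be pasted and run as-is, here is a minimal sketch of the same idea. The markup is a shortened stand-in for the XML in the question (with the closing div tags added), and the names doc_text and rec are only illustrative. It returns NA directly inside the per-node function and then strips the surrounding brackets so the result matches the output asked for.

```r
library(XML)

# Shortened stand-in for the question's markup, with the closing </div> tags added.
doc_text <- '<div class="Matches">
<div class="Match">
<div class="MatchType">Singles Match</div>
<div class="MatchResults">Jason Jordan defeats Cesaro (13:15)</div>
</div>
<div class="Match">
<div class="MatchRecommended">[<span class="TextHighlight"><a href="?id=111&amp;nr=9099">Recommended, Meltzer: ***3/4, CAGEMATCH users: <span class=" Rating Color7">7.17</span></a></span>]</div>
<div class="MatchType">WWE Intercontinental Title Match</div>
<div class="MatchResults">Roman Reigns (c) defeats Samoa Joe (24:50)</div>
</div>
</div>'

parsed <- xmlParse(doc_text, asText = TRUE)
nodes <- getNodeSet(parsed, "//div[@class = 'Match']")

# One value per Match node: the trimmed text of its MatchRecommended child,
# or NA when the node has no such child.
rec <- vapply(nodes, function(node) {
  hit <- getNodeSet(node, ".//div[@class = 'MatchRecommended']")
  if (length(hit) == 0) NA_character_ else xmlValue(hit[[1]], trim = TRUE)
}, character(1))

# Drop the leading "[" and trailing "]"; NA entries stay NA.
rec <- gsub("^\\[|\\]$", "", rec)
rec
```

Using vapply with character(1) keeps the result a plain character vector of the same length as the node set, so each entry lines up with its Match.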

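You can also use the xml2 package, since xml_find_first returns a missing node when nothing matches, and xml_text turns that into NA instead of an empty list.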

library(xml2)
parsed <- read_xml('<div...')
nodes <-  xml_find_all(parsed, "//div[(@class = 'Match')]")  
sapply(nodes, function(x) xml_text( xml_find_first(x, ".//div[(@class = 'MatchRecommended')]"), trim=TRUE))
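
As a self-contained illustration of the xml2 route, the sketch below uses a small made-up document (the names xml_input, matches, and recommended are only for this example). It relies on xml_find_first being applicable to a whole node set, returning one result per input node, so the sapply wrapper is not strictly needed here.

```r
library(xml2)

# Small made-up document: the first Match has no MatchRecommended child.
xml_input <- '<div class="Matches">
<div class="Match"><div class="MatchType">Singles Match</div></div>
<div class="Match"><div class="MatchRecommended">[Recommended]</div><div class="MatchType">Title Match</div></div>
</div>'

doc <- read_xml(xml_input)
matches <- xml_find_all(doc, "//div[@class = 'Match']")

# xml_find_first() gives one result per Match node (a missing node where
# nothing matches); xml_text() converts missing nodes to NA.
recommended <- xml_text(
  xml_find_first(matches, ".//div[@class = 'MatchRecommended']"),
  trim = TRUE
)
recommended
```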