Question

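I want to extract data from an XML document. The document has many generations, and many siblings share the same primary tag name but have different secondary tag names. I can't figure out how to reference those secondary tag names in xml_find_all(). (Or how to reference them in xml_path().)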

Below is an MWE illustrating what I want to do; the error appears in the final section:

library(xml2)
library(tidyverse)

x <- read_xml("<big><foo id = 'a'>Sailboat<baz>Battleship</baz></foo><foo id = 'b'>Aircraft Carrier</foo></big>")
x %>% xml_contents
#> {xml_nodeset (2)}
#> [1] <foo id="a">Sailboat<baz>Battleship</baz></foo>
#> [2] <foo id="b">Aircraft Carrier</foo>

x %>% xml_structure
#> <big>
#>   <foo [id]>
#>     {text}
#>     <baz>
#>       {text}
#>   <foo [id]>
#>     {text}

x %>% xml_find_all("foo") %>% xml_name
#> [1] "foo" "foo"

x %>% xml_find_all("foo")
#> {xml_nodeset (2)}
#> [1] <foo id="a">Sailboat<baz>Battleship</baz></foo>
#> [2] <foo id="b">Aircraft Carrier</foo>

x %>% xml_find_all("foo id='b'")
#> Warning in xpath_search(x$node, x$doc, xpath = xpath, nsMap = ns,
#> num_results = Inf): Invalid expression [1207]
#> {xml_nodeset (0)}

How can I reach the "foo id='b'" node using xml_find_all() or another function in xml2?
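For context, here is a minimal sketch of one possible approach. It assumes the "secondary tag names" are XML attributes, which is how xml_structure() reports `id` above. In XPath, an attribute test goes in a predicate of the form `element[@attribute='value']` rather than after a space, which is likely why "foo id='b'" is rejected as an invalid expression. The variable names `b_node` and `b_anywhere` are illustrative only.

```r
library(xml2)

# Same document as in the question
x <- read_xml("<big><foo id = 'a'>Sailboat<baz>Battleship</baz></foo><foo id = 'b'>Aircraft Carrier</foo></big>")

# Attribute tests go inside an XPath predicate: element[@attribute='value']
b_node <- xml_find_all(x, "foo[@id='b']")

xml_attr(b_node, "id")  # attribute value of the matched node
xml_text(b_node)        # text content of the matched node
xml_path(b_node)        # positional path to the matched node

# ".//" searches at any depth below the current node, which may help
# when the document has many generations
b_anywhere <- xml_find_all(x, ".//foo[@id='b']")
```

If only one match is expected, xml_find_first() accepts the same XPath expression and returns a single node instead of a nodeset.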

Using xml_find_all() on detailed tag names
