Question

I parsed an XML document in R, for example:

library(XML)
f = system.file("exampleData", "mtcars.xml", package="XML")
doc = xmlParse(f)

Using an XPath expression, I can select specific nodes in the document:

> getNodeSet(doc, "//record[@id='Mazda RX4']/text()")
[[1]]
   21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 

attr(,"class")
[1] "XMLNodeSet"

But I can't figure out how to convert the result into an R character vector:

> as.character(getNodeSet(doc, "//record[@id='Mazda RX4']/text()"))
[1] "<pointer: 0x000000000e6a7fe0>"

How can I get the text out of this internal pointer to a C object?

Answer 1

Use xmlValue. Here is an extension of your example that also shows what classes the result contains:

v <- getNodeSet(doc, "//record[@id='Mazda RX4']/text()")
str(v)
#List of 1
#$ :Classes 'XMLInternalTextNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr> 
#- attr(*, "class")= chr "XMLNodeSet"
v2 <- sapply(v, xmlValue)  #this is the code chunk of interest to you
v2
#[1] "   21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4"
str(v2)
#chr "   21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4"
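The same sapply(..., xmlValue) pattern scales to many nodes. The sketch below, assuming the XML package and its bundled mtcars.xml example file, selects every record instead of one and uses each record's id attribute (via xmlGetAttr) as the element name, giving a named character vector:

```r
library(XML)

# Parse the example file that ships with the XML package
f <- system.file("exampleData", "mtcars.xml", package = "XML")
doc <- xmlParse(f)

# Select every <record> element rather than a single one
nodes <- getNodeSet(doc, "//record")

# xmlValue() converts each internal node into an R character string
vals <- sapply(nodes, xmlValue)

# Use each record's "id" attribute (the car name) as the element name
names(vals) <- sapply(nodes, xmlGetAttr, "id")

head(vals, 3)
```

Here xmlValue is applied to the record element itself rather than to its text() child; since each record contains only text, the resulting strings are the same.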

Answer 2

The following also works: instead of getNodeSet() followed by sapply(v, xmlValue), you can call xpathApply and pass xmlValue as an argument.

doc = xmlParse(f)
xpathApply(doc,"//record[@id='Mazda RX4']/text()")

[[1]]
   21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 

attr(,"class")
[1] "XMLNodeSet"

v <- xpathApply(doc, "//record[@id='Mazda RX4']/text()", xmlValue)
v

[[1]]
[1] "   21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4"
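If you want a plain character vector rather than a list, the XML package also provides xpathSApply, the sapply-style counterpart of xpathApply, which simplifies the result. A minimal sketch, again assuming the bundled mtcars.xml file:

```r
library(XML)

f <- system.file("exampleData", "mtcars.xml", package = "XML")
doc <- xmlParse(f)

# xpathSApply() applies xmlValue to each matched node and simplifies the
# result, so no unlist() is needed afterwards
x <- xpathSApply(doc, "//record[@id='Mazda RX4']/text()", xmlValue)
x
class(x)
```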

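This is a character object inside a list. You can convert it into a numeric vector by unlisting it, splitting the string on a regular expression matching one or more spaces, unlisting again, and applying as.numeric():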

 as.numeric(unlist(strsplit(unlist(v)," +")))
 [1]     NA  21.00   6.00 160.00 110.00   3.90   2.62  16.46   0.00   1.00   4.00   4.00
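The leading NA comes from the leading blanks in the string: splitting "   21.0 ..." on " +" yields an empty string as the first field, and as.numeric("") is NA. A small base-R sketch that avoids it, using the string copied from the output above as the input, trims the whitespace first; scan() is shown as an alternative, since it splits on whitespace by default:

```r
# Text as returned by xmlValue() for the "Mazda RX4" record
x <- "   21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4"

# trimws() removes the leading blanks, so strsplit() no longer produces
# an empty first field (the source of the NA)
nums <- as.numeric(strsplit(trimws(x), " +")[[1]])
nums

# Alternative: scan() reads whitespace-separated numbers directly
nums2 <- scan(text = x, quiet = TRUE)
nums2
```

Both give the 11 values of the record with no NA.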

Parse XML file and return R character vector
