Extracting the second attribute of an XML node in R (XML package)

Time: 2014-05-09 15:05:55

Tags: xml r xml-parsing

I want to extract the 'lat' and 'lon' attributes from an .xml file that looks like this:

<asdf>
<dataset>
    <px lon="-55.75" lat="-18.5">2.186213</px>
    <px lon="-50.0"  lat="-18.5">0.0</px>
    <px lon="-66.75" lat="-03.0">1.68412</px>
    </dataset>
</asdf>

This is what I have so far, using the R XML package:

# Load the XML package for reading and parsing XML
library(XML)

#Parse xml file 
a3  <- xmlRoot(xmlTreeParse("my_file.xml"))

#Extract text-value and attributes as lists
precip <- xmlSApply(a3, function(x) xmlSApply(x, xmlValue))
long   <- xmlSApply(a3, function(x) xmlSApply(x, xmlAttrs))
lat    <- xmlSApply(a3, function(x) xmlSApply(x, xmlAttrs)) #???

dt.lat.long.val <- data.frame(as.numeric(as.vector(lat)), 
                          as.numeric(as.vector(long)), 
                          as.numeric(as.vector(precip)))

How should I edit the line ending in #??? so that it returns the lat values?

1 Answer:

Answer 0: (score: 4)

You can extract the data with something along these lines:

test <- '<asdf>
<dataset>
    <px lon="-55.75" lat="-18.5">2.186213</px>
    <px lon="-50.0"  lat="-18.5">0.0</px>
    <px lon="-66.75" lat="-03.0">1.68412</px>
    </dataset>
</asdf>'

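library(XML)
# Parse the XML string into an internal document, which XPath queries require
a3 <- xmlParse(test)

# Build a one-row data frame for each <px> node
out <- xpathApply(a3, "//px", function(x){
  coords <- xmlAttrs(x)  # named character vector: lon first, lat second
  data.frame(precip = xmlValue(x), lon = coords[1], lat = coords[2], stringsAsFactors = FALSE)
})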

> do.call(rbind.data.frame, out)
       precip    lon   lat
lon  2.186213 -55.75 -18.5
lon1      0.0  -50.0 -18.5
lon2  1.68412 -66.75 -03.0
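
The answer above picks attributes by position (coords[1], coords[2]), leaves every column as character, and carries the attribute name over into the row names. As a possible variation, the sketch below looks up each attribute by name with xmlGetAttr, so it does not depend on attribute order, and converts the results to numeric. The variable names doc and px are illustrative only, and the sample string is copied from the question.

```r
library(XML)

# Sample document copied from the question
test <- '<asdf>
<dataset>
    <px lon="-55.75" lat="-18.5">2.186213</px>
    <px lon="-50.0"  lat="-18.5">0.0</px>
    <px lon="-66.75" lat="-03.0">1.68412</px>
    </dataset>
</asdf>'

# Parse the string into an internal document so XPath can be used
doc <- xmlParse(test, asText = TRUE)

# Extra arguments to xpathSApply are passed on to the function,
# so each call below runs xmlGetAttr(node, "<name>") on every <px> node
lon    <- xpathSApply(doc, "//px", xmlGetAttr, "lon")
lat    <- xpathSApply(doc, "//px", xmlGetAttr, "lat")
precip <- xpathSApply(doc, "//px", xmlValue)

# Attribute and text values arrive as character; convert before analysis
px <- data.frame(lat    = as.numeric(lat),
                 lon    = as.numeric(lon),
                 precip = as.numeric(precip))
```

To read from a file instead, xmlParse("my_file.xml") should work in place of the string. Note that the question's xmlTreeParse call builds R-level nodes by default, which XPath functions cannot query, so xmlParse (or useInternalNodes = TRUE) is needed for this approach.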