Counting XML nodes in R

Date: 2015-01-20 16:14:00

Tags: xml r xpath

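I am trying to parse this XML table, but I am having trouble counting the number of "var" nodes. My code so far is below. I would like to replace 16597 with a generalizable value so that I can reuse this code on other, similar tables. I need to do this in R, not in XPath.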

require(RCurl)
require(XML)
url = "http://api.census.gov/data/2000/sf3/variables.xml"
doc = xmlParse(url)
root = xmlRoot(doc)
xml.data = xmlToList(doc)

id = NULL
label = NULL
concept = NULL
for(i in 1:16597){
  id[i] = xml.data[[1]][[(i+2)]][["id"]]
  label[i] = xml.data[[1]][[(i+2)]][["label"]]
  concept[i] = xml.data[[1]][[(i+2)]][["concept"]]
}

scraped.data = data.frame(id, label, concept)

I tried the following, based on this question, but got 0.

doc <- xmlTreeParse(url)
xpathApply(xmlRoot(doc),path="count(//vars)",xmlValue)

What am I misunderstanding?

1 answer:

Answer 0 (score: 1)

You can avoid the loop and just "rbind" your list.

library(plyr)  # ldply() comes from the plyr package
y <- ldply(xml.data[[1]], "rbind")
dim(y)
[1] 16599     6
head(y)
  .id        id                                                                                                                                  label
1 var       for                                                                                                           Census API FIPS 'for' clause
2 var        in                                                                                                            Census API FIPS 'in' clause
3 var PCT022034               Total:  Not living in an MSA/PMSA in 2000:  Different house in 1995:  In United States in 1995:  In an MSA/PMSA in 1995:
4 var PCT022035 Total:  Not living in an MSA/PMSA in 2000:  Different house in 1995:  In United States in 1995:  In an MSA/PMSA in 1995:  Central city
5 var PCT022032                                                                   Total:  Not living in an MSA/PMSA in 2000:  Different house in 1995:
6 var PCT022033                                        Total:  Not living in an MSA/PMSA in 2000:  Different house in 1995:  In United States in 1995:
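
If you want the count itself rather than the dimensions of the bound data frame, one option is to select the "var" nodes once and take the length of the result. The sketch below is only an illustration under stated assumptions: it parses a small inline document (a made-up stand-in for the Census file, with a hypothetical namespace URI and plain id/label/concept attributes) so that it runs without network access. It matches on local-name() so that a default namespace, if the real file declares one, does not make the path come back empty. Note also that the path in the question names "vars" rather than "var".

```r
library(XML)

# Hypothetical miniature of the variables file; the namespace URI and the
# attribute values are made up for illustration only.
xml_text <- '<census-api xmlns="http://example.org/hypothetical-ns">
  <vars>
    <var id="for" label="Census API FIPS for clause" concept="Geography"/>
    <var id="in" label="Census API FIPS in clause" concept="Geography"/>
    <var id="P001001" label="Total" concept="P1. Total Population"/>
  </vars>
</census-api>'

doc <- xmlParse(xml_text, asText = TRUE)

# Select every element whose local name is "var", regardless of namespace.
var_nodes <- getNodeSet(doc, "//*[local-name() = 'var']")

# The node count replaces the hard-coded loop bound.
n_vars <- length(var_nodes)

# Helper (hypothetical name): read one attribute from every selected node,
# returning NA where the attribute is absent.
get_attr <- function(attr_name) {
  vapply(
    var_nodes,
    function(node) xmlGetAttr(node, attr_name, default = NA_character_),
    character(1)
  )
}

scraped.data <- data.frame(
  id      = get_attr("id"),
  label   = get_attr("label"),
  concept = get_attr("concept"),
  stringsAsFactors = FALSE
)

print(n_vars)
print(scraped.data)
```

For the real file you would pass the URL to xmlParse() as in the question. The loop in the question starts at i + 2, so if you still want to skip the first two entries ("for" and "in"), drop them from var_nodes before building the data frame. If you prefer to stay with the list from xmlToList(), sum(names(xml.data[[1]]) == "var") should give the same count, assuming every variable is stored under the name "var".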