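I am trying to parse this XML table, but I am having trouble counting the number of "var" nodes. My code so far is below. I would like to replace 16597 with a generalizable value so that I can reuse this code for other, similar tables. I need to do this in R rather than in XPath.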
require(RCurl)
require(XML)
url = "http://api.census.gov/data/2000/sf3/variables.xml"
doc = xmlParse(url)
root = xmlRoot(doc)
xml.data = xmlToList(doc)
id = NULL
label = NULL
concept = NULL
for(i in 1:16597){
id[i] = xml.data[[1]][[(i+2)]][["id"]]
label[i] = xml.data[[1]][[(i+2)]][["label"]]
concept[i] = xml.data[[1]][[(i+2)]][["concept"]]
}
scraped.data = data.frame(id, label, concept)
Based on this question, I tried the following, but got 0.
doc <- xmlTreeParse(url)
xpathApply(xmlRoot(doc),path="count(//vars)",xmlValue)
What am I misunderstanding?
Answer 0 (score: 1)
You can avoid the loop and just "rbind" your list.
library(plyr)  # provides ldply()
y <- ldply(xml.data[[1]], "rbind")
dim(y)
[1] 16599 6
head(y)
  .id id        label
1 var for       Census API FIPS 'for' clause
2 var in        Census API FIPS 'in' clause
3 var PCT022034 Total: Not living in an MSA/PMSA in 2000: Different house in 1995: In United States in 1995: In an MSA/PMSA in 1995:
4 var PCT022035 Total: Not living in an MSA/PMSA in 2000: Different house in 1995: In United States in 1995: In an MSA/PMSA in 1995: Central city
5 var PCT022032 Total: Not living in an MSA/PMSA in 2000: Different house in 1995:
6 var PCT022033 Total: Not living in an MSA/PMSA in 2000: Different house in 1995: In United States in 1995:
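If you would rather keep the list-based approach from the question, the node count can be taken from the list itself instead of being hard-coded. The following is only a minimal sketch: the inline XML string is a made-up stand-in for the real file (its element layout and the ids `A001` to `A003` are assumptions), and `field()` is a hypothetical helper, not part of the XML package.

```r
library(XML)

# Hypothetical miniature of the variables file: a <vars> element holding
# <var> nodes that carry id/label/concept attributes.
txt <- '<root><vars>
  <var id="A001" label="Total" concept="Population"/>
  <var id="A002" label="Urban" concept="Population"/>
  <var id="A003" label="Rural" concept="Population"/>
</vars></root>'

doc <- xmlParse(txt, asText = TRUE)
xml.data <- xmlToList(doc)

# Keep only the entries named "var"; this drops things such as an
# ".attrs" entry that xmlToList() adds when the parent has attributes.
vars <- xml.data[["vars"]]
vars <- unname(vars[names(vars) == "var"])

# The generalizable replacement for the hard-coded 16597.
n <- length(vars)

# Hypothetical helper: pull one attribute out of every <var> entry.
field <- function(name) vapply(vars, function(v) v[[name]], character(1))

scraped.data <- data.frame(
  id      = field("id"),
  label   = field("label"),
  concept = field("concept"),
  stringsAsFactors = FALSE
)
```

The loop in the question starts at the third entry (`i + 2`), which skips the `for` and `in` rows shown in the output above. To reproduce that, drop them with `vars <- vars[-(1:2)]` before computing `n`.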