How do I get the absolute path of nodes found with a relative XPath in R?

Asked: 2017-02-03 19:40:56

Tags: r xml xpath

I parse a complex website as HTML:

library("XML")
doc<-htmlParse("Webpage.html")
xpath<-"//par" #relative path

For example, I can find all nodes that match the relative path:

data<-xpathSApply(doc,xpath)

But how do I find the absolute paths of these nodes?

1 Answer:

Answer 0: (score: 0)

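You can use xmlAncestors with the option fun=xmlName to get the full path. xmlAncestors returns the chain from the root element down to the node itself, so joining the names with "/" gives the path.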

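library("XML")
doc <- htmlParse("http://stackoverflow.com/questions/42031842")
summary(doc)
# Text content of every h3 node
xpathSApply(doc, "//h3", xmlValue)

# For each h3, join the element names from the root down to the node with "/"
xpathSApply(doc, "//h3", function(y) paste(unlist(xmlAncestors(y, fun=xmlName)), collapse="/"))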
[1] "html/body/div/div/div/div/div/h3"                     
[2] "html/body/div/div/div/div/div/h3"                     
[3] "html/body/div/div/div/div/div/h3"                     
[4] "html/body/div/div/div/div/div/form/div/div/div/div/h3"
[5] "html/body/div/div/div/div/div/form/div/div/div/div/h3"
[6] "html/body/div/div/div/div/div/form/div/noscript/h3"   

xpathSApply(doc, "/html/body/div/div/div/div/div/form/div/noscript/h3", xmlValue)
[1] "Post as a guest"
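
Note that the paths printed above are not unique: the first three h3 nodes all share "html/body/div/div/div/div/div/h3", so querying that path would return all of them. If you need one path per node, a possible extension is to add a positional index to every step. The following is only a sketch under some assumptions: step_with_index and abs_xpath are hypothetical helper names (not part of the XML package), the inline HTML string stands in for the real page, and namespaces are ignored.

```r
library(XML)

# Small inline document so the sketch runs without network access
# (an assumption for illustration; replace it with your own page).
html <- "<html><body><div><h3>A</h3><h3>B</h3></div><div><h3>C</h3></div></body></html>"
doc <- htmlParse(html, asText = TRUE)

# Hypothetical helper: build one path step such as "h3[2]" for a node.
step_with_index <- function(node) {
  name <- xmlName(node)
  # Count the earlier siblings that have the same element name.
  earlier <- getNodeSet(node, paste0("preceding-sibling::", name))
  paste0(name, "[", length(earlier) + 1, "]")
}

# Hypothetical helper: absolute, positional XPath for a single node.
abs_xpath <- function(node) {
  # xmlAncestors walks from the root element down to the node itself.
  steps <- xmlAncestors(node, fun = step_with_index)
  paste0("/", paste(unlist(steps), collapse = "/"))
}

paths <- xpathSApply(doc, "//h3", abs_xpath)
print(paths)
```

Each resulting string can be passed back to xpathSApply(doc, path, xmlValue) and should then select only the node it was built from.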