Question

I have parsed a complex web page as HTML:

library("XML")
doc<-htmlParse("Webpage.html")
xpath<-"//par" #relative path

For example, I can find all nodes that match the relative path:

data<-xpathSApply(doc,xpath)

But how do I find the absolute paths of these nodes?

Answer 1

You can call xmlAncestors with the option fun=xmlName to get the full path of each node.

doc <- htmlParse("http://stackoverflow.com/questions/42031842")
summary(doc)
xpathSApply(doc, "//h3", xmlValue)

xpathSApply(doc, "//h3", function(y) paste(unlist( xmlAncestors(y, fun=xmlName)), collapse="/")) 
[1] "html/body/div/div/div/div/div/h3"                     
[2] "html/body/div/div/div/div/div/h3"                     
[3] "html/body/div/div/div/div/div/h3"                     
[4] "html/body/div/div/div/div/div/form/div/div/div/div/h3"
[5] "html/body/div/div/div/div/div/form/div/div/div/div/h3"
[6] "html/body/div/div/div/div/div/form/div/noscript/h3"   

xpathSApply(doc, "/html/body/div/div/div/div/div/form/div/noscript/h3", xmlValue)
[1] "Post as a guest"
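
The same approach can be tried without a live URL. The following is a minimal sketch, assuming a small inline page (the HTML string and the helper name absolute_path are made up for illustration). Note that paths built this way carry no positional indices such as div[2], so one path may match more than one node.

```r
library(XML)

# Hypothetical inline page, used so the example does not depend on a live URL.
html <- paste0(
  "<html><body><div>",
  "<h3>First</h3>",
  "<form><div><h3>Second</h3></div></form>",
  "</div></body></html>"
)
doc <- htmlParse(html, asText = TRUE)

# Illustrative helper: join the tag names from the root down to the node itself.
absolute_path <- function(node) {
  tags <- unlist(xmlAncestors(node, fun = xmlName))
  paste0("/", paste(tags, collapse = "/"))
}

# One absolute path per node matched by the relative XPath.
paths <- xpathSApply(doc, "//h3", absolute_path)
print(paths)

# Each path can be fed back into xpathSApply to reach the node again.
values <- xpathSApply(doc, paths[2], xmlValue)
print(values)
```

The leading "/" is added so that the result can be used directly as an absolute XPath expression, as in the last call above.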
