I parse a complex web page as HTML:
library("XML")
doc<-htmlParse("Webpage.html")
xpath<-"//par" #relative path
For example, I can find all nodes that match the relative path:
data<-xpathSApply(doc,xpath)
But how can I find the absolute paths of those nodes?
Answer 0 (score: 0)
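You can use xmlAncestors with the option fun=xmlName to get the full path. For each matched node, it walks up the chain of parents and returns the element names from the root down to the node itself.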
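library("XML")
doc <- htmlParse("http://stackoverflow.com/questions/42031842")
summary(doc)
# Text of every h3 node
xpathSApply(doc, "//h3", xmlValue)
# Element names from the root down to each h3 node, joined with "/"
xpathSApply(doc, "//h3", function(y) paste(unlist(xmlAncestors(y, fun = xmlName)), collapse = "/"))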
[1] "html/body/div/div/div/div/div/h3"
[2] "html/body/div/div/div/div/div/h3"
[3] "html/body/div/div/div/div/div/h3"
[4] "html/body/div/div/div/div/div/form/div/div/div/div/h3"
[5] "html/body/div/div/div/div/div/form/div/div/div/div/h3"
[6] "html/body/div/div/div/div/div/form/div/noscript/h3"
xpathSApply(doc, "/html/body/div/div/div/div/div/form/div/noscript/h3", xmlValue)
[1] "Post as a guest"
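Note that the paths printed above carry no positional predicates, so several nodes can share the same path (items [1] to [3] are identical). If a path that selects exactly one node is needed, one possible approach is to append each element's position among its same-named siblings. The following is a minimal sketch under that assumption, not part of the original answer: the helper names path_step and abs_path and the inline HTML string are hypothetical, chosen only so the example runs without network access.

```r
library(XML)

# Small inline document (illustrative only) so the example is self-contained
html <- "<html><body><div><p>a</p><p>b</p></div><div><p>c</p></div></body></html>"
doc <- htmlParse(html, asText = TRUE)

# Hypothetical helper: build one path step such as "p[2]" by counting
# the preceding siblings that have the same element name
path_step <- function(node) {
  name <- xmlName(node)
  n_before <- length(getNodeSet(node, paste0("preceding-sibling::", name)))
  sprintf("%s[%d]", name, n_before + 1L)
}

# Hypothetical helper: apply path_step to the node and each of its ancestors
# (xmlAncestors returns results root first), then join the steps with "/"
abs_path <- function(node) {
  steps <- xmlAncestors(node, fun = path_step)
  paste0("/", paste(unlist(steps), collapse = "/"))
}

paths <- xpathSApply(doc, "//p", abs_path)
print(paths)
```

Each resulting path can be passed back to xpathSApply(doc, path, xmlValue), as in the last call above, and should then match a single node rather than every node with the same chain of element names.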