I need to parse the following HTML document:
<span class="revision-gradient shadowed">90</span>
<span class="revision-gradient not_shadowed">75</span>
<span class="revision-gradient shadowed">85</span>
<span class="revision-gradient blurred">60</span>
and return this list:
[90, 75, 85, 60]
I would normally use the code below, but I'm not sure how to match on part of the class attribute:
document <- htmlParse(url)
myList <- unlist(lapply(document['//span[@class="revision-gradient"]'],xmlValue))
Answer 0 (score: 1)
You can use XML::xpathSApply:
myList <- xpathSApply(document, "//span", xmlValue)
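If the document contains other span elements, the following is more robust:
myList <- unlist(xpathSApply(document, "//span", function(x) {
  # default = "" avoids an error on spans that have no class attribute
  if (grepl("revision-gradient", xmlGetAttr(x, "class", default = ""), fixed = TRUE)) {
    return(xmlValue(x))
  }
  NULL
}))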
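As an alternative to filtering in R, the class test can be pushed into the XPath expression itself. This is only a minimal sketch, assuming the XML package is installed; the extra `other` span is added for illustration and is not part of the original document.

```r
library(XML)

# Markup from the question, plus one unrelated span (illustrative only).
doc <- htmlParse('
<span class="revision-gradient shadowed">90</span>
<span class="revision-gradient not_shadowed">75</span>
<span class="revision-gradient shadowed">85</span>
<span class="revision-gradient blurred">60</span>
<span class="other">1</span>
', asText = TRUE)

# contains() is a substring test on the class attribute;
# spans without a class attribute are simply not selected.
vals <- xpathSApply(doc, "//span[contains(@class, 'revision-gradient')]", xmlValue)
vals
```

Because the predicate runs inside the XPath engine, no per-node R callback is needed.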
HTH
Answer 1 (score: 1)
library(rvest)
pg <- read_html('
<span class="revision-gradient shadowed">90</span>
<span class="revision-gradient not_shadowed">75</span>
<span class="revision-gradient shadowed">85</span>
<span class="revision-gradient blurred">60</span>
')
html_nodes(pg, "span.revision-gradient") %>%
html_text()
## [1] "90" "75" "85" "60"
Or, with an explicit XPath expression:
html_nodes(pg, xpath=".//descendant-or-self::span[@class and
contains(concat(' ', normalize-space(@class), ' '),
' revision-gradient ')]") %>%
html_text()
If you're stuck in XML-land:
library(XML)
doc <- htmlParse('
<span class="revision-gradient shadowed">90</span>
<span class="revision-gradient not_shadowed">75</span>
<span class="revision-gradient shadowed">85</span>
<span class="revision-gradient blurred">60</span>
')
xpathSApply(doc, "//descendant-or-self::span[@class and
contains(concat(' ', normalize-space(@class), ' '),
' revision-gradient ')]", xmlValue)
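The `concat(' ', normalize-space(@class), ' ')` idiom pads the class list with spaces so that only a whole class token matches. The sketch below contrasts it with a plain substring test; the `revision-gradient-old` class is hypothetical and used only to show the difference.

```r
library(XML)

# Hypothetical markup: the second span's class merely starts with the same text.
doc <- htmlParse('
<span class="revision-gradient shadowed">90</span>
<span class="revision-gradient-old">10</span>
', asText = TRUE)

# Substring test: also picks up "revision-gradient-old".
loose <- xpathSApply(doc,
  "//span[contains(@class, 'revision-gradient')]",
  xmlValue)

# Whole-token test: only spans whose class list includes "revision-gradient".
strict <- xpathSApply(doc,
  "//span[contains(concat(' ', normalize-space(@class), ' '), ' revision-gradient ')]",
  xmlValue)

loose
strict
```

If the page is known never to use look-alike class names, the shorter substring form is probably enough.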
If you want numeric values, just call as.numeric() on the resulting vector.
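A minimal sketch of that conversion, using a hard-coded character vector in place of the scraped result (the variable names are illustrative):

```r
# Character vector as returned by html_text() or xmlValue().
txt <- c("90", "75", "85", "60")

# Convert to numeric; any non-numeric string would become NA with a warning.
nums <- as.numeric(txt)
nums
## [1] 90 75 85 60
```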