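I want to use the rvest package to extract gas prices from a web page. However, I can't extract the values, which have to be selected through the HTML class .sp_p.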
library(rvest)
desmoines <- html("http://www.desmoinesgasprices.com/")
Pulling the gas prices:
price <- desmoines %>%
  html_nodes(".sp_p")
head(price, 3)
Output:
[[1]]
<div class="sp_p">
<div class="p2"></div>
<div class="pd"></div>
<div class="p5"></div>
<div class="p5"></div>
</div>
[[2]]
<div class="sp_p">
<div class="p2"></div>
<div class="pd"></div>
<div class="p5"></div>
<div class="p6"></div>
</div>
[[3]]
<div class="sp_p">
<div class="p2"></div>
<div class="pd"></div>
<div class="p5"></div>
<div class="p7"></div>
</div>
attr(,"class")
[1] "XMLNodeSet"
Now I want to use the stringr package to extract the numbers from the scraped nodes, but I can't use stringr because price is not an atomic vector. How can I solve this?
Answer 0 (score: 3)
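Here is one possibility. The price digits are not stored as text; they are encoded in the class names of the child divs (p2, pd, p5, ...), where "d" stands for the decimal point, so the class attributes can be pasted together and cleaned up: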
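library(stringr)
# Collect the child nodes of the scraped .sp_p elements
pr <- xml_children(price)
# Join the attribute values of each element of pr into one string
p_raw <- sapply(seq_along(pr), function(x) paste(xml_attrs(pr[[x]]), collapse = ""))
# Turn "d" into the decimal point, drop the "p" prefixes, and prepend "$"
p_readable <- paste0("$", str_replace_all(p_raw, c("d" = ".", "p" = "")))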
#> p_readable
# [1] "$2.49" "$2.57" "$2.59" "$2.59" "$2.59" "$2.59" "$2.59" "$2.59" "$2.61" "$2.64" "$2.67" "$2.68" "$2.68"
#[14] "$2.68" "$3.08" "$2.99" "$2.98" "$2.98" "$2.98" "$2.98" "$2.98" "$2.98" "$2.98" "$2.98" "$2.98" "$2.98"
#[27] "$2.98" "$2.98" "$2.98"
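
For readers on a more recent rvest (the xml2-based releases, where html() was deprecated in favour of read_html()), the same decoding idea can be sketched as below. This is only an illustration under assumptions: the inline HTML is a hand-written sample that mirrors the structure printed in the question rather than the live page, and decode_price is a hypothetical helper name, not part of rvest.

```r
library(rvest)

# Hand-written sample mirroring the markup shown in the question (an assumption;
# the live page may differ).
sample_html <- '
<div class="sp_p"><div class="p2"></div><div class="pd"></div><div class="p5"></div><div class="p5"></div></div>
<div class="sp_p"><div class="p2"></div><div class="pd"></div><div class="p5"></div><div class="p6"></div></div>
<div class="sp_p"><div class="p2"></div><div class="pd"></div><div class="p5"></div><div class="p7"></div></div>'

page <- read_html(sample_html)
price_nodes <- html_nodes(page, ".sp_p")

# Hypothetical helper: rebuild one price from the class names of a node's children.
decode_price <- function(node) {
  classes <- html_attr(html_children(node), "class")  # e.g. "p2" "pd" "p5" "p5"
  chars <- sub("^p", "", classes)                     # strip the leading "p"
  chars[chars == "d"] <- "."                          # "d" marks the decimal point
  paste0("$", paste(chars, collapse = ""))
}

# One decoded string per .sp_p node; the result is a plain character vector
prices <- vapply(seq_along(price_nodes),
                 function(i) decode_price(price_nodes[[i]]),
                 character(1))
print(prices)
# [1] "$2.55" "$2.56" "$2.57"

# Numeric values, if arithmetic is needed
price_values <- as.numeric(sub("$", "", prices, fixed = TRUE))
```

Because prices is an ordinary character vector, it can be passed to stringr functions directly, which is what the question was missing.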