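Building on this question, I want to extract a single node (the "likes" count) from the smallText nodes while ignoring the others. The node I am looking for is a.smallText, so I need a way to select only that one.
Code: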
library(rvest)
library(stringr)  # str_trim
library(tibble)   # enframe

url <- "https://www.goodreads.com/quotes/search?page=1&q=simone+de+beauvoir&utf8=%E2%9C%93"
quote_rating <- function(html){
  path <- read_html(html)
  path %>%
    html_nodes(xpath = paste(selectr::css_to_xpath(".smallText"), "/text()")) %>%
    html_text(trim = TRUE) %>%
    str_trim(side = "both") %>%
    enframe(name = NULL)
}
quote_rating(url)
This gives the following result:
# A tibble: 80 x 1
value
<chr>
1 Showing 1-20 of 790
2 (0.03 seconds)
3 tags:
4 ""
5 2492 likes
6 2265 likes
7 tags:
8 ,
9 ,
10 ,
# ... with 70 more rows
Adding an html_nodes("a.smallText") step filters out too much:
quote_rating <- function(html){
  path <- read_html(html)
  path %>%
    html_nodes(xpath = paste(selectr::css_to_xpath(".smallText"), "/text()")) %>%
    html_nodes("a.smallText") %>%
    html_text(trim = TRUE) %>%
    str_trim(side = "both") %>%
    enframe(name = NULL)
}
# A tibble: 0 x 1
# ... with 1 variable: value <chr>
Answer 0 (score: 1)
This works for me...
library(rvest)
url <- "https://www.goodreads.com/quotes/search?page=1&q=simone+de+beauvoir&utf8=%E2%9C%93"
page <- read_html(url)
page %>% html_nodes("div.quote.mediumText") %>% #select quote boxes
html_node("a.smallText") %>% #then the smallText in each one
html_text()
[1] "2492 likes" "2265 likes" "2168 likes"
[4] "2003 likes" "1774 likes" "1060 likes"
[7] "580 likes" "523 likes" "482 likes"
[10] "403 likes" "383 likes" "372 likes"
[13] "360 likes" "347 likes" "330 likes"
[16] "329 likes" "318 likes" "317 likes"
[19] "310 likes" "281 likes"
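Note the difference between html_node and html_nodes: html_node returns exactly one result for each quote box (NA if there is no match), whereas html_nodes returns every match. The advantage of selecting the quote boxes first is that you can extract other information as needed and then easily match it to the number of likes.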
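To make the html_node / html_nodes difference concrete, here is a minimal sketch on a hand-written HTML fragment. The markup is an assumption that only imitates the class names used above (div.quoteText in particular is hypothetical); it is not the real Goodreads page. The second quote box deliberately has no likes link.

```r
library(rvest)

# Hypothetical markup that imitates the class names used above
html <- '<html><body>
  <div class="quote mediumText">
    <div class="quoteText">First quote</div>
    <a class="smallText" href="#">10 likes</a>
  </div>
  <div class="quote mediumText">
    <div class="quoteText">Second quote</div>
  </div>
  <div class="quote mediumText">
    <div class="quoteText">Third quote</div>
    <a class="smallText" href="#">3 likes</a>
  </div>
</body></html>'

boxes <- read_html(html) %>% html_nodes("div.quote.mediumText")

# html_node: one result per box, NA where the box has no match
likes_per_box <- boxes %>% html_node("a.smallText") %>% html_text()

# html_nodes: all matches in one flat set, so the empty box leaves no trace
likes_flat <- boxes %>% html_nodes("a.smallText") %>% html_text()

# Quote text, again one entry per box
quotes <- boxes %>% html_node("div.quoteText") %>% html_text(trim = TRUE)

# Both columns have one entry per box, so they line up row by row
result <- data.frame(quote = quotes, likes = likes_per_box,
                     stringsAsFactors = FALSE)
print(result)
```

Because likes_per_box keeps a placeholder for the box without a link, it can be combined with other per-box fields; likes_flat is shorter and could not be aligned safely.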
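Answer 1 (score: 1)
To extract the number of likes for each quote, you can do the filtering with a CSS selector alone: you want the a tags with class=smallText.
This simple snippet works: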
library(rvest)
url <- "https://www.goodreads.com/quotes/search?page=1&q=simone+de+beauvoir&utf8=%E2%9C%93"
path <- read_html(url)
path %>%
html_nodes("a.smallText") %>%
html_text(trim = TRUE)
# [1] "2492 likes" "2265 likes" "2168 likes" "2003 likes" "1774 likes" "1060 likes" "580 likes"
# [8] "523 likes" "482 likes" "403 likes" "383 likes" "372 likes" "360 likes" "347 likes"
# [15] "330 likes" "329 likes" "318 likes" "317 likes" "310 likes" "281 likes"
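As a rough sketch of why the attempt in the question came back empty, and of how the strings could be turned into numbers afterwards: the /text() XPath selects text nodes, and a text node has no child elements, so a later html_nodes("a.smallText") has nothing to search. The HTML below is a hand-written fragment that only imitates the class names above, and the names text_nodes, inside_text and like_counts are illustrative.

```r
library(rvest)

# Hypothetical markup imitating the class names used above
html <- '<html><body>
  <span class="smallText">Showing 1-2 of 2</span>
  <div class="quote mediumText">
    <a class="smallText" href="#">2492 likes</a>
  </div>
  <div class="quote mediumText">
    <a class="smallText" href="#">2265 likes</a>
  </div>
</body></html>'

page <- read_html(html)

# The selection from the question: text nodes under any .smallText element
text_nodes <- page %>%
  html_nodes(xpath = paste(selectr::css_to_xpath(".smallText"), "/text()"))

# Text nodes contain no elements, so this search finds nothing
inside_text <- text_nodes %>% html_nodes("a.smallText")

# Selecting the links directly keeps only the likes
likes <- page %>% html_nodes("a.smallText") %>% html_text(trim = TRUE)

# Strip the trailing " like" / " likes" and convert to integer
like_counts <- as.integer(sub(" likes?$", "", likes))
print(like_counts)
```

The conversion assumes the counts contain digits only; a count formatted with a thousands separator would need the separator removed first.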