RSelenium: clicking a button to turn Amazon review pages

Posted: 2018-09-09 05:39:56

Tags: r web-scraping rselenium

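I can't get RSelenium to turn the page in the Amazon review section I'm trying to scrape. My code is below. I have tried nearly every combination of CSS and XPath selectors. Any ideas?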

library(magrittr)  # provides %>%

reviews <- replicate(100,
  {
    # load the first page of reviews
    remDr$navigate("https://www.amazon.com/Eagles-Nest-Outfitters-DoubleNest-Portable/product-reviews/B00K30GXK8/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews")
    # scroll to the bottom of the page, where the pagination links are
    webElem <- remDr$findElement("css", "body")
    webElem$sendKeysToElement(list(key = "end"))
    # click the "Next page" link
    morereviews <- remDr$findElement(using = "css selector", ".a-last a")
    morereviews$clickElement()
    Sys.sleep(4)

    # parse the rendered page and extract the review texts
    xml2::read_html(remDr$getPageSource()[[1]]) %>%
      rvest::html_nodes(".review-text") %>%
      rvest::html_text() %>%
      tibble::tibble(reviews = .)
  },
  simplify = FALSE)
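Note that in the loop above `remDr$navigate()` reloads the first review page on every iteration, so each click can only ever reach page 2. A minimal sketch of the click-based approach that navigates once and then keeps clicking "Next" is shown here; `scrape_by_clicking` is a hypothetical helper, and it assumes `remDr` is an already-open RSelenium `remoteDriver` and that the question's selectors (`.a-last a` for the next link, `.review-text` for review bodies) still match Amazon's markup.

```r
# Hypothetical helper: load the first review page once, then keep clicking
# the "Next page" link. Assumes `remDr` is an open RSelenium remoteDriver and
# that ".a-last a" and ".review-text" (from the question) still match.
scrape_by_clicking <- function(remDr, start_url, n_pages = 100, delay = 4) {
  remDr$navigate(start_url)                 # navigate only once, before the loop
  pages <- vector("list", n_pages)
  for (i in seq_len(n_pages)) {
    Sys.sleep(delay)                        # give the page time to render
    html <- xml2::read_html(remDr$getPageSource()[[1]])
    txt <- rvest::html_text(rvest::html_nodes(html, ".review-text"))
    pages[[i]] <- data.frame(page = rep(i, length(txt)), reviews = txt,
                             stringsAsFactors = FALSE)
    # findElements() returns an empty list (not an error) when nothing matches
    next_link <- remDr$findElements(using = "css selector", ".a-last a")
    if (length(next_link) == 0) break       # no "Next" link: last page reached
    next_link[[1]]$clickElement()
  }
  do.call(rbind, pages)                     # unused NULL slots are dropped
}
```

Usage would be along the lines of `reviews <- scrape_by_clicking(remDr, "https://www.amazon.com/Eagles-Nest-Outfitters-DoubleNest-Portable/product-reviews/B00K30GXK8/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews")`. Checking for the next link with `findElements()` lets the loop stop cleanly on the last page instead of erroring.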

1 answer:

Answer (score: 0)

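In this case you don't need RSelenium; rvest alone is enough. First, you can scrape the reviews on any one of these pages by reading its HTML directly. Second, notice that each time you open another page of the review section, the URL changes as well: it encodes the number of the page you are viewing. So you can loop over the page numbers, building each page's URL, and scrape all the reviews: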
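Because the page number is the only part of the URL that varies, every page URL can be built up front with a single vectorized `paste0()` call. A minimal sketch, with the URL pattern copied from the answer's code (the page number appears both in the `ref=` tag and in `pageNumber=`); `review_page_urls` is a hypothetical name used only for illustration:

```r
# Hypothetical helper: build the review-page URLs for the given page numbers.
# The pattern is taken from the answer's code; only the page number changes.
review_page_urls <- function(pages) {
  paste0(
    "https://www.amazon.com/Eagles-Nest-Outfitters-DoubleNest-Portable/",
    "product-reviews/B00K30GXK8/ref=cm_cr_getr_d_paging_btm_next_", pages,
    "?ie=UTF8&reviewerType=all_reviews&pageNumber=", pages
  )
}

urls <- review_page_urls(1:3)  # one URL per page, pages 1 to 3
```

Building the URLs this way makes it easy to inspect them before sending any requests.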

library(magrittr)  # provides %>%

reviews <- lapply(1:100, function(i) {
  # only the page number changes from one review page to the next
  url <- paste0("https://www.amazon.com/Eagles-Nest-Outfitters-DoubleNest-Portable/product-reviews/B00K30GXK8/ref=cm_cr_getr_d_paging_btm_next_", i,
                "?ie=UTF8&reviewerType=all_reviews&pageNumber=", i)
  xml2::read_html(url) %>%
    rvest::html_nodes(".review-text") %>%
    rvest::html_text() %>%
    tibble::tibble(reviews = .)
})
# stack the per-page tables into a single one
(reviews <- do.call("rbind", reviews))
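The loop above always requests 100 pages, even if the product has fewer. A minimal sketch of a variant that stops at the first page returning no reviews and pauses between requests; `scrape_until_empty` and its `fetch_page` argument are hypothetical names, where `fetch_page` stands for any function that takes a page number and returns a character vector of review texts (for example, the `xml2`/`rvest` pipeline from the answer wrapped in a function).

```r
# Hypothetical helper: request pages until one comes back empty.
# `fetch_page(i)` must return a character vector of review texts for page i.
scrape_until_empty <- function(fetch_page, max_pages = 100, delay = 1) {
  out <- list()
  for (i in seq_len(max_pages)) {
    texts <- fetch_page(i)
    if (length(texts) == 0) break           # past the last page of reviews
    out[[length(out) + 1]] <- data.frame(page = i, reviews = texts,
                                         stringsAsFactors = FALSE)
    Sys.sleep(delay)                        # pause between requests
  }
  do.call(rbind, out)                       # NULL if the first page was empty
}

# Example with a stand-in fetcher that has two pages of three reviews each:
fake_fetch <- function(i) if (i <= 2) paste("review", i, 1:3) else character(0)
res <- scrape_until_empty(fake_fetch, delay = 0)
```

Passing the fetcher in as an argument keeps the paging logic separate from the HTML parsing, so it can be tested without touching the network.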