Unable to scroll down and scrape all reviews in RSelenium

Asked: 2019-03-18 14:01:20

Tags: r web-scraping scroll rselenium

I am new to RSelenium. I am trying to scrape all of the reviews from the page below.

https://www.google.com/search?client=firefox-b-1-ab&ei=fthAXLWfC8qp_QavqbGIDQ&q=fox+volkswagen+rochester+hills&oq=&gs_l=psy-ab.3.5.35i39l6.24734.25762..29755...1.0..0.114.114.0j1......0....1..gws-wiz.....6..0i71j0j0i131.7sWXKnj597Y#lrd=0x8824e9cf8f68257b:0xc45f1982878cfc94,1,,,

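My problem is that I can only scrape 10 of the 370 reviews. I think my scroll-down code is not loading the rest of the reviews, so they never reach the page source. I have tried several approaches to get it working, but none of them worked.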

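    library(rvest)    # read_html(), html_nodes(), html_text(), %>%
    library(stringr)  # str_extract(), str_remove_all()

    # Read the review-count label from the page source, e.g. "370 reviews"
    count <- read_html(pagesource) %>%
      html_nodes(".p13zmc") %>%
      html_text()

    # Store the number of reviews to know how many times to scroll down.
    # Extract the digits directly; trimming a fixed number of characters
    # can leave non-numeric text behind and yield NA.
    scroll_down_times <- count[1] %>%
      str_extract("[0-9][0-9,]*") %>%
      str_remove_all(",") %>%
      as.numeric()

    # Simulate pressing Page Down several times
    for (i in seq_len(scroll_down_times)) {
      webEle$sendKeysToActiveElement(sendKeys = list(key = "page_down"))
      # The content needs time to load: wait 1.2 seconds every 5 scroll downs
      if (i %% 5 == 0) {
        Sys.sleep(1.2)
      }
    }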
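
The scroll count above depends on turning the label text into a number. As a rough sketch of that step in base R, the helpers below parse the count and estimate how many load cycles are needed. The names `parse_review_count` and `estimate_scrolls` are hypothetical, and `per_batch = 10` is an assumption based on the 10 reviews that show up initially, not a documented page size.

```r
# Pull the first number out of text such as "370 reviews" or "1,234 reviews".
# Returns NA when the text contains no digits.
parse_review_count <- function(text) {
  digits <- regmatches(text, regexpr("[0-9][0-9,]*", text))
  if (length(digits) == 0) return(NA_real_)
  as.numeric(gsub(",", "", digits[1]))
}

# Number of load cycles needed if each cycle reveals `per_batch` reviews.
# per_batch = 10 is an assumption, not a value confirmed by the page.
estimate_scrolls <- function(total, per_batch = 10) {
  if (is.na(total) || total <= 0) return(0L)
  as.integer(ceiling(total / per_batch))
}

total <- parse_review_count("370 reviews")
scrolls <- estimate_scrolls(total)
```

With a label of "370 reviews" this gives 37 load cycles, which could stand in for `scroll_down_times` if each scroll really does load one batch.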

I also tried using executeScript, but it still does not work:

    # Keep scrolling down the page, loading new content each time.
    last_height <- 0
    repeat {
      remDr$executeScript("window.scrollTo(0, document.body.scrollHeight);")
      Sys.sleep(3)  # wait 3 seconds to give the content a chance to load

      # Break out of the loop once the page height stops growing
      new_height <- remDr$executeScript("return document.body.scrollHeight")
      if (unlist(last_height) == unlist(new_height)) {
        break
      } else {
        last_height <- new_height
      }
    }
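
One possible explanation, which I have not confirmed, is that the reviews sit inside their own scrollable panel, so `window.scrollTo` on `document.body` never triggers loading. Below is a minimal sketch of the same scroll-until-stable loop written so that the scroll action and the height check are passed in as functions, with a cap on the number of rounds. The function name `scroll_until_stable` and its parameters are hypothetical, and the selector `.review-dialog-list` in the commented wiring is a placeholder for whatever element actually scrolls.

```r
# Scroll repeatedly until the measured height stops growing or the cap is hit.
# `scroll_once` performs one scroll; `get_height` returns the current height
# (a number or a one-element list, as RSelenium's executeScript returns).
# Returns the number of rounds performed.
scroll_until_stable <- function(scroll_once, get_height,
                                wait_secs = 3, max_rounds = 100) {
  last_height <- -1
  rounds <- 0
  while (rounds < max_rounds) {
    scroll_once()
    Sys.sleep(wait_secs)  # give new content a chance to load
    new_height <- as.numeric(unlist(get_height()))
    rounds <- rounds + 1
    # Stop when the height is unusable or has not changed since last round
    if (length(new_height) != 1 || is.na(new_height) ||
        new_height == last_height) {
      break
    }
    last_height <- new_height
  }
  rounds
}

# Example wiring with RSelenium (not run here; needs a live session `remDr`).
# ".review-dialog-list" is a hypothetical placeholder selector.
# js_scroll <- "var el = document.querySelector('.review-dialog-list'); el.scrollTop = el.scrollHeight;"
# js_height <- "return document.querySelector('.review-dialog-list').scrollHeight;"
# scroll_until_stable(
#   scroll_once = function() remDr$executeScript(js_scroll),
#   get_height  = function() remDr$executeScript(js_height)
# )
```

Passing the two actions in as functions keeps the loop itself independent of the browser, so it can be checked with stub functions before pointing it at the real page.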

I am not sure what to try next. Any suggestions or help would be greatly appreciated. Thanks in advance.

0 answers:

No answers