I am new to RSelenium. I am trying to scrape all the reviews from the page below.
https://www.google.com/search?client=firefox-b-1-ab&ei=fthAXLWfC8qp_QavqbGIDQ&q=fox+volkswagen+rochester+hills&oq=&gs_l=psy-ab.3.5.35i39l6.24734.25762..29755...1.0..0.114.114.0j1......0....1..gws-wiz.....6..0i71j0j0i131.7sWXKnj597Y#lrd=0x8824e9cf8f68257b:0xc45f1982878cfc94,1,,,
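My problem is that I can only scrape 10 of the 370 reviews. I suspect my scroll-down code is the reason not all reviews are scraped. I have tried several approaches to make it work, but none of them succeeded.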
# Simulate scrolling down several times
count = read_html(pagesource) %>%
  html_nodes(".p13zmc") %>%
  html_text()
# Store the number of reviews for the URL to know how many times to scroll down
# This part of the code does not work properly
scroll_down_times = count %>%
  str_sub(1, nchar(count) - 5) %>%
  as.numeric()
for (i in 1:scroll_down_times) {
  webEle$sendKeysToActiveElement(sendKeys = list(key = "page_down"))
  # The content needs time to load; wait 1.2 seconds every 5 scroll-downs
  if (i %% 5 == 0) {
    Sys.sleep(1.2)
  }
}
I also tried using executeScript, but it still does not work.
# Keep scrolling down the page, loading new content each time.
last_height = 0
repeat {
  remDr$executeScript("window.scrollTo(0,document.body.scrollHeight);")
  Sys.sleep(3) # Delay by 3 seconds to give the content a chance to load.
  # Updated if statement which breaks if we can't scroll further
  new_height = remDr$executeScript("return document.body.scrollHeight")
  if (unlist(last_height) == unlist(new_height)) {
    break
  } else {
    last_height = new_height
  }
}
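One variant that might be worth trying is sketched here, under the assumption that the reviews sit in their own scrollable container rather than in the page body. If that is the case, document.body.scrollHeight would never change and the loop above would stop after the first pass. The selector ".review-dialog-list" and the helper names parse_review_count and scroll_reviews are hypothetical placeholders and have not been verified against the page. remDr is assumed to be an already opened RSelenium remoteDriver.

```r
# Hypothetical helper: extract the first number from text such as "370 reviews".
parse_review_count <- function(txt) {
  m <- regmatches(txt, regexpr("[0-9][0-9,]*", txt))
  as.numeric(gsub(",", "", m))
}

# Hypothetical helper: scroll a container element (not the window) until its
# scrollHeight stops growing. The default selector is an assumption.
scroll_reviews <- function(remDr, selector = ".review-dialog-list",
                           pause = 2, max_rounds = 100) {
  js_scroll <- sprintf(
    paste0(
      "var el = document.querySelector('%s');",
      "if (el) { el.scrollTop = el.scrollHeight; return el.scrollHeight; }",
      "return -1;"
    ),
    selector
  )
  last_height <- -1
  for (round in seq_len(max_rounds)) {
    # Scroll to the bottom of the container and read its current height.
    new_height <- unlist(remDr$executeScript(js_scroll, args = list()))
    # Stop when the height did not change (or the element was not found).
    if (new_height == last_height) break
    last_height <- new_height
    Sys.sleep(pause) # give newly requested reviews time to load
  }
  invisible(last_height)
}
```

Usage would be something like scroll_reviews(remDr) followed by remDr$getPageSource() to re-read the page. The max_rounds cap is only a guard against an endless loop if the container keeps growing.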
I don't know what else to try. Any suggestions or help would be greatly appreciated. Thanks in advance.