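I am trying to scrape reviews from a web page to determine word frequency. However, when a review is long, only part of it is displayed. You have to click "More" to make the page show the full review. Below is the code I use to extract the review text. How can I "click" More to get the full review?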
library(rvest)
tripAdvisorURL <- "https://www.tripadvisor.com/Hotel_Review-g33657-d85704-Reviews-Hotel_Bristol-Steamboat_Springs_Colorado.html#REVIEWS"
webpage <- read_html(tripAdvisorURL)
# Select every node whose class attribute contains "partial_entry"
reviewData <- html_nodes(webpage, xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "partial_entry", " " ))]')
head(reviewData)
html_text(reviewData[[1]])
[1] "The rooms were clean and we slept so good we had room 10 and 12 we didn’t use 12 but it joins 10 .kind of strange but loved the hotel ..me personally I would take the hot tub out it was kinda old..the lady that...More"
Answer 0 (score: 1)
As mentioned in the comments, you can use RSelenium together with rvest to get more interactivity:
library(RSelenium)
library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

# Start a Selenium server and open a Chrome session
rmDr <- rsDriver(browser = "chrome")
myclient <- rmDr$client
tripAdvisorURL <- "https://www.tripadvisor.com/Hotel_Review-g33657-d85704-Reviews-Hotel_Bristol-Steamboat_Springs_Colorado.html#REVIEWS"
myclient$navigate(tripAdvisorURL)

# Select all "More" links, then loop over them and click each one
webEles <- myclient$findElements(using = "css", value = ".ulBlueLinks")
for (webEle in webEles) {
  webEle$clickElement()
}

# Hand the expanded page source to rvest and extract the review text
mypagesource <- myclient$getPageSource()
read_html(mypagesource[[1]]) %>%
  html_nodes(".partial_entry") %>%
  html_text()
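
The click loop above stops at the first element that cannot be clicked (for example, a link that has gone stale after an earlier click expanded the page). A minimal sketch of a more forgiving loop follows. `click_all` and its `pause` argument are hypothetical names introduced here, not part of RSelenium, and the half-second pause is an assumption rather than a tested value for this site.

```r
# Hypothetical helper (not part of RSelenium): try to click every element in
# a list, skip the ones that raise an error, and return how many succeeded.
click_all <- function(elements, pause = 0.5) {
  clicked <- 0L
  for (el in elements) {
    ok <- tryCatch({
      el$clickElement()
      TRUE
    }, error = function(e) FALSE)
    if (ok) {
      clicked <- clicked + 1L
      Sys.sleep(pause)  # assumed delay so the page can expand the review
    }
  }
  clicked
}
```

With the session from the answer, this would be called as click_all(webEles) in place of the for loop. Once the text has been extracted, the browser and server can be shut down with myclient$close() and rmDr$server$stop().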