Scraping reviews from multiple TripAdvisor pages with R

Time: 2019-11-15 23:33:09

Tags: r rvest rselenium

I am trying to scrape several pages of TripAdvisor reviews for an academic project.

This is my attempt in R:

#Load libraries
library(rvest)
library(RSelenium)

# main url for stadium
urlmainlist=c(
  hampdenpark="http://www.tripadvisor.com.ph/Attraction_Review-g186534-d214132-Reviews-Hampden_Park-Glasgow_Scotland.html"
)

# Specify how many search pages and counter
morepglist=list(
  hampdenpark=seq(10,360,10)
)
#----------------------------------------------------------------------------------------------------------

# create pickstadium variable
pickstadium="hampdenpark"


# get list of urllinks corresponding to different pages

# url link for first search page
urllinkmain=urlmainlist[pickstadium]
# counter for additional pages
morepg=as.numeric(morepglist[[pickstadium]])

urllinkpre=paste(strsplit(urllinkmain,"Reviews-")[[1]][1],"Reviews",sep="")
urllinkpost=strsplit(urllinkmain,"Reviews-")[[1]][2]

urllink=rep(NA,length(morepg)+1)

urllink[1]=urllinkmain
for(i in seq_along(morepg)){  # seq_along() is safe even if morepg is empty
  urllink[i+1]=paste(urllinkpre,"-or",morepg[i],"-",urllinkpost,sep="")
}
head(urllink)
write.csv(urllink,'urllink.csv')
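
For reference, the same list of page URLs can probably be built without the explicit loop. This is a minimal sketch under that assumption; `build_review_urls` is a hypothetical helper name of my own, not part of rvest or any other package. It inserts `-or<offset>-` right after `Reviews`, which is what the `strsplit`/`paste` code above does.

```r
# Hypothetical helper: build the first-page URL plus one URL per offset.
build_review_urls <- function(main_url, offsets) {
  paged <- vapply(
    offsets,
    function(off) {
      # Replace the first "Reviews-" with "Reviews-or<off>-"
      sub("Reviews-", paste0("Reviews-or", off, "-"), main_url, fixed = TRUE)
    },
    character(1)
  )
  c(main_url, paged)
}

main_url <- "http://www.tripadvisor.com.ph/Attraction_Review-g186534-d214132-Reviews-Hampden_Park-Glasgow_Scotland.html"
urls <- build_review_urls(main_url, seq(10, 360, 10))
head(urls, 3)
```

The result is a plain character vector, so it could be passed straight to the scraping loop instead of going through a CSV file.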

##########
#SCRAPING#
##########

library(rvest)
library(RSelenium)
#install.packages('RSelenium')

# Read back the URLs written above; write.csv() stores the vector in a column named "x"
testurl <- read.csv("urllink.csv", stringsAsFactors = FALSE)
# Note: this name shadows base::list(); it is kept because the loop below uses it
list <- testurl$x

tripadvisor <- NULL

#Scrape
for(i in seq_along(list)){

  reviews <- list[i] %>% 
    read_html() %>% 
    html_nodes("#REVIEWS .innerBubble")

  id <- reviews %>%
    html_node(".quote a") %>%
    html_attr("id")

  rating <- reviews %>%
    html_node(".rating .rating_s_fill") %>%
    html_attr("alt") %>%
    gsub(" of 5 stars", "", .) %>%
    as.integer()

  date <- reviews %>%
    html_node(".rating .ratingDate") %>%
    html_attr("title") %>%
    strptime("%b %d, %Y") %>%
    as.POSIXct()

  review <- reviews %>%
    html_node(".entry .partial_entry") %>%
    html_text()%>%
    as.character()

  rowthing <- data.frame(id, review,rating, date, stringsAsFactors = FALSE)
  # Append the new rows so the pages stay in their original order
  tripadvisor <- rbind(tripadvisor, rowthing)
}

However, this results in an empty tripadvisor data frame. Any help fixing this would be greatly appreciated.
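
One way to narrow this down might be to check how many nodes each selector actually matches. The sketch below uses a small hand-written HTML snippet (hypothetical markup, not TripAdvisor's real page) to illustrate that when the outer selector matches nothing, every downstream vector is empty and the data frame ends up with zero rows. If the same check returns 0 on the live pages, a plausible but unverified explanation is that the selectors no longer match the served HTML, or that the reviews are injected by JavaScript after the page loads.

```r
library(rvest)

# Hypothetical markup shaped like what the selectors in the loop expect
page <- read_html('
<div id="REVIEWS">
  <div class="innerBubble">
    <div class="quote"><a id="rn1">Great tour</a></div>
    <div class="entry"><p class="partial_entry">Loved the museum.</p></div>
  </div>
</div>')

# A selector that matches, and one that does not (".noSuchClass" is made up)
matched <- html_nodes(page, "#REVIEWS .innerBubble")
missed  <- html_nodes(page, "#REVIEWS .noSuchClass")

length(matched)
length(missed)

# With no matching nodes, the extracted column is empty as well
ids <- missed %>% html_node(".quote a") %>% html_attr("id")
empty_df <- data.frame(id = ids, stringsAsFactors = FALSE)
nrow(empty_df)
```

Printing `length(reviews)` inside the loop for the first URL would show whether the real pages behave like `matched` or like `missed`.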

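Additional question

I would also like to capture the full text of each review, since my code currently only extracts the partial entries. For each review, I want to click the "More" link automatically and then extract the complete review.

Any help here would be greatly appreciated.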
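
What I have in mind is roughly the following. It is only a sketch under several assumptions: `get_expanded_source` is a hypothetical helper of my own, `remDr` is assumed to be an RSelenium remote driver that is already open, and the CSS selector for the "More" link has to be supplied by the caller because I do not know the correct one for the current page markup.

```r
# Hypothetical helper: load a page, click every "More" link, return the HTML.
# `remDr`    : an open RSelenium remote driver (assumption)
# `more_css` : CSS selector for the "More" link (must be found by inspecting the page)
# `wait`     : seconds to pause after navigating and after clicking (crude wait)
get_expanded_source <- function(remDr, url, more_css, wait = 2) {
  remDr$navigate(url)
  Sys.sleep(wait)
  more_links <- remDr$findElements(using = "css selector", value = more_css)
  for (link in more_links) {
    # A click can fail if the element is hidden or has gone stale; skip those
    try(link$clickElement(), silent = TRUE)
  }
  if (length(more_links) > 0) Sys.sleep(wait)
  remDr$getPageSource()[[1]]
}

# Intended usage (needs a working Selenium setup, so it is left commented out):
# rD   <- RSelenium::rsDriver(browser = "firefox")
# src  <- get_expanded_source(rD$client, urllink[1], more_css = "<selector for the More link>")
# page <- xml2::read_html(src)   # then reuse the html_nodes() calls from the loop
# rD$client$close(); rD$server$stop()
```

The returned page source could then be parsed with `read_html()` in place of the direct URL, so the rest of the extraction code would stay the same.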

0 answers:

No answers yet