I scraped multiple URLs with the following code:
library(rvest)

x <- NULL
for (i in 1:5) {
  k1 <- "https://forums.vwvortex.com/forumdisplay.php?5449-Atlas-SUV/page"
  k2 <- "&pp=200"
  url <- paste0(k1, i, k2)
  review <- read_html(url)
  # collect the relative link of every thread listed on this index page
  threads <- cbind(review %>% html_nodes("h3.threadtitle") %>% html_nodes("a") %>% html_attr("href"))
  x <- rbind(x, threads)
}
# prepend the site root to make the scraped links absolute
x[] <- Map(paste, "https://forums.vwvortex.com/", x, sep = "")
url <- paste(x)
# strip the session token so each thread URL ends in "/page"
url <- sub("\\&s=bd72f867af71d9d03d74dc394a45b624", "/page", url)
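For illustration, the substitution turns each scraped link into a base URL ready for a page number (the thread ID below is a made-up placeholder; the session token is the one my links carried):

# hypothetical input, shown only to illustrate the sub() call above
sub("\\&s=bd72f867af71d9d03d74dc394a45b624", "/page",
    "https://forums.vwvortex.com/showthread.php?9001-Example-thread&s=bd72f867af71d9d03d74dc394a45b624")
# [1] "https://forums.vwvortex.com/showthread.php?9001-Example-thread/page"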
Now I have all the URLs I need. I can also scrape all the replies associated with each of those URLs using the following code:
results <- lapply(url, function(i) {
  review <- read_html(i)
  # grab the text of every post body on the page
  threads <- cbind(review %>% html_nodes("blockquote.postcontent.restore") %>% html_text())
  replies <- as.data.frame(threads)
  return(replies)
})
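For what it's worth, the per-thread data frames can then be stacked into a single one (a standard R idiom, nothing specific to this site):

# combine the list of per-thread data frames into one data frame
all_replies <- do.call(rbind, results)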
The problem is that I can only scrape the first page of each URL. Is there a way to loop through pages 1 to 100 for each of the URLs I scraped?
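A rough sketch of what I have in mind is below. It assumes every thread really has pages at .../page1 through .../page100 (certainly not true in general) and does no error handling, which is part of what I am asking about:

# sketch: for each thread URL (which already ends in "/page"),
# visit page 1 through page 100 and collect the post text from each
results <- lapply(url, function(u) {
  pages <- lapply(1:100, function(p) {
    review <- read_html(paste0(u, p))
    review %>% html_nodes("blockquote.postcontent.restore") %>% html_text()
  })
  as.data.frame(unlist(pages))
})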