I have more than 10,000 URLs to scrape data from, and I am trying to scrape them with parLapply. I get the error below. Each URL loads fine on its own, so I don't understand the root cause of this error.
site <- as.character(sites[21150:31723, ])

results <- parLapply(cluster, site, function(i) {
  # load packages on each worker
  library(dplyr)
  library(xml2)
  library(magrittr)
  library(rvest)
  library(rowr)
  review      <- read_html(i)
  threads     <- cbind(review %>% html_nodes("blockquote.postcontent.restore") %>% html_text())
  datethreads <- cbind(review %>% html_nodes("span.date") %>% html_text())
  userinfo    <- cbind(review %>% html_nodes("div.username_container") %>% html_text())
  #user <- gsub("View.*", "", userinfo)
  title <- cbind(review %>% html_nodes("li.navbit.lastnavbit") %>% html_text())
  urls  <- cbind(review %>% html_nodes("span.threadtitle") %>% html_nodes("a") %>% html_attr("href") %>% paste0("https://forums.vwvortex.com/", .))
  links <- sub("&.*", "", urls)
  x <- data.frame(rowr::cbind.fill(threads, datethreads, userinfo, title, links, fill = NA), stringsAsFactors = FALSE)
  return(x)
})
Error in checkForRemoteErrors(val) :
one node produced an error: Maximum (10) redirects followed
Any suggestions or help would be greatly appreciated. Thank you.
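The error means that at least one URL in the batch redirected more than 10 times (curl's default limit), and because parLapply propagates the first worker error, that single URL aborts the whole job. One way to isolate the offender is to wrap the scrape in tryCatch so failing URLs return an error record instead of stopping everything. This is a minimal sketch, assuming `cluster` and `site` are defined as above; `safe_scrape` is a hypothetical helper name:

```r
library(parallel)

# Hedged sketch: return an error record for a failing URL
# instead of letting one bad node kill the whole parLapply call.
safe_scrape <- function(i) {
  tryCatch({
    library(rvest)
    library(xml2)
    review <- read_html(i)
    # ... same html_nodes()/html_text() extraction as in the question ...
    data.frame(url = i, error = NA_character_, stringsAsFactors = FALSE)
  }, error = function(e) {
    # capture the URL and the error message (e.g. the redirect failure)
    data.frame(url = i, error = conditionMessage(e), stringsAsFactors = FALSE)
  })
}

results <- parLapply(cluster, site, safe_scrape)

# Afterwards, inspect which URLs failed:
failed <- do.call(rbind, results)
failed[!is.na(failed$error), ]
```

Filtering the combined result for non-NA `error` values should reveal exactly which URL triggers the "Maximum (10) redirects followed" message, which you can then test in isolation.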