Question

I'm using rvest to scrape a website. It works, but it is extremely inefficient, and I can't figure out how to make it run better.

In the code below, url is a list of more than 10,000 URLs.

number <- sapply(url, function(x)
  read_html(x) %>%
  html_nodes(".js-product-artnr") %>%
  html_text())

price_new <- sapply(url, function(x)
  read_html(x) %>%
  html_nodes(".product-page__price__new") %>%
  html_text())

price_old <- sapply(url, function(x)
  read_html(x) %>%
  html_nodes(".product-page__price__old") %>%
  html_text())

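The problem with the above is that rvest visits all 10,000 URLs to get the first node (".js-product-artnr"), then visits the same 10,000 URLs again for the second node, and so on. In the end I expect to need about 10 different nodes from these 10,000 pages. Fetching them one at a time and merging them into a data frame afterwards takes a very long time; there must be a better way.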

I'm looking for something like the following, which would get all the information in one pass:

info <- sapply(url, function(x)
  read_html(x) %>%
  html_nodes(".js-product-artnr") %>%
  html_nodes(".product-page__price__new") %>%
  html_nodes(".product-page__price__old") %>%
  html_text())
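
A note on the chained form above: each html_nodes() call searches inside the nodes matched by the previous call, so the chain looks for a price element nested within the article-number element rather than collecting three separate fields. The following is a minimal sketch of that behaviour on a made-up inline page; the HTML and its values are hypothetical, and only the class names come from the question.

```r
library(rvest)

# Hypothetical stand-in for one product page; only the class names are from the question.
page <- read_html('<html><body>
  <span class="js-product-artnr">12345</span>
  <span class="product-page__price__new">9.99</span>
  <span class="product-page__price__old">14.99</span>
</body></html>')

# Chained: the second call only searches inside the article-number node,
# which contains no price element, so nothing is matched.
chained <- page %>%
  html_nodes(".js-product-artnr") %>%
  html_nodes(".product-page__price__new") %>%
  html_text()

# Separate queries against the same parsed page each find their own node.
separate <- c(
  number    = page %>% html_nodes(".js-product-artnr") %>% html_text(),
  price_new = page %>% html_nodes(".product-page__price__new") %>% html_text(),
  price_old = page %>% html_nodes(".product-page__price__old") %>% html_text()
)
```

Here the chained result is empty, while the separate queries each return their value from a single parsed page.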

Answer 1

This works for me.

library(rvest)

func <- function(url) {
  # Download and parse the page only once
  sample <- read_html(url)
  scrape1 <- html_nodes(sample, ".js-product-artnr") %>%
    html_text()
  scrape2 <- html_nodes(sample, ".product-page__price__new") %>%
    html_text()
  scrape3 <- html_nodes(sample, ".product-page__price__old") %>%
    html_text()
  # Combine the three fields into one data frame for this page
  df <- cbind(scrape1, scrape2, scrape3)
  final_df <- as.data.frame(df)
  return(final_df)
}

data <- lapply(urls_all, func)
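
Two things may be worth handling on top of this: pages where a field is missing (for example, no old price), where cbind() would recycle or drop values, and combining the per-page results into a single data frame. The following is only a sketch, assuming each page describes a single product. It relies on html_node() (singular) returning a missing node when nothing matches, so that html_text() yields NA. The helper name scrape_page, the selectors vector, and the inline pages are hypothetical; with real data you would pass read_html(url) instead.

```r
library(rvest)

# CSS selectors from the question, named by the column they should fill.
selectors <- c(
  number    = ".js-product-artnr",
  price_new = ".product-page__price__new",
  price_old = ".product-page__price__old"
)

# Hypothetical helper: takes an already parsed page and returns a one-row data frame.
scrape_page <- function(page) {
  values <- lapply(selectors, function(css) {
    # html_node() returns a missing node if nothing matches; html_text() then yields NA
    html_text(html_node(page, css))
  })
  as.data.frame(values, stringsAsFactors = FALSE)
}

# Two made-up pages standing in for read_html(url); the second has no old price.
pages <- list(
  read_html('<html><body><span class="js-product-artnr">12345</span>
    <span class="product-page__price__new">9.99</span>
    <span class="product-page__price__old">14.99</span></body></html>'),
  read_html('<html><body><span class="js-product-artnr">67890</span>
    <span class="product-page__price__new">4.50</span></body></html>')
)

# One parse per page, then stack the rows into a single data frame.
result <- do.call(rbind, lapply(pages, scrape_page))
```

Adding an eleventh field then only means adding one entry to selectors; each page is still parsed once.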

R: returning multiple nodes in one pass with rvest (large list of URLs)
