Scraping tables from multiple web pages

Time: 2020-04-14 09:18:12

Tags: r web-scraping

I am trying to fetch several tables and then, after some manipulation, combine them into a single data frame in R.

See the code below:

countries <- c("au", "at", "de", "se", "gb", "us")

for (i in countries) {
  sides <- glue("https://www.beeradvocate.com/beer/top-rated/", i, .sep = "")
  html[i] <- read_html(sides)
  cont[i] <- html[i] %>%
    html_nodes("table") %>% html_table()
}

When I run this, I get the following messages:

 number of items to replace is not a multiple of replacement length
 Error in UseMethod("xml_find_all") :
   no applicable method for 'xml_find_all' applied to an object of class "list"

Can anyone help me?

Thanks a lot!
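The two messages have separate causes. The single-bracket assignment `html[i] <- read_html(sides)` tries to fit an xml2 document, which is internally a list with more than one element, into a single slot; that produces the "number of items to replace" warning and keeps only part of the document. Then `html[i]` (single bracket) returns a list rather than a document, so `html_nodes()` finds no `xml_find_all()` method for class "list". Below is a minimal sketch of the same loop using `[[` instead, assuming `html` and `cont` are initialised as empty lists (the question does not show this). The inline HTML strings in `raw_pages` are hypothetical stand-ins for the downloaded pages so that the sketch runs offline.

```r
library(rvest)  # read_html(), html_nodes(), html_table() and the %>% pipe

# Hypothetical offline stand-ins for the real pages, keyed by country code
raw_pages <- c(
  au = "<table><tr><th>Rank</th><th>Beer</th></tr><tr><td>1</td><td>Beer A</td></tr></table>",
  at = "<table><tr><th>Rank</th><th>Beer</th></tr><tr><td>2</td><td>Beer B</td></tr></table>"
)

html <- list()  # both containers must exist before the loop
cont <- list()

for (i in names(raw_pages)) {
  # [[ stores the whole document as ONE list element
  html[[i]] <- read_html(raw_pages[[i]])
  # html[[i]] is an xml_document, so html_nodes() can dispatch on it;
  # html_table() on a node set returns a list of data frames
  cont[[i]] <- html[[i]] %>% html_nodes("table") %>% html_table()
}
```

With real pages, `read_html(raw_pages[[i]])` would be replaced by `read_html(paste0("https://www.beeradvocate.com/beer/top-rated/", i))`.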

1 answer:

Answer (score: 0):

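library(tidyverse)  # purrr::map() and the %>% pipe
library(rvest)      # read_html(), html_node(), html_table()

path_base <- "https://www.beeradvocate.com/beer/top-rated/"
countries <- c("au","at","de","se","gb","us")

# paste0() is vectorised, so this builds all six URLs at once
path <- paste0(path_base, countries)

# Download and parse every page; the result is a list of xml_document objects
html_files <- path %>% 
  map(read_html)

# Take the first <table> on each page and convert it to a data frame.
# The result is a list with one data frame per country.
# (In rvest >= 1.0.0 the `fill` argument is deprecated and can be dropped.)
html_files %>% 
  map(html_node, css = "table") %>% 
  map(html_table, header = TRUE, fill = TRUE)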
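The code above stops at a list of data frames, while the question asks for a single data frame. A minimal sketch of that last step, assuming every page's table has the same columns: name the list by country code and stack it with `dplyr::bind_rows()`, whose `.id` argument records which page each row came from. The two inline HTML pages in `pages` are hypothetical stand-ins so the sketch runs offline; in practice they would be `html_files`, named with `set_names(countries)`.

```r
library(rvest)  # read_html(), html_node(), html_table()
library(purrr)  # map()
library(dplyr)  # bind_rows() and the %>% pipe

# Hypothetical offline stand-ins for two downloaded pages, named by country
pages <- list(
  au = read_html("<table><tr><th>Rank</th><th>Beer</th></tr><tr><td>1</td><td>Beer A</td></tr></table>"),
  at = read_html("<table><tr><th>Rank</th><th>Beer</th></tr><tr><td>1</td><td>Beer B</td></tr><tr><td>2</td><td>Beer C</td></tr></table>")
)

# Same pipeline as the answer: first <table> per page -> data frame.
# map() keeps the list names, which bind_rows() uses below.
tables <- pages %>%
  map(html_node, css = "table") %>%
  map(html_table, header = TRUE)

# Stack all tables; the new "country" column holds each element's name
combined <- bind_rows(tables, .id = "country")
print(combined)
```

If the tables differ, for example a column that parses as integer on one page and as character on another, `bind_rows()` stops with an error, so such columns would need to be aligned before combining.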