Question

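`library(rvest)

webpage <- "https://www.naatp.org/resources/addiction-industry-directory"
data2 <- NULL  # accumulates the table from every page
for (i in 1:22) {
  # Read the first table on the current page
  data <- read_html(webpage) %>%
    html_nodes("table") %>%
    .[[1]] %>%
    html_table()
  data2 <- rbind(data2, data)

  # Move to the next page, except after the last one
  if (i < 22) {
    webpage <- html_session(webpage) %>%
      follow_link(css = ".pager-next a") %>%
      .[["url"]]
  }
}`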

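I wrote the code above to scrape the data on this website. The directory has 22 pages, and I also want to scrape the data contained in the individual entry pages, such as contact information, for example https://www.naatp.org/resources/addiction-industry-directory/3832/1-method-center. Can anyone help me solve this?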

Answer 1

This sounds like a duplicate question, but you were close, so here goes...

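library(tidyverse)
library(rvest)

# Listing pages are addressed as ?page=0 through ?page=21
pages <- 0:21
urls <- paste0("https://www.naatp.org/resources/addiction-industry-directory?page=", pages)

# Read one listing page and return its first table as a data frame
get_table <- function(url) {
  url %>%
    read_html() %>%
    html_node("table") %>%
    html_table()
}

# lapply always returns a list, here one data frame per page
results <- lapply(urls, get_table)

bind_rows(results) %>%
  as_tibble()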
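
The tables above cover the listing pages only, not the per-entry pages with contact information that the question asks about. As a rough sketch of one way to get there, the links inside each listing table could be collected and resolved to full URLs. The inline HTML below is a made-up stand-in for a listing page (the real markup may differ), and `get_detail_urls` is a hypothetical helper name, not part of rvest.

```r
library(rvest)

# Hypothetical stand-in for one listing page; the real markup may differ.
listing <- read_html('
  <table>
    <tr><th>Name</th></tr>
    <tr><td><a href="/resources/addiction-industry-directory/3832/1-method-center">1 Method Center</a></td></tr>
  </table>')

# Hypothetical helper: collect the href of every link inside the table
get_detail_urls <- function(page, base = "https://www.naatp.org") {
  hrefs <- page %>%
    html_nodes("table a") %>%
    html_attr("href")
  # Resolve relative paths against the site root
  xml2::url_absolute(hrefs, base)
}

detail_urls <- get_detail_urls(listing)
```

Each resulting URL could then be passed to `read_html()` in turn. Which selectors pick out the contact fields depends on the markup of the entry pages, which is not shown here, so that step has to be worked out by inspecting one of those pages.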

Scraping data from pages
