Using rvest to loop over a list of character values inserted into a URL

Asked: 2018-02-28 16:44:06

Tags: r rvest

OK, I'm using a script like this to retrieve some data from a website:

library(tidyverse)
library(rvest)
library(magrittr)

patriot_url <- "https://www.sports-reference.com/cbb/schools/%s"
team_list <- c("american", "bucknell", "colgate", "navy", "lehigh", "army", "lafayette", "boston-university", "loyola-md", "holy-cross")

patriot <-
  for (team in team_list) {
    patriot_url <- sprintf(patriot_url, team)
    patriot_values <- read_html(patriot_url) %>%
      html_nodes("td[data-stat]") %>%
      html_text() %>%
      str_trim %>%
      matrix(ncol = 17, byrow = T) %>%
      as.data.frame
    Sys.sleep(1)
  }

So it takes the URL, substitutes a name from team_list into it, passes the result to read_html, and then extracts certain data from the page.
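To make the substitution step concrete, here is a minimal sketch of how sprintf fills the %s placeholder. The names url_template, teams, team_urls, and overwritten are illustrative and do not appear in the original script; the team list is shortened. No pages are downloaded.

```r
# Template with a single %s placeholder (same URL as in the script above)
url_template <- "https://www.sports-reference.com/cbb/schools/%s"
teams <- c("american", "bucknell", "colgate")  # shortened list for illustration

# sprintf() is vectorized: one call returns one URL per team
team_urls <- sprintf(url_template, teams)
print(team_urls)

# If the filled-in URL is stored back over the template, the "%s" is gone,
# so later sprintf() calls have nothing left to substitute.
overwritten <- sprintf(url_template, teams[1])
grepl("%s", overwritten, fixed = TRUE)  # FALSE
```

This suggests keeping the template and the per-team URL in separate variables inside a loop.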

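How can I make it loop so that it goes through all ten strings and fetches the corresponding data from each URL, with the final output/data frame being the combination of all ten results?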

1 Answer:

Answer 0 (score: 0)

This may be what you are looking for. The resulting data.frame needs some cleanup.

library(tidyverse)
library(rvest)
library(magrittr)

patriot_url <- "https://www.sports-reference.com/cbb/schools/%s"
team_list <- c("american", "bucknell", "colgate", "navy", "lehigh", "army", "lafayette", "boston-university", "loyola-md", "holy-cross")

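patriot <- data.frame()
for (team in team_list) {
    # Build each team's URL in its own variable so the "%s" template is not overwritten
    team_url <- sprintf(patriot_url, team)
    # Take the first table found on the page
    patriot_values <- html_table(read_html(team_url))[[1]]
    # Append this team's rows to the combined result
    patriot <- rbind(patriot, patriot_values)
    # Pause between requests
    Sys.sleep(1)
}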
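
As a possible variation, the per-team tables can be collected in a list and combined once at the end, with a column recording which team each row came from. This is a sketch under assumptions, not a tested scraper: scrape_team and combine_teams are hypothetical helper names, the team list is shortened, and it assumes the first table on every page has the same columns. The combining step is demonstrated offline with made-up stand-in tables; the call against the live site is left commented out.

```r
url_template <- "https://www.sports-reference.com/cbb/schools/%s"
team_list <- c("american", "bucknell", "colgate")  # shortened for illustration

# Hypothetical helper: download one team's page and return its first table,
# tagged with the team name so rows stay identifiable after combining.
scrape_team <- function(team, template = url_template) {
  page <- rvest::read_html(sprintf(template, team))
  team_table <- as.data.frame(rvest::html_table(page)[[1]])
  team_table$team <- team
  Sys.sleep(1)  # pause between requests, as in the loop above
  team_table
}

# Hypothetical helper: stack a list of per-team data frames into one.
# Assumes every table has the same columns; rbind() fails otherwise.
combine_teams <- function(tables) {
  do.call(rbind, tables)
}

# Offline check of the combining step with made-up stand-in tables
demo <- combine_teams(list(
  data.frame(value = 1, team = "american"),
  data.frame(value = 2, team = "bucknell")
))
nrow(demo)  # 2

# Against the live site (not run here):
# patriot <- combine_teams(lapply(team_list, scrape_team))
```

Binding once at the end avoids growing the data frame on every iteration, and the team column makes the later cleanup easier because each row can be traced back to its source page.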