OK, I'm using a script like this to retrieve some data from a website:
library(tidyverse)
library(rvest)
library(magrittr)
patriot_url <- "https://www.sports-reference.com/cbb/schools/%s"
team_list <- c("american", "bucknell", "colgate", "navy", "lehigh", "army", "lafayette", "boston-university", "loyola-md", "holy-cross")
patriot <-
for(team in team_list) {
  patriot_url <- sprintf(patriot_url, team)
  patriot_values <- read_html(patriot_url) %>%
    html_nodes("td[data-stat]") %>%
    html_text() %>%
    str_trim %>%
    matrix(ncol = 17, byrow = T) %>%
    as.data.frame
  Sys.sleep(1)
}
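So it takes the URL, appends a name from team_list, passes the result to read_html, and then retrieves certain data from the page.
How do I make it loop over all ten team names and pull the corresponding data from each URL, so that the final output/data frame is a combination of all ten results?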
Answer 0 (score: 0)
This may be what you're looking for. The resulting data.frame needs some cleanup.
library(tidyverse)
library(rvest)
library(magrittr)
patriot_url <- "https://www.sports-reference.com/cbb/schools/%s"
team_list <- c("american", "bucknell", "colgate", "navy", "lehigh", "army", "lafayette", "boston-university", "loyola-md", "holy-cross")
patriot <- data.frame()
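for(team in team_list) {
  # Build the URL in its own variable so the "%s" template is not overwritten
  team_url <- sprintf(patriot_url, team)
  # Keep the first table found on the page
  patriot_values <- html_table(read_html(team_url))[[1]]
  patriot <- rbind(patriot, patriot_values)
  Sys.sleep(1)
}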
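As a possible alternative to growing the data frame inside a loop, here is a minimal sketch that builds all ten URLs up front and stacks the results in one step. The helpers `fetch_first_table` and `scrape_teams` are hypothetical names introduced only for this illustration, and the sketch assumes each page has at least one table and that the tables have compatible column types; if they don't, `bind_rows` may stop with an error and the columns would need converting first.

```r
library(rvest)   # read_html(), html_table()
library(dplyr)   # bind_rows()

patriot_url <- "https://www.sports-reference.com/cbb/schools/%s"
team_list <- c("american", "bucknell", "colgate", "navy", "lehigh", "army",
               "lafayette", "boston-university", "loyola-md", "holy-cross")

# sprintf() is vectorized, so a single call builds all ten URLs and the
# template stored in patriot_url is never overwritten.
team_urls <- setNames(sprintf(patriot_url, team_list), team_list)

# Hypothetical helper (not from the original post): return the first table
# on a page. Assumes the page contains at least one <table>.
fetch_first_table <- function(url) {
  Sys.sleep(1)  # pause between requests, as in the original loop
  html_table(read_html(url))[[1]]
}

# Hypothetical wrapper: fetch every URL and stack the results, adding a
# "team" column taken from the names of the URL vector.
scrape_teams <- function(urls, fetch = fetch_first_table) {
  bind_rows(lapply(urls, fetch), .id = "team")
}

# patriot <- scrape_teams(team_urls)  # uncomment to query the live site
```

Passing the fetch function as an argument is just a convenience: it lets you try `scrape_teams` with a stand-in function before sending any requests to the site.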