I am working with a web scraping script that someone else built.
library(rvest) # For web scraping
library(stringr) # For string processing
base_url <- "http://www.247sports.com/Season/%i-Football/CompositeTeamRankings"
year_list <- seq(from = 2005, to = 2016, by = 1)
conf_list <- c("ACC", "Big-12", "AAC", "Big-Ten", "C-USA", "IND", "MAC", "M-West", "Pac-12", "SEC", "SBC")
# initialize the matrix to append teams to
recruit_matrix <- matrix("", nrow = 1, ncol = 6)
for (year in year_list) {
  year_url <- sprintf(base_url, year)
  year_url <- str_c(year_url, "?Conference=%s")
  for (conf in conf_list) {
    conf_url <- sprintf(year_url, conf)
    conf_values <- read_html(conf_url) %>%
      html_nodes(".team_itm span , .playerinfo_blk a") %>% # from the Inspector Gadget tool
      html_text %>%
      str_trim %>%
      matrix(ncol = 4, byrow = TRUE) %>%
      cbind(conf, year)
    recruit_matrix <- rbind(recruit_matrix, conf_values)
    Sys.sleep(1) # wait a second to not throttle the servers at 247
  }
}
Here is a link to the complete script
But when I reach this part, I get a series of errors like this (R reports them as warnings):
1: In cbind(., conf, year) :
number of rows of result is not a multiple of vector length (arg 2)
The same message repeats for hundreds of lines.
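One possible explanation, offered as a guess rather than something confirmed from the script's output: base R raises this same message when the matrix on the left of `cbind` has zero rows, which is what `matrix(ncol = 4)` would produce if the CSS selector matched nothing on a page. A minimal sketch that reproduces the message without any network access (the values `"ACC"` and `2005` are just stand-ins for `conf` and `year`):

```r
# Assumption: the selector returned no text for some year/conference page.
scraped <- character(0)

# matrix() on an empty vector with ncol = 4 yields a 0 x 4 matrix.
empty_values <- matrix(scraped, ncol = 4, byrow = TRUE)

# Capture the warning text instead of letting it print to the console.
warning_text <- tryCatch(
  {
    cbind(empty_values, "ACC", 2005)
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)

print(dim(empty_values))
print(warning_text)
```

If this is what is happening, checking `length()` of the scraped text for a single `conf_url` should show whether the selector still matches anything on the page.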
If I were building a very simple table without any scraping, I could troubleshoot this more easily, but with the scraping involved I am lost.
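A rough sketch of how the reshaping step could be tested offline, separate from the scraping. The helper name `build_conf_values`, its `n_fields` argument, and the fake input are all hypothetical and not part of the original script; the idea is only to skip pages whose scraped text is empty or does not divide evenly into rows of 4 fields.

```r
# Hypothetical helper (not from the original script): reshape scraped text
# into rows of n_fields values and tag each row with conference and year.
build_conf_values <- function(scraped, conf, year, n_fields = 4) {
  # Skip pages where the selector matched nothing or an incomplete row.
  if (length(scraped) == 0 || length(scraped) %% n_fields != 0) {
    return(NULL)
  }
  values <- matrix(scraped, ncol = n_fields, byrow = TRUE)
  cbind(values, conf, year)
}

# Made-up placeholder data standing in for html_text() output;
# the field order is an assumption.
fake_page <- c("1", "Team A", "25", "300.1",
               "2", "Team B", "24", "295.7")

ok <- build_conf_values(fake_page, "SEC", 2016)
empty <- build_conf_values(character(0), "SEC", 2016)

# rbind() ignores NULL, so skipped pages simply add no rows.
combined <- rbind(ok, empty)
print(dim(combined))
```

Inside the loop, the same check could wrap the `matrix()` and `cbind()` steps so that empty pages are skipped instead of producing a warning on every iteration.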
Does anyone have a solution?