Combining vectors of different lengths in a web-scraping script

Time: 2018-02-08 22:49:06

Tags: r web-scraping rstudio

I am working with a web-scraping script that someone else built.

library(rvest) # For web scraping
library(stringr) # For string processing

base_url <- "http://www.247sports.com/Season/%i-Football/CompositeTeamRankings"

year_list <- seq(from = 2005, to = 2016, by = 1)

conf_list <- c("ACC", "Big-12", "AAC", "Big-Ten", "C-USA", "IND", "MAC", "M-West", "Pac-12", "SEC", "SBC")

# initialize the matrix to append teams to
recruit_matrix <- matrix("", nrow = 1, ncol = 6)
for (year in year_list) {
  year_url <- sprintf(base_url, year)
  year_url <- str_c(year_url, "?Conference=%s")
  for (conf in conf_list) {
    conf_url <- sprintf(year_url, conf)
    conf_values <- read_html(conf_url) %>%
      html_nodes(".team_itm span , .playerinfo_blk a") %>% # from the Inspector Gadget tool
      html_text %>%
      str_trim %>%
      matrix(ncol = 4, byrow = TRUE) %>%
      cbind(conf, year)
    recruit_matrix <- rbind(recruit_matrix, conf_values)
    Sys.sleep(1) # wait a second to not throttle the servers at 247
  }
}

Here is a link to the complete script

But when I get to this point, I get a number of errors like this:

1: In cbind(., conf, year) :
number of rows of result is not a multiple of vector length (arg 2)

The error continues for hundreds of lines.

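If I were building a very simple table without any data scraping, I could troubleshoot this better, but once the scraping element is involved I am at a loss.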
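For reference, here is a scrape-free sketch of one way this warning can arise. It assumes the CSS selector matched no nodes on a page, so that html_text returns an empty character vector. That is only an assumption; nothing in the post confirms what the pages actually return. The names scraped_text, empty_values and warning_text are made up for illustration.

```r
# Assumption: stands in for html_text() output when the selector matches nothing.
scraped_text <- character(0)

# With no data, matrix() yields a matrix with 0 rows and 4 columns.
empty_values <- matrix(scraped_text, ncol = 4, byrow = TRUE)
print(dim(empty_values))

# cbind() cannot fit a length-1 vector into 0 rows, so it raises a warning.
# tryCatch() captures the warning text instead of printing it.
warning_text <- tryCatch(
  {
    cbind(empty_values, "ACC", 2005)
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)
print(warning_text)
```

If the captured message matches the one reported, checking length(html_text(...)) for a single conf_url would be a reasonable next diagnostic step.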

Does anyone have a solution?
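One possible guard, sketched under the assumption that the warning comes from pages whose scraped text is empty or does not fill whole rows of four values. The helper build_conf_block is hypothetical and not part of the original script, and the sample values are made up rather than taken from the site.

```r
# Hypothetical helper: build the per-conference block only when the scraped
# text is non-empty and fills complete rows of `n_col` columns.
build_conf_block <- function(scraped_text, conf, year, n_col = 4) {
  n <- length(scraped_text)
  if (n == 0 || n %% n_col != 0) {
    message(sprintf("Skipping %s %s: got %s values", conf, year, n))
    return(NULL)
  }
  cbind(matrix(scraped_text, ncol = n_col, byrow = TRUE), conf, year)
}

# Made-up values standing in for scraped text: one good page, one empty page.
blocks <- list(
  build_conf_block(c("1", "Team A", "20", "90.1"), "ACC", 2005),
  build_conf_block(character(0), "SEC", 2005)
)

# rbind() ignores the NULL entries, so only complete blocks are stacked.
recruit_matrix <- do.call(rbind, blocks)
print(recruit_matrix)
```

Collecting the blocks in a list and binding them once also avoids the blank first row that the original initialization with matrix("", nrow = 1, ncol = 6) leaves in the result.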

0 answers:

No answers