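I need to loop over data that involves two variables: season and school. If I hold the school fixed (below), I can get it to loop through the seasons I specify: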
library(XML)
# parameters
first_season <- 2014
last_season <- 2015
# seasons
num_seasons <- as.numeric(last_season - first_season + 1)
seasons <- seq(first_season, last_season, by=1)
# defense
defense <- data.frame()
for (i in 1:num_seasons) {
  url <- paste("http://www.sports-reference.com/cfb/schools/wisconsin/", seasons[i], ".html", sep = "")
  df <- readHTMLTable(url, which = 4, header = FALSE, stringsAsFactors = FALSE)
  df$season <- seasons[i]
  defense <- rbind(defense, df)
  rm(df)
  print(seasons[i])
}
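My problem is that I don't know how to (a) add an additional parameter to loop over, and (b) handle that parameter if it is non-numeric.
My list of schools is in the table/column colleges$school: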
> head(colleges$school)
[1] "Air Force" "Akron"
[3] "Alabama" "Alabama-Birmingham"
[5] "Alameda Coast Guard" "Alcorn State"
The URL will always be the lower-case value of colleges$school with spaces replaced by "-", but I can handle that.
Thanks in advance!
Answer 0 (score: 0)
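I'm not sure I understand (b). Do you mean passing the parameter (e.g. schools[j]) or storing the data (e.g. seasons[i])?
What I did was add an outer loop that iterates over the colleges. I store the results in a new data frame called school_defense. I don't have your list of schools, so I couldn't test it.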
library(XML)
# parameters
first_season <- 2014
last_season <- 2015
# seasons
num_seasons <- as.numeric(last_season - first_season + 1)
seasons <- seq(first_season, last_season, by = 1)
# schools: lower case with spaces replaced by hyphens, to match the URL format
schools <- unique(gsub(" ", "-", tolower(colleges$school)))
# defense
school_defense <- data.frame()
for (j in seq_along(schools)) {
  # collect all seasons for the current school
  defense <- data.frame()
  for (i in 1:num_seasons) {
    url <- paste("http://www.sports-reference.com/cfb/schools/", schools[j], "/", seasons[i], ".html", sep = "")
    df <- readHTMLTable(url, which = 4, header = FALSE, stringsAsFactors = FALSE)
    df$season <- seasons[i]
    defense <- rbind(defense, df)
    rm(df)
    print(seasons[i])
  }
  # tag this school's rows, then append them to the combined result
  defense <- data.frame(school = rep(schools[j], nrow(defense)), defense)
  school_defense <- rbind(school_defense, defense)
}
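On (b): a character vector can be indexed or iterated over just like a numeric one, so nothing special is needed for the school names. As a rough sketch of the URL-building step on its own (no pages are downloaded), something like the following might help. Note that school_names is a made-up stand-in for colleges$school, to_slug is a hypothetical helper name, and the lower-case-with-hyphens rule is taken from the question rather than verified against the site.

```r
# Hypothetical stand-in for colleges$school (the real table is not shown)
school_names <- c("Air Force", "Akron", "Alabama-Birmingham")
seasons <- 2014:2015

# Assumed URL rule from the question: lower case, spaces replaced by hyphens
to_slug <- function(x) gsub(" ", "-", tolower(x), fixed = TRUE)

# One row per school/season combination (the first column varies fastest)
grid <- expand.grid(school = to_slug(school_names), season = seasons,
                    stringsAsFactors = FALSE)

# Build every URL in one vectorised call instead of nested loops
grid$url <- paste0("http://www.sports-reference.com/cfb/schools/",
                   grid$school, "/", grid$season, ".html")

print(grid$url)
```

Each row of grid could then be passed to readHTMLTable in a single loop over seq_len(nrow(grid)). Wrapping that call in tryCatch is probably worthwhile, since some school/season pages may not exist or may not have a fourth table.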