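I am trying to parse data from tables on baseball-reference.com, and I want to do this for multiple teams and multiple years. The following code builds the link for each team's season.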
library(XML)
#Will use for loop to fill in the rest of the link
link_base <- "http://www.baseball-reference.com/teams/"
#List of teams
teams <- c("CHC", "STL")
#Year
season <- 2000:2002
#End of link
end_link <- "-schedule-scores.shtml"
links <- list()
for(i in seq_along(teams)){
  #One slot per season (not per team)
  links[[i]] <- character(length(season))
  for(j in seq_along(season)){
    links[[i]][j] <- paste0(link_base, teams[i], "/", season[j], end_link)
  }
}
This results in:
> links
[[1]]
[1] "http://www.baseball-reference.com/teams/CHC/2000-schedule-scores.shtml"
[2] "http://www.baseball-reference.com/teams/CHC/2001-schedule-scores.shtml"
[3] "http://www.baseball-reference.com/teams/CHC/2002-schedule-scores.shtml"
[[2]]
[1] "http://www.baseball-reference.com/teams/STL/2000-schedule-scores.shtml"
[2] "http://www.baseball-reference.com/teams/STL/2001-schedule-scores.shtml"
[3] "http://www.baseball-reference.com/teams/STL/2002-schedule-scores.shtml"
Now, for each element in the list, I want to call the readHTMLTable function so that I can parse the information. I tried this:
a <- list()
for(i in 1:length(teams)){
  a[[i]] <- NaN*seq(length(teams))
  for(j in 1:length(season)){
    a[[i]][j] <- readHTMLTable(links[[i]][j])
  }
}
readHTMLTable returns a list of length 6:
x <- readHTMLTable(links[[1]][1])
> length(x)
[1] 6
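I want the first element of list a to store the output of readHTMLTable for the "CHC" links, and the second element of list a to store the output of readHTMLTable for the "STL" links. List a would therefore have 2 elements, and each of those would contain 3 lists of 6 elements.
Answer 0 (score: 0)
I think this works: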
lst <- lapply(links, function(l) lapply(l, function(x) readHTMLTable(x)))
length(lst)
# [1] 2
lengths(lst)
# [1] 3 3
The first sublist should hold the CHC results and the second sublist the STL results.
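If it helps to look results up by team and year rather than by position, the same nested lapply idea can be sketched with named lists. This is only an illustration under some assumptions: fake_read is a hypothetical stand-in for readHTMLTable so that the example runs without network access, and the names attached with setNames are my own choice, not something from the original code. The loop at the end shows what the attempt in the question would likely need: list containers and `[[j]]`, since `[j]` on a numeric vector cannot hold a whole list per slot.

```r
# Hypothetical stand-in for readHTMLTable(): returns a list of 6 placeholder
# elements so the nesting can be checked offline. Swap in readHTMLTable for real use.
fake_read <- function(url) {
  setNames(as.list(rep(url, 6)), paste0("table", 1:6))
}

teams  <- c("CHC", "STL")
season <- 2000:2002

# One character vector of links per team; outer names are teams, inner names are years.
links <- lapply(setNames(teams, teams), function(tm) {
  setNames(
    paste0("http://www.baseball-reference.com/teams/", tm, "/", season,
           "-schedule-scores.shtml"),
    season
  )
})

# Nested lapply keeps the team -> season -> tables hierarchy and carries the names along.
lst <- lapply(links, function(l) lapply(l, fake_read))

names(lst)                 # team names
names(lst$CHC)             # season names, stored as character
length(lst$CHC[["2000"]])  # number of elements returned for one team season

# Loop equivalent: both levels are lists, and `[[<-` stores each result whole.
a <- vector("list", length(teams))
for (i in seq_along(teams)) {
  a[[i]] <- vector("list", length(season))
  for (j in seq_along(season)) {
    a[[i]][[j]] <- fake_read(links[[i]][[j]])
  }
}
```

With names in place, a single team season can be pulled out as lst$STL[["2001"]] instead of lst[[2]][[2]].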