Question

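I am trying to parse data from the tables on baseball-reference.com, and I want to do this for multiple teams and multiple years. The following code builds the link for each team-season.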

library(XML)

#Will use for loop to fill in the rest of the link
link_base <- "http://www.baseball-reference.com/teams/"
#List of teams
teams <- c("CHC", "STL")
#Year
season <- 2000:2002
#End of link
end_link <- "-schedule-scores.shtml"

links <- list()
for(i in seq_along(teams)){
  #Preallocate one slot per season (not per team)
  links[[i]] <- character(length(season))
  for(j in seq_along(season)){
    links[[i]][j] <- paste0(link_base, teams[i], "/", season[j], end_link)
  }
}

This produces:

> links
[[1]]
[1] "http://www.baseball-reference.com/teams/CHC/2000-schedule-scores.shtml"
[2] "http://www.baseball-reference.com/teams/CHC/2001-schedule-scores.shtml"
[3] "http://www.baseball-reference.com/teams/CHC/2002-schedule-scores.shtml"

[[2]]
[1] "http://www.baseball-reference.com/teams/STL/2000-schedule-scores.shtml"
[2] "http://www.baseball-reference.com/teams/STL/2001-schedule-scores.shtml"
[3] "http://www.baseball-reference.com/teams/STL/2002-schedule-scores.shtml"
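
As a side note, the same list can be built without the inner loop, because paste0() is vectorized over season. This is only a minimal sketch that reuses the variable names from the code above:

```r
# Inputs as in the question
link_base <- "http://www.baseball-reference.com/teams/"
teams     <- c("CHC", "STL")
season    <- 2000:2002
end_link  <- "-schedule-scores.shtml"

# paste0() recycles the scalar pieces across season,
# so each team yields one character vector of 3 URLs
links <- lapply(teams, function(tm) {
  paste0(link_base, tm, "/", season, end_link)
})
```

The result should have the same shape as the loop version: a list of 2 character vectors with 3 links each.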

Now, for each element of the list, I want to call readHTMLTable so that I can parse the information. I tried this:

a <- list()
for(i in 1:length(teams)){
  a[[i]] <- NaN*seq(length(teams))
  for(j in 1:length(season)){
    a[[i]][j] <- readHTMLTable(links[[i]][j])
  }
}

readHTMLTable returns a list of length 6:

> x <- readHTMLTable(links[[1]][1])
> length(x)
[1] 6

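I want the first element of list a to store the readHTMLTable output for the "CHC" links, and the second element of a to store the readHTMLTable output for the "STL" links. So list a would have 2 elements, and each of those elements would contain 3 lists of 6 elements each.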

Answer 1

I think this works:

lst <- lapply(links, function(l) lapply(l, function(x) readHTMLTable(x)))

length(lst)
# [1] 2
lengths(lst)
# [1] 3 3

The first sublist should hold the CHC tables and the second sublist the STL tables.
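
As far as I can tell, the loop in the question fails because a[[i]] starts out as a numeric vector and single-bracket assignment (a[[i]][j] <- ...) has room for only one item, so R warns about the replacement length and keeps just the first of the 6 tables. If you prefer to keep the loop, a minimal sketch of the fix is to preallocate lists and assign with [[ ]]. Note that fake_read below is a hypothetical stand-in for readHTMLTable so the example runs without network access; it is not part of the XML package.

```r
# Hypothetical stub: mimics readHTMLTable() by returning a list of 6 data frames
fake_read <- function(url) {
  lapply(1:6, function(k) data.frame(url = url, table = k))
}

teams  <- c("CHC", "STL")
season <- 2000:2002
links  <- lapply(teams, function(tm) {
  paste0("http://www.baseball-reference.com/teams/", tm, "/",
         season, "-schedule-scores.shtml")
})

# Preallocate a list (not a numeric vector) at each level
a <- vector("list", length(teams))
for (i in seq_along(teams)) {
  a[[i]] <- vector("list", length(season))
  for (j in seq_along(season)) {
    # [[ ]] stores the whole 6-element list in slot j
    a[[i]][[j]] <- fake_read(links[[i]][j])
  }
  # Label each season so it can be looked up by year
  names(a[[i]]) <- as.character(season)
}
# Label each team so a$CHC and a$STL work
names(a) <- teams
```

With the stub swapped for readHTMLTable, a$CHC[["2000"]] should be the list of tables for the 2000 Cubs page, which matches the 2 x 3 x 6 shape described in the question.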

R: How to store a list inside a list?