Question

I have some code that scrapes data from this link (http://stats.ncaa.org/team/stats?org_id=575&sport_year_ctl_id=12280) and runs some calculations on it.

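What I want to do is loop through every team, scraping the data and running the same operations for each one. I have a data frame that contains a link for each team, like the one above.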

Pseudocode: for (link in teamlist) {scrape, manipulate, put into table}

However, I can't figure out how to run the loop over the links.

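I tried URL = teamlist$link[i], but I get an error when I then use readHTMLTable(). I can paste each team's individual URL into the script without any trouble; the error only appears when I try to pull the URL from the table.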
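One likely cause, assuming `teamlist` was built with `data.frame()` or `read.csv()` in R older than 4.0.0 (where strings became factors by default), is that the `link` column is a factor, so `teamlist$link[i]` is not a URL string. `readLines()` only opens a file or URL when it receives a character string. A minimal check and fix, using a made-up two-row `teamlist`:

```r
# Hypothetical team list; stringsAsFactors = TRUE mimics the default of
# data.frame() and read.csv() before R 4.0.0.
teamlist <- data.frame(
  link = c("http://stats.ncaa.org/team/stats?org_id=575&sport_year_ctl_id=12280",
           "http://stats.ncaa.org/team/stats?org_id=580&sport_year_ctl_id=12280"),
  stringsAsFactors = TRUE
)

i <- 1
class(teamlist$link[i])   # "factor", not a character string

# Convert explicitly before handing the value to readLines()
URL <- as.character(teamlist$link[i])
class(URL)                # "character"
```

Reading the file with `stringsAsFactors = FALSE` avoids the conversion altogether.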

Current code:

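library(XML)
library(gsubfn)  # not used below: gsub() is base R

URL = 'http://stats.ncaa.org/team/stats?org_id=575&sport_year_ctl_id=12280'
tx <- readLines(URL)
# Fold the <tfoot> rows into the table body: drop every </tbody> and the
# opening <tfoot>, then close the former footer with </tbody>
tx2 <- gsub("</tbody>", "", tx)
tx2 <- gsub("<tfoot>", "", tx2)
tx2 <- gsub("</tfoot>", "</tbody>", tx2)
# Parse the second table on the page into a data frame
Player_Stats = readHTMLTable(tx2, asText = TRUE, header = TRUE, which = 2, stringsAsFactors = FALSE)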
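Following the pseudocode, a minimal sketch of the loop wraps the current code in a function and applies it to every link. `scrape_team()` and `scrape_all()` are hypothetical helper names; `scrape_team()` assumes, like the code above, that the stats table is the second table on each page. `tryCatch()` keeps one bad link from stopping the whole loop, and a failed page comes back as `NULL`.

```r
# Hypothetical helper: the question's steps for a single team page.
# Assumes the stats table is the second <table> on the page.
scrape_team <- function(url) {
  tx  <- readLines(url, warn = FALSE)
  tx2 <- gsub("</tbody>", "", tx)
  tx2 <- gsub("<tfoot>", "", tx2)
  tx2 <- gsub("</tfoot>", "</tbody>", tx2)
  XML::readHTMLTable(tx2, asText = TRUE, header = TRUE, which = 2,
                     stringsAsFactors = FALSE)
}

# Hypothetical helper: apply a scraper to every link, skipping failures.
scrape_all <- function(links, scraper = scrape_team) {
  links <- as.character(links)  # guards against factor columns
  results <- lapply(links, function(u) {
    tryCatch(scraper(u), error = function(e) {
      message("Skipping ", u, ": ", conditionMessage(e))
      NULL                      # failed pages become NULL entries
    })
  })
  names(results) <- links       # one list element per link
  results
}

# Usage (needs network access and the XML package):
# Player_Stats <- scrape_all(teamlist$link)
```

Passing the scraper as an argument also makes it easy to swap in a different parsing function without touching the loop.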

Thanks.

Answer 1

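I agree with @ialm that you should look at the rvest package, which makes looping through links quite easy. I'll put together some example code on a similar topic for you to look at.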

First I generate the list of links that I will loop through:

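rm(list=ls())
library(rvest)
mainweb <- "http://www.basketball-reference.com"

# read_html() replaces the deprecated html(); html_attr("href")
# returns a plain character vector such as "/teams/ATL/"
urls <- read_html("http://www.basketball-reference.com/teams") %>%
  html_nodes("#active a") %>%
  html_attr("href")

Now that the list of links is complete, I loop through each link and pull a table from each page:

teamdata <- list()
for (j in seq_along(urls)) {
  bball <- read_html(paste0(mainweb, urls[j]))
  # Assumes the table's id is the team abbreviation, e.g. "/teams/ATL/" -> "#ATL"
  team_id <- gsub("/teams/([A-Z]+)/$", "\\1", urls[j])
  teamdata[[j]] <- bball %>%
    html_nodes(paste0("#", team_id)) %>%
    html_table()
}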
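To see what the `gsub()` pattern in that loop does, here it is applied to two sample hrefs (illustrative values in the `/teams/XXX/` form the loop assumes). It extracts the abbreviation that is used as the table's CSS id, and `paste0()` joins each href to the site root without doubling the slash.

```r
# Sample href values in the form the loop expects (illustrative only)
hrefs <- c("/teams/ATL/", "/teams/BOS/")
mainweb <- "http://www.basketball-reference.com"  # no trailing slash: hrefs start with "/"

# Capture group \\1 is the run of capital letters between "/teams/" and the final "/"
team_ids <- gsub("/teams/([A-Z]+)/$", "\\1", hrefs)
team_ids        # "ATL" "BOS"

full_urls <- paste0(mainweb, hrefs)
full_urls[1]    # "http://www.basketball-reference.com/teams/ATL/"

selectors <- paste0("#", team_ids)
selectors       # "#ATL" "#BOS"
```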

Answer 2

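See the code below, which builds each team's URL and loops through the two team pages identified by the vector team_codes. The tables are returned in a list, where each list element holds one team's table. However, the tables look like they need more cleaning.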

library(XML)

Player_Stats <- list()
team_codes <- c(575, 580)
for (code in team_codes) {

  # Build each team's URL from its org_id code
  URL <- paste0('http://stats.ncaa.org/team/stats?org_id=', code, '&sport_year_ctl_id=12280')
  tx <- readLines(URL)
  # Fold the <tfoot> rows into the table body before parsing
  tx2 <- gsub("</tbody>", "", tx)
  tx2 <- gsub("<tfoot>", "", tx2)
  tx2 <- gsub("</tfoot>", "</tbody>", tx2)
  # Store each table under its team code, e.g. Player_Stats[["575"]]
  Player_Stats[[as.character(code)]] <- readHTMLTable(tx2, asText = TRUE, header = TRUE, which = 2, stringsAsFactors = FALSE)

}
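For the "put into table" step, one way to turn such a list into a single data frame is to tag each table with its team code and stack the results with `do.call(rbind, ...)`. This sketch uses made-up two-column tables as stand-ins for the scraped ones and assumes every team's table has the same columns in the same order; scraped cells arrive as text, so numeric columns are converted explicitly.

```r
# Toy stand-ins for two scraped tables (hypothetical columns and values)
Player_Stats <- list(
  "575" = data.frame(Player = c("A", "B"), PTS = c("10", "7"), stringsAsFactors = FALSE),
  "580" = data.frame(Player = "C", PTS = "12", stringsAsFactors = FALSE)
)

# Prepend a team_code column to each table; the scalar code is recycled
tagged <- Map(function(df, code) {
  data.frame(team_code = code, df, stringsAsFactors = FALSE, check.names = FALSE)
}, Player_Stats, names(Player_Stats))

# Stack all teams into one data frame and reset the row names
all_stats <- do.call(rbind, tagged)
rownames(all_stats) <- NULL

# Convert text columns that hold numbers
all_stats$PTS <- as.numeric(all_stats$PTS)
```

`check.names = FALSE` keeps column headers such as "G-Min" from being rewritten into syntactic names.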

R: Looping through a list of links
