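I am very new to R and the rvest package, and I am trying to extract data from multiple tables across multiple pages.
One example is the box score for each game, such as this one:
https://www.pro-football-reference.com/boxscores/201309050den.htm
I tried the following to get the data from a single table: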
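library(rvest)  # also provides the %>% pipe

# Download and parse the box score page
webpage <- read_html("https://www.pro-football-reference.com/boxscores/201309050den.htm")

# List the <table> nodes on the page to see what is available
tbls <- html_nodes(webpage, "table")
head(tbls)

# Keep only the third table and convert it to a data frame
tbls_ls <- webpage %>%
  html_nodes("table") %>%
  .[3] %>%
  html_table(fill = TRUE)
str(tbls_ls)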
This returns:
List of 1
$ :'data.frame': 22 obs. of 22 variables:
..$ : chr [1:22] "Player" "Joe Flacco" "Ray Rice" "Bernard Pierce" ...
..$ : chr [1:22] "Tm" "BAL" "BAL" "BAL" ...
..$ Passing : chr [1:22] "Cmp" "34" "0" "0" ...
..$ Passing : chr [1:22] "Att" "62" "0" "0" ...
..$ Passing : chr [1:22] "Yds" "362" "0" "0" ...
..$ Passing : chr [1:22] "TD" "2" "0" "0" ...
..$ Passing : chr [1:22] "Int" "2" "0" "0" ...
..$ Passing : chr [1:22] "Sk" "4" "0" "0" ...
..$ Passing : chr [1:22] "Yds" "27" "0" "0" ...
..$ Passing : chr [1:22] "Lng" "34" "0" "0" ...
..$ Passing : chr [1:22] "Rate" "69.4" "" "" ...
..$ Rushing : chr [1:22] "Att" "0" "12" "9" ...
..$ Rushing : chr [1:22] "Yds" "0" "36" "22" ...
..$ Rushing : chr [1:22] "TD" "0" "1" "0" ...
..$ Rushing : chr [1:22] "Lng" "0" "12" "14" ...
..$ Receiving: chr [1:22] "Tgt" "0" "11" "1" ...
..$ Receiving: chr [1:22] "Rec" "0" "8" "0" ...
..$ Receiving: chr [1:22] "Yds" "0" "35" "0" ...
..$ Receiving: chr [1:22] "TD" "0" "0" "0" ...
..$ Receiving: chr [1:22] "Lng" "0" "10" "0" ...
..$ Fumbles : chr [1:22] "Fmb" "1" "0" "0" ...
..$ Fumbles : chr [1:22] "FL" "0" "0" "0" ...
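In the output above, the real column labels ("Player", "Tm", "Cmp", ...) sit in the first data row, while the data frame's own names are the group headers ("Passing", "Rushing", ...). A minimal sketch of one way to tidy this, assuming every table has that same two-level layout: `clean_header` is a hypothetical helper, not part of rvest, and dropping rows whose first cell is blank or repeats the first label is an assumption about how such tables repeat their headers mid-table.

```r
# Hypothetical helper: combine the group header with the label found in the
# first data row, then drop that row and any later rows that repeat it.
clean_header <- function(df) {
  groups <- names(df)
  groups[is.na(groups)] <- ""
  # First cell of every column holds the real label ("Player", "Cmp", ...)
  labels <- vapply(df, function(col) as.character(col[1]), character(1),
                   USE.NAMES = FALSE)
  new_names <- ifelse(groups == "", labels, paste(groups, labels, sep = "_"))
  names(df) <- make.unique(new_names)
  # Assumption: header rows have a blank or "Player"-style first cell
  keep <- !(df[[1]] %in% c(labels[1], ""))
  out <- df[keep, , drop = FALSE]
  rownames(out) <- NULL
  out
}
```

With names such as `Passing_Yds` and `Rushing_Yds`, the duplicated "Yds", "TD" and "Lng" columns become distinguishable.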
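But this is only one table from one game.
I am trying to go through the box score pages for every game in every week of every year.
All of the pages start with this part of the URL: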
https://www.pro-football-reference.com/boxscores/
But then I need to loop over all of the dates in a year, for example:
201309050
201309080
and over the teams:
den
buf
(this would be all 32 teams in the NFL)
The two examples above lead to the following two URLs:
https://www.pro-football-reference.com/boxscores/201309050den.htm
https://www.pro-football-reference.com/boxscores/201309080buf.htm
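The URL pattern can be sketched as a small helper that pairs every date with every team. `build_boxscore_urls` is a hypothetical name, and the "0" placed between the date and the team code is inferred only from the two example URLs, so treat it as an assumption.

```r
base_url <- "https://www.pro-football-reference.com/boxscores/"

# Hypothetical helper: one candidate URL per date/team pair.
# `dates` are "YYYYMMDD" strings; `teams` are lowercase team codes.
build_boxscore_urls <- function(dates, teams) {
  combos <- expand.grid(team = teams, date = dates, stringsAsFactors = FALSE)
  # Assumption: the id is the date, then "0", then the team code
  paste0(base_url, combos$date, "0", combos$team, ".htm")
}

urls <- build_boxscore_urls(c("20130905", "20130908"), c("den", "buf"))
```

Most of the generated combinations will not correspond to a real game, so whatever fetches them has to tolerate missing pages.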
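If I have a vector of dates and a vector of teams, is there a way to loop over each date, check every date/team combination, and return the data from the tables on each page?
Or could I supply a start date and an end date, and then somehow step through every date in that range with each team name?
The start date would be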
20130901
and the end date would be
20140301
(for the 2013 season). Ideally I would also cover every full season from 2010 to 2019.
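A date range like this can be expanded with base R's `seq()` on `Date` objects. This is a rough sketch: `season_dates` is a hypothetical helper, and reusing the same September 1 to March 1 window for every season from 2010 to 2019 is an assumption, not something the site defines.

```r
# Hypothetical helper: every calendar day between two "YYYYMMDD" strings,
# inclusive, returned in the same "YYYYMMDD" format.
season_dates <- function(start, end) {
  days <- seq(as.Date(start, format = "%Y%m%d"),
              as.Date(end, format = "%Y%m%d"),
              by = "day")
  format(days, "%Y%m%d")
}

dates_2013 <- season_dates("20130901", "20140301")

# Assumption: each season runs from Sept 1 to March 1 of the following year
all_dates <- unlist(lapply(2010:2019, function(yr) {
  season_dates(paste0(yr, "0901"), paste0(yr + 1, "0301"))
}))
```

Trying every calendar day is the brute-force option; most days have no games, so the list of candidate pages will be far larger than the list of real ones.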
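Ideally, I would like to loop over every date in the year and every team, and whenever a page returns records, append them all to a single table like this: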
Year Week Player Team Cmp Att Yds TD Int Sk Yds Lng Rate Att Yds TD Lng Tgt Rec Yds TD Lng Fmb FL
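One possible shape for such a loop is sketched below. It tries each candidate URL, treats any failure (such as a page that does not exist) as "no game", and stacks what is left. `fetch_boxscore` and `collect_boxscores` are hypothetical helpers; taking the third table mirrors the attempt above, and the 3-second pause is a guess, so check the site's terms and rate limits first. The sketch also assumes each returned table has the same, unique column names, which the raw tables do not, so the headers would need cleaning before `rbind` can work.

```r
# Hypothetical helper: return the third table on a page, or NULL when the
# page cannot be read or has fewer than three tables.
fetch_boxscore <- function(url) {
  tryCatch({
    page <- rvest::read_html(url)
    tables <- rvest::html_table(rvest::html_nodes(page, "table"), fill = TRUE)
    if (length(tables) < 3) NULL else tables[[3]]
  }, error = function(e) NULL)
}

# Try every URL, pause between requests, and stack the non-empty results.
collect_boxscores <- function(urls, fetch = fetch_boxscore, pause = 3) {
  results <- list()
  for (u in urls) {
    tbl <- fetch(u)
    if (!is.null(tbl)) {
      tbl$url <- u  # remember which page each row came from
      results[[length(results) + 1]] <- tbl
    }
    Sys.sleep(pause)
  }
  if (length(results) == 0) return(NULL)
  do.call(rbind, results)
}
```

The year can be recovered from the date embedded in the `url` column. The week number is not part of the URL, so it would have to come from somewhere else.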
Preferably it would return only the records for each quarterback, although I am not sure how to achieve that.
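The table shown earlier has no position column, so one rough stand-in for "quarterbacks only" is to keep the players who attempted at least one pass. This is a heuristic rather than a true position filter: `keep_passers` is a hypothetical helper, and `Passing_Att` is an assumed column name for pass attempts that should be replaced with whatever the cleaned table actually uses.

```r
# Hypothetical helper: keep rows with at least `min_att` pass attempts.
# `att_col` is an assumed column name, not one the site provides.
keep_passers <- function(df, att_col = "Passing_Att", min_att = 1) {
  # Columns arrive as character, so convert before comparing
  att <- suppressWarnings(as.numeric(df[[att_col]]))
  df[!is.na(att) & att >= min_att, , drop = FALSE]
}

# Small example using three rows from the output in the question
players <- data.frame(Player = c("Joe Flacco", "Ray Rice", "Bernard Pierce"),
                      Passing_Att = c("62", "0", "0"),
                      stringsAsFactors = FALSE)
passers <- keep_passers(players)
```

A non-quarterback who throws a single trick-play pass would slip through with `min_att = 1`; raising the threshold trades that off against missing a backup who only threw a few times.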