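Looping a function through a data frame, differentiating, then combining by column in the original data frame

Time: 2020-02-18 20:05:41

Tags: r function loops

I am trying to collect some data from Basketball Reference using the ballr package. I want to use the NBASeasonTeamByYear function to gather season results for teams across multiple seasons. That is, I want the data for every team from 2017 through 2020, and then I want to combine the data frames into two larger data frames, split by conference.

I first made a data frame with each team's code and conference: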

league_teams <- data.frame("team" = c("ATL", "BOS", "NJN", "CHA", "CHI", "CLE", "DAL", "DEN", 
                                  "DET", "GSW", "HOU", "IND", "LAC", "LAL", "MEM", "MIA",
                                  "MIL", "MIN", "NOH", "NYK", "OKC", "ORL", "PHI", "PHO",
                                  "POR", "SAC", "SAS", "TOR", "UTA", "WAS"), 
                       "conference" = c("East", "East", "East", "East", "East", "East", "West",
                                        "West", "East", "West", "West", "East", "West", "West",
                                        "West", "East", "East", "West", "West", "East", "West",
                                        "East", "East", "West", "West", "West", "West", "East",
                                        "West", "East"))
league_teams$team <- as.character(league_teams$team)
league_teams$conference <- as.factor(league_teams$conference)

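Now I am struggling to write a loop that first calls the function with each team's code and the years I want, and then combines the results by conference, regardless of year.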

I started with:

for (team in league_teams) {

  team_2017 <- NBASeasonTeamByYear(team = team, 2017)
  team_2017$season <- as.factor(2017)
  team_2017$team <- as.factor(team)

}

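The last two lines in the loop show that I want to add two columns, one for the season and one for the team code, not just for 2017 but for every season through 2020. I am having trouble writing the loop, though. I think I will use rbind to combine the results afterwards, but I am not sure how to do that, or how to split the result by conference using the original data frame I made.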

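1 answer:

Answer 0: (score: 1)

Consider generalizing your process with a user-defined function, then pass in the arguments using expand.grid (all combinations) and Map (element-wise loop):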

nba_df_build <- function(yr, team, conf) {    
  # base::transform() adds the columns; dplyr::mutate() would work as well
  transform(NBASeasonTeamByYear(team = team, season = yr),         
            season = as.factor(yr),
            team = as.factor(team),
            conference = as.factor(conf))  
}

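# expand.grid() crosses every vector it is given, so cross only year and team;
# merge() then looks up each team's own conference from league_teams
params_df <- merge(expand.grid(year = 2017:2020,
                               team = league_teams$team,
                               stringsAsFactors = FALSE),
                   league_teams, by = "team")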

df_list <- Map(nba_df_build, params_df$year, params_df$team, params_df$conference)

final_df <- do.call(rbind, df_list)
#final_df <- dplyr::bind_rows(df_list)
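
To try the build-and-combine steps without hitting the website, here is a minimal sketch that swaps in a stub for the scraper. Note that fake_season, build_one, and the three-team teams table are hypothetical names made up for this illustration, and the stub's columns do not reflect what NBASeasonTeamByYear really returns. The sketch also assumes each team should be paired only with its own conference, so it crosses season and team and then looks up the conference with merge.

```r
# Hypothetical stand-in for NBASeasonTeamByYear(), so the sketch runs offline;
# the real function returns different columns
fake_season <- function(team, season) {
  data.frame(game = 1:2)
}

# Small subset of the lookup table from the question
teams <- data.frame(team = c("ATL", "BOS", "DAL"),
                    conference = c("East", "East", "West"),
                    stringsAsFactors = FALSE)

# Fetch one team-season and tag it with its identifying columns
build_one <- function(yr, tm, conf) {
  df <- fake_season(team = tm, season = yr)
  df$season <- yr
  df$team <- tm
  df$conference <- conf
  df
}

# One row per team/season pair, with the conference looked up by team
params <- merge(expand.grid(year = 2017:2020, team = teams$team,
                            stringsAsFactors = FALSE),
                teams, by = "team")

# Map() walks the three columns in parallel, one call per row of params
pieces <- Map(build_one, params$year, params$team, params$conference)
combined <- do.call(rbind, pieces)

nrow(params)    # 12: 3 teams x 4 seasons
nrow(combined)  # 24: 2 stub rows per call
```

Checking nrow(params) before running the real scraper is a cheap way to confirm the number of requests is what you expect (teams times seasons).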

And for any splitting of the data frame:

# LIST OF TWO CONFERENCE DATA FRAMES
conference_dfs <- split(final_df, final_df$conference)

# LIST OF FOUR SEASON DATA FRAMES
season_dfs <- split(final_df, final_df$season)

# LIST OF THIRTY TEAM DATA FRAMES
team_dfs <- split(final_df, final_df$team)
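
Since split returns a named list, the two conference data frames asked for in the question can be pulled out by name. The following is a small sketch on a toy data frame; toy_df and its value column are placeholders made up for illustration, not real season data.

```r
# Toy stand-in for the combined data frame; values are placeholders
toy_df <- data.frame(team = c("ATL", "BOS", "DAL", "DEN"),
                     conference = factor(c("East", "East", "West", "West")),
                     value = 1:4,
                     stringsAsFactors = FALSE)

# split() returns a list with one data frame per factor level
by_conf <- split(toy_df, toy_df$conference)

names(by_conf)                # "East" "West"

# Extract each conference as its own data frame
east_df <- by_conf$East
west_df <- by_conf[["West"]]

nrow(east_df)                 # 2
```

Keeping the pieces in the list is often more convenient than creating separate variables, because lapply can then apply the same summary to every conference.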