Find max(), but remove ties.

Time: 2017-01-24 15:28:55

Tags: r dataframe dplyr

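I am trying to find the difference between each year's league leader and the league average for 6 different baseball variables. My goal is to find the largest difference between a player's total and the league average. For example, Babe Ruth hit 60 home runs in 1927 and the league average was 6.30 per player, so the difference is 53.7.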

I created this leag_avg data frame:

leag_avg <- batting100 %>%
        group_by(yearID) %>%
        summarise(lgba_avg = round(sum(H, na.rm = TRUE) / sum(AB, na.rm = TRUE), digits = 3),
                  lghr_avg = round(mean(HR, na.rm = TRUE), digits = 2),
                  lgrbi_avg = round(mean(RBI, na.rm = TRUE), digits = 2),
                  lgslg_avg = round(mean(slg, na.rm = TRUE), digits = 3),
                  lgobp_avg = round(mean(obp, na.rm = TRUE), digits = 3),
                  lgruns_avg = round(mean(R, na.rm = TRUE), digits = 2),
                  soratio = round(mean(so_ratio, na.rm = TRUE), digits = 2))

This gave me every year in the data frame (1871-2015) along with the league average for each variable: 133 observations of 8 variables.

Then I found the highest home run total for each year:

bestHR <- batting100 %>%
      group_by(yearID) %>%
      summarise(highest_HR = max(HR))

Then I merged to add playerID to the data frame:

bestHR2 <- merge(bestHR, batting100[, c("yearID", "HR", "playerID")], by.x = c("yearID", "highest_HR"), by.y = c("yearID", "HR"))

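bestHR2 returns 153 observations of 3 variables. Because of ties, that is 20 more observations than my league averages have. To get down to 133 observations so I can do the calculation, I need to eliminate the ties. Does anyone know how to do this? For example, in 1886 there were 2 players tied with 11 home runs. How can I remove one of them?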
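
One possible way to get a single row per year is sketched below. It is only an illustration: batting_toy and leag_avg_toy are made-up stand-ins for batting100 and leag_avg, and the names bestHR_one and hr_gap are hypothetical. The sketch relies on which.max() returning only the first position of the maximum, so when two players tie, the one that appears first in the data is kept.

```r
library(dplyr)

# Hypothetical toy data standing in for batting100 (values are made up).
batting_toy <- data.frame(
  yearID   = c(1886, 1886, 1886, 1887, 1887),
  playerID = c("a01", "b02", "c03", "d04", "e05"),
  HR       = c(11, 11, 4, 9, 17),
  stringsAsFactors = FALSE
)

# Hypothetical league averages, one row per year (stand-in for leag_avg).
leag_avg_toy <- data.frame(
  yearID   = c(1886, 1887),
  lghr_avg = c(2.5, 3.0)
)

# Keep exactly one leader per year: which.max() gives the first row
# holding the maximum HR within each group, so tied rows are dropped.
bestHR_one <- batting_toy %>%
  group_by(yearID) %>%
  slice(which.max(HR)) %>%
  ungroup() %>%
  select(yearID, playerID, highest_HR = HR)

# Join to the league averages and compute leader minus league average.
hr_gap <- bestHR_one %>%
  inner_join(leag_avg_toy, by = "yearID") %>%
  mutate(hr_diff = highest_HR - lghr_avg)
```

Which tied player survives here depends only on row order, so sorting with arrange() first would make the choice explicit. In dplyr 1.0.0 and later, slice_max(HR, n = 1, with_ties = FALSE) is an alternative to slice(which.max(HR)). Note also that a join on yearID works even if the ties are kept; each tied player would simply get the same league average and the same difference.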

0 answers:

No answers