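Summing data frame rows based on multiple column values

Time: 2019-01-13 14:00:20

Tags: r

I'm working with a large data frame of college football players and their game-related statistics. It looks like this: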

Name      School     Year     Receptions     Receiving_Yards
Player1   College1   2004       10                200 
Player2   College2   2002       15                150
Player3   College3   2007       11                110
Player1   College1   2004       17                150
Player2   College2   2002       13                130
Player1   College1   2005       14                170

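I'd like to be able to combine rows based on multiple conditions:

  1. I want to create a data frame that combines everything by player, school, and year, to get the cumulative statistics for each season. Like this: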

    Name      School     Year     Receptions     Receiving_Yards
    Player1   College1   2004       27                350 
    Player2   College2   2002       28                280
    Player3   College3   2007       11                110
    Player1   College1   2005       14                170
    
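  2. I want to create a data frame that combines everything by player and school only (i.e., gives me career statistics), but also provides the range of years: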

    Name      School     From    to      Receptions     Receiving_Yards
    Player1   College1   2004   2005        41                520 
    Player2   College2   2002   2002        28                280
    Player3   College3   2007   2007        11                110
    

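I'm not completely set on getting the two year columns, since it's unlikely that many players with the same name played at the same school.

I've seen some posts about combining rows based on a single condition, but how would I do it with multiple conditions?

Thanks!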

2 Answers:

Answer 0: (score: 0)

You can certainly solve this with a tidyverse approach. Here I provide a base R approach.
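
The snippets in this thread assume the question's table is already stored in a data frame named `df`. The question does not show how it was created, so the following is only a minimal sketch that rebuilds the six sample rows by hand; the column types (character and numeric) are an assumption.

```r
# Hypothetical reconstruction of the sample data from the question.
# Column types are assumed; the real data set may differ.
df <- data.frame(
  Name            = c("Player1", "Player2", "Player3", "Player1", "Player2", "Player1"),
  School          = c("College1", "College2", "College3", "College1", "College2", "College1"),
  Year            = c(2004, 2002, 2007, 2004, 2002, 2005),
  Receptions      = c(10, 15, 11, 17, 13, 14),
  Receiving_Yards = c(200, 150, 110, 150, 130, 170),
  stringsAsFactors = FALSE
)

# Inspect the structure before aggregating
str(df)
```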

First result

aggregate(. ~ Name + School + Year, df, sum)

#      Name   School Year Receptions Receiving_Yards
# 1 Player2 College2 2002         28             280
# 2 Player1 College1 2004         27             350
# 3 Player1 College1 2005         14             170
# 4 Player3 College3 2007         11             110
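
The dot on the left-hand side of the formula stands for every column that is not named on the right-hand side. Assuming `df` holds only the five columns shown in the question, the call above should therefore agree with one that names the two statistics explicitly via `cbind()`. A minimal sketch of that comparison (the sample data is rebuilt by hand, and `by_dot` / `by_name` are names made up for illustration):

```r
# Hypothetical reconstruction of the question's sample data
df <- data.frame(
  Name            = c("Player1", "Player2", "Player3", "Player1", "Player2", "Player1"),
  School          = c("College1", "College2", "College3", "College1", "College2", "College1"),
  Year            = c(2004, 2002, 2007, 2004, 2002, 2005),
  Receptions      = c(10, 15, 11, 17, 13, 14),
  Receiving_Yards = c(200, 150, 110, 150, 130, 170),
  stringsAsFactors = FALSE
)

# "." expands to all columns not used as grouping variables
by_dot <- aggregate(. ~ Name + School + Year, df, sum)

# The same aggregation with the summed columns spelled out
by_name <- aggregate(cbind(Receptions, Receiving_Yards) ~ Name + School + Year, df, sum)

print(by_dot)
print(by_name)
```

The explicit form is the safer choice once the data frame gains columns that should not be summed, such as a text column.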

Second result

a <- aggregate(cbind(Receptions, Receiving_Yards) ~ Name + School, df, sum)
b <- aggregate(Year ~ Name + School, df, range)
merge(a, b)

#      Name   School Receptions Receiving_Yards Year.1 Year.2
# 1 Player1 College1         41             520   2004   2005
# 2 Player2 College2         28             280   2002   2002
# 3 Player3 College3         11             110   2007   2007
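
The `range` call above stores both years in a single matrix column, which is why the result prints as `Year.1` and `Year.2`. If separate `From` and `To` columns are wanted, as in the question, one possible sketch (not part of the original answer) is to aggregate the minimum and maximum year separately and merge the pieces. The sample data is rebuilt by hand, and the names `from`, `to`, `totals`, and `career` are made up for illustration.

```r
# Hypothetical reconstruction of the question's sample data
df <- data.frame(
  Name            = c("Player1", "Player2", "Player3", "Player1", "Player2", "Player1"),
  School          = c("College1", "College2", "College3", "College1", "College2", "College1"),
  Year            = c(2004, 2002, 2007, 2004, 2002, 2005),
  Receptions      = c(10, 15, 11, 17, 13, 14),
  Receiving_Yards = c(200, 150, 110, 150, 130, 170),
  stringsAsFactors = FALSE
)

# Career totals per player and school
totals <- aggregate(cbind(Receptions, Receiving_Yards) ~ Name + School, df, sum)

# First and last season per player and school, as ordinary columns
from <- aggregate(Year ~ Name + School, df, min)
to   <- aggregate(Year ~ Name + School, df, max)
names(from)[names(from) == "Year"] <- "From"
names(to)[names(to) == "Year"]     <- "To"

# merge() joins on the shared columns Name and School
career <- Reduce(merge, list(from, to, totals))
print(career)
```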

A solution using dplyr:

library(dplyr)

# (1)
df %>% group_by(Name, School, Year) %>% summarise_all(sum)

# (2)
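# min()/max() return the year range regardless of row order,
# whereas first()/last() would rely on the rows being sorted by Year
df %>% group_by(Name, School) %>%
  summarise(From = min(Year), To = max(Year),
            Receptions = sum(Receptions),
            Receiving_Yards = sum(Receiving_Yards))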

Answer 1: (score: 0)

Adding a data.table alternative:

library(data.table)
df1<-copy(df)
setDT(df1)
# Add From/To by reference; min()/max() do not depend on row order
df1[, `:=`(From = min(Year), To = max(Year)), by = .(Name, School)
][, lapply(.SD, sum), by = .(Name, School, From, To), .SDcols = c("Receptions", "Receiving_Yards")]

Output:

      Name   School From   To Receptions Receiving_Yards
1: Player2 College2 2002 2002         28             280
2: Player1 College1 2004 2005         41             520
3: Player3 College3 2007 2007         11             110

The other part:

df1<-copy(df)
setDT(df1)
df1[,lapply(.SD,sum),by=.(Name,School,Year)]

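Or, if you don't want to recreate the data.table, remove the columns that were added in the previous part (the one that produced the first output):

# df1 <- copy(df)  # not needed, see the next line
# setDT(df1)       # not needed: df1 is the same data.table object used above
df1[, `:=`(From = NULL, To = NULL)]  # drop the columns added by reference earlier
df1[, lapply(.SD, sum), by = .(Name, School, Year)]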

Output:

      Name   School Year Receptions Receiving_Yards
1: Player1 College1 2004         27             350
2: Player2 College2 2002         28             280
3: Player3 College3 2007         11             110
4: Player1 College1 2005         14             170