I'm working with a large data frame of college football players and their game-related statistics. It looks like this:
Name School Year Receptions Receiving_Yards
Player1 College1 2004 10 200
Player2 College2 2002 15 150
Player3 College3 2007 11 110
Player1 College1 2004 17 150
Player2 College2 2002 13 130
Player1 College1 2005 14 170
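To make the answers below reproducible, the sample table above can be built as an R data frame. This is a minimal sketch that assumes the columns are plain character and numeric vectors; the name `df` matches what the answers use.

```r
# Sample data from the question.
# stringsAsFactors = FALSE keeps Name and School as character columns.
df <- data.frame(
  Name            = c("Player1", "Player2", "Player3", "Player1", "Player2", "Player1"),
  School          = c("College1", "College2", "College3", "College1", "College2", "College1"),
  Year            = c(2004, 2002, 2007, 2004, 2002, 2005),
  Receptions      = c(10, 15, 11, 17, 13, 14),
  Receiving_Yards = c(200, 150, 110, 150, 130, 170),
  stringsAsFactors = FALSE
)
str(df)
```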
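I'd like to be able to combine rows based on multiple criteria:
I want to create a data frame that combines everything by player, school, and year, to get the cumulative stats for that season. Like this: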
Name School Year Receptions Receiving_Yards
Player1 College1 2004 27 350
Player2 College2 2002 28 280
Player3 College3 2007 11 110
Player1 College1 2005 14 170
I also want to create a data frame that combines everything by player and school only (i.e., giving me career stats), but with the range of years:
Name School From To Receptions Receiving_Yards
Player1 College1 2004 2005 41 520
Player2 College2 2002 2002 28 280
Player3 College3 2007 2007 11 110
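I'm not too worried about accidentally merging two different players, since it's unlikely that many players with the same name played at the same school.
I've seen some posts about combining rows based on a single criterion, but how would I do this with multiple criteria?
Thanks!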
Answer 0 (score: 0)
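Of course, you could also solve this with a tidyverse approach; here I give base R solutions.
First result:
# sum all remaining columns within each Name/School/Year group
aggregate(. ~ Name + School + Year, df, sum)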
# Name School Year Receptions Receiving_Yards
# 1 Player2 College2 2002 28 280
# 2 Player1 College1 2004 27 350
# 3 Player1 College1 2005 14 170
# 4 Player3 College3 2007 11 110
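Second result:
# career totals per Name/School
a <- aggregate(cbind(Receptions, Receiving_Yards) ~ Name + School, df, sum)
# year range per Name/School (range() returns the min and the max)
b <- aggregate(Year ~ Name + School, df, range)
# join the two results on Name and School
merge(a, b)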
# Name School Receptions Receiving_Yards Year.1 Year.2
# 1 Player1 College1 41 520 2004 2005
# 2 Player2 College2 28 280 2002 2002
# 3 Player3 College3 11 110 2007 2007
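In the base R result above, `range()` makes `aggregate()` store the year range as a two-column matrix named `Year`, which prints as `Year.1` and `Year.2`. A minimal sketch, assuming the same `df` as in the question, that splits that matrix into `From` and `To` columns so the layout matches the desired output; the names `totals`, `yrs` and `career` are only illustrative.

```r
df <- data.frame(
  Name            = c("Player1", "Player2", "Player3", "Player1", "Player2", "Player1"),
  School          = c("College1", "College2", "College3", "College1", "College2", "College1"),
  Year            = c(2004, 2002, 2007, 2004, 2002, 2005),
  Receptions      = c(10, 15, 11, 17, 13, 14),
  Receiving_Yards = c(200, 150, 110, 150, 130, 170),
  stringsAsFactors = FALSE
)

# Career totals per player/school
totals <- aggregate(cbind(Receptions, Receiving_Yards) ~ Name + School, df, sum)

# range() returns c(min, max), so Year comes back as a 2-column matrix;
# pull its columns out into ordinary From/To columns
yrs <- aggregate(Year ~ Name + School, df, range)
yrs <- data.frame(yrs[c("Name", "School")], From = yrs$Year[, 1], To = yrs$Year[, 2])

# merge() joins on the shared Name and School columns
career <- merge(yrs, totals)
career
```

Because `From` and `To` come from the minimum and maximum year, the range is correct regardless of the row order in `df`.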
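Using dplyr:
library(dplyr)
# (1) season totals: one row per Name/School/Year
df %>% group_by(Name, School, Year) %>% summarise_all(sum)
# (2) career totals with the year range; min/max (rather than first/last)
# give the correct range even if the rows are not sorted by Year
df %>% group_by(Name, School) %>%
  summarise(From = min(Year), To = max(Year),
            Receptions = sum(Receptions),
            Receiving_Yards = sum(Receiving_Yards))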
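Answer 1 (score: 0)
Adding a data.table alternative:
library(data.table)
df1 <- copy(df)  # work on a copy, because setDT() converts by reference
setDT(df1)
# add the per-player/school year range, then sum the stats within each group;
# min/max give the correct range even if the rows are not sorted by Year
df1[, `:=`(From = min(Year), To = max(Year)), by = .(Name, School)
    ][, lapply(.SD, sum), by = .(Name, School, From, To), .SDcols = c("Receptions", "Receiving_Yards")]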
Output:
Name School From To Receptions Receiving_Yards
1: Player2 College2 2002 2002 28 280
2: Player1 College1 2004 2005 41 520
3: Player3 College3 2007 2007 11 110
The other part:
df1<-copy(df)
setDT(df1)
df1[,lapply(.SD,sum),by=.(Name,School,Year)]
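Or, if you don't want to recreate the data.table, drop the From/To columns that the previous part added to df1 by reference, then aggregate (this gives the first output):
#df1 <- copy(df)  not needed, see next line
#setDT(df1)  not needed, since you're reusing the object from the previous part
df1[, `:=`(From = NULL, To = NULL)]
df1[, lapply(.SD, sum), by = .(Name, School, Year)]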
Output:
Name School Year Receptions Receiving_Yards
1: Player1 College1 2004 27 350
2: Player2 College2 2002 28 280
3: Player3 College3 2007 11 110
4: Player1 College1 2005 14 170