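I'm trying to create a cumulative sum column that accumulates across Game_ID values but counts the value associated with each Game_ID only once. For example, Player A takes 20 shots in Game_ID == 1 and 13 shots in Game_ID == 2. For the cumulative sum, I want each Shot_Count value (per Game_ID) to be counted only once, even though it appears several times in the Shot_Count column. Consider the following dataset: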
Name Game_ID Shot_Count CumSum_Shots
Player A 1 20 20
Player B 1 15 15
Player A 1 20 20
Player A 2 13 33 ## (20 + 13)
Player A 2 13 33 ## (20 + 13)
Player B 2 35 50 ## (15 + 35)
Player A 3 30 63 ## (33 + 30)
Player B 3 20 70 ## (50 + 20)
Player A 3 30 63 ## (33 + 30)
Player A 4 12 75 ## (63 + 12)
Player A 4 12 75 ## (63 + 12)
Player B 4 10 80 ## (70 + 10)
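Keep in mind that there are other variables that make rows such as 1 and 3 non-duplicates. I just reduced the dataset to the relevant variables.
I tried using the cumsum function with the data.table library: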
library(data.table)
dt[ , CumSum_Shots := cumsum(Shot_Count), by = list(dt$Name, dt$Game_ID)]
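However, this sums the Shot_Count rows within each game (i.e., the third row of CumSum_Shots would be 40). The code behaves as written, but I'm not sure what data.table syntax exists to make it consider only the unique values of dt$Game_ID.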
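A minimal sketch of the behavior described above, assuming a hypothetical four-row subset of the data (Player A, games 1 and 2 only). Grouping by both Name and Game_ID restarts the running total in every game and adds each repeated row again:

```r
library(data.table)

# Hypothetical subset of the question's data: Player A, games 1 and 2
dt <- data.table(
  Name       = c("Player A", "Player A", "Player A", "Player A"),
  Game_ID    = c(1L, 1L, 2L, 2L),
  Shot_Count = c(20L, 20L, 13L, 13L)
)

# Each (Name, Game_ID) group gets its own running total,
# so the duplicated row for game 1 is added a second time
dt[, CumSum_Shots := cumsum(Shot_Count), by = .(Name, Game_ID)]

dt$CumSum_Shots
# 20 40 13 26
```

In the full dataset the second Player A row for game 1 is row 3, which is where the unwanted 40 shows up.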
Answer 0 (score: 4)
Take the unique rows, compute, then merge:
dt[unique(dt, by = c('Name', 'Game_ID', 'Shot_Count'))
[, Cum_Shots := cumsum(Shot_Count), by = Name]
, on = .(Name, Game_ID), Cum_Shots := Cum_Shots]
R is a dirty language.
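The one-liner above can also be read as three separate steps. The following is a sketch on a hypothetical five-row subset, not a replacement for the answer's code; the intermediate name `games` is introduced here for illustration, and the join uses the `i.` prefix to make explicit that Cum_Shots comes from the de-duplicated table:

```r
library(data.table)

# Hypothetical subset of the question's data
dt <- data.table(
  Name       = c("Player A", "Player B", "Player A", "Player A", "Player B"),
  Game_ID    = c(1L, 1L, 1L, 2L, 2L),
  Shot_Count = c(20L, 15L, 20L, 13L, 35L)
)

# Step 1: keep one row per player and game
games <- unique(dt, by = c("Name", "Game_ID"))

# Step 2: running total per player over the de-duplicated rows
# (assumes rows are already ordered by Game_ID within each player)
games[, Cum_Shots := cumsum(Shot_Count), by = Name]

# Step 3: update join back onto the original table;
# i.Cum_Shots is the column coming from `games`
dt[games, on = .(Name, Game_ID), Cum_Shots := i.Cum_Shots]

dt$Cum_Shots
# 20 15 20 33 50
```

Because step 3 assigns by reference, dt keeps its original row order and any other columns it has.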
Answer 1 (score: 1)
I assume you are already using data.table, in which case you can do this:
Code:
library(data.table)
merge(dt,
dt[, Shot_Count[1], .(Name, Game_ID)][, .(CumSum_Shots = cumsum(V1), Game_ID), Name],
sort = FALSE)
Output:
        Name Game_ID Shot_Count CumSum_Shots
 1: Player A       1         20           20
 2: Player B       1         15           15
 3: Player A       1         20           20
 4: Player A       2         13           33
 5: Player A       2         13           33
 6: Player B       2         35           50
 7: Player A       3         30           63
 8: Player B       3         20           70
 9: Player A       3         30           63
10: Player A       4         12           75
11: Player A       4         12           75
12: Player B       4         10           80
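Explanation:
dt[, Shot_Count[1], .(Name, Game_ID)]: takes the first ([1]) Shot_Count for each Name and Game_ID. This is what the OP wants (count each game only once).
[, .(CumSum_Shots = cumsum(V1), Game_ID), Name]: computes the cumulative sum by Name and keeps the Game_ID information.
merge(dt, ..., sort = FALSE): merges with the original data and keeps the original order.
Input (dt):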
structure(list(Name = c("Player A", "Player B", "Player A", "Player A",
"Player A", "Player B", "Player A", "Player B", "Player A", "Player A",
"Player A", "Player B"), Game_ID = c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 4L, 4L, 4L), Shot_Count = c(20L, 15L, 20L, 13L, 13L,
35L, 30L, 20L, 30L, 12L, 12L, 10L)), .Names = c("Name", "Game_ID",
"Shot_Count"), row.names = c(NA, -12L), class = c("data.table",
"data.frame"))
Edit:
For long chains of data.table syntax, I prefer magrittr pipes:
library(magrittr)
dt %>%
.[, Shot_Count[1], .(Name, Game_ID)] %>%
.[, .(CumSum_Shots = cumsum(V1), Game_ID), Name] %>%
merge(dt, ., sort = FALSE)
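Taking Shot_Count[1] relies on every row of a given player and game carrying the same Shot_Count. A small sanity check along these lines may be worth running first; this is a sketch on hypothetical data in which Player B is deliberately inconsistent, and the names `conflicts` and `n_values` are introduced here for illustration:

```r
library(data.table)

# Hypothetical data: Player B has two different Shot_Count values in game 1
dt <- data.table(
  Name       = c("Player A", "Player A", "Player B", "Player B"),
  Game_ID    = c(1L, 1L, 1L, 1L),
  Shot_Count = c(20L, 20L, 15L, 16L)
)

# Count distinct Shot_Count values per player and game,
# then keep only the groups that have more than one
conflicts <- dt[, .(n_values = uniqueN(Shot_Count)), by = .(Name, Game_ID)][n_values > 1]

conflicts
# one row: Player B, Game_ID 1, n_values 2
```

An empty `conflicts` table would mean the "first value per game" shortcut is safe for that data.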
Answer 2 (score: 1)
Without merging, you can cumsum the unique values (by Name, Game_ID and Shot_Count) and then use rep to get the correct length.
dt[, CumSum_Shots2 := rep(cumsum(Shot_Count[!duplicated(Game_ID)]), times = .SD[,.N,by = .(Game_ID, Shot_Count)]$N) ,
by = .(Name)]
dt
# Name Game_ID Shot_Count CumSum_Shots CumSum_Shots2
#1: PlayerA 1 20 20 20
#2: PlayerB 1 15 15 15
#3: PlayerA 1 20 20 20
#4: PlayerA 2 13 33 33
#5: PlayerA 2 13 33 33
#6: PlayerB 2 35 50 50
#7: PlayerA 3 30 63 63
#8: PlayerB 3 20 70 70
#9: PlayerA 3 30 63 63
#10: PlayerA 4 12 75 75
#11: PlayerA 4 12 75 75
#12: PlayerB 4 10 80 80
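One caveat with the rep() approach: the repeated values line up with the right rows only when each game's rows are adjacent within a player, as they are in the sample data. The sketch below uses hypothetical, deliberately shuffled data and sorts it first with setorder(); note that this reorders dt by reference, which may or may not be acceptable for your data:

```r
library(data.table)

# Hypothetical data where Player A's game 1 rows are not adjacent
dt <- data.table(
  Name       = c("Player A", "Player A", "Player A"),
  Game_ID    = c(1L, 2L, 1L),
  Shot_Count = c(20L, 13L, 20L)
)

# Sort by reference so each game's rows sit together within a player
setorder(dt, Name, Game_ID)

# Same expression as in the answer above, applied to the sorted table
dt[, CumSum_Shots2 := rep(cumsum(Shot_Count[!duplicated(Game_ID)]),
                          times = .SD[, .N, by = .(Game_ID, Shot_Count)]$N),
   by = Name]

dt$CumSum_Shots2
# 20 20 33
```

Without the sort, the second row (game 2) would receive 20, because rep() hands out values in order of first appearance rather than matching on Game_ID.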