Question

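I am trying to create a cumulative-sum column that accumulates Shot_Count per player across games, but counts the value associated with each Game_ID only once. For example, Player A takes 20 shots in Game_ID == 1 and 13 shots in Game_ID == 2. For the cumulative sum, I want each Shot_Count value (per Game_ID) to be counted only once, even though it appears multiple times in the Shot_Count column. Consider the following dataset: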

Name         Game_ID       Shot_Count        CumSum_Shots
Player A         1             20                20 
Player B         1             15                15 
Player A         1             20                20
Player A         2             13                33 ## (20 + 13)
Player A         2             13                33 ## (20 + 13)
Player B         2             35                50 ## (15 + 35)
Player A         3             30                63 ## (33 + 30)
Player B         3             20                70 ## (50 + 20)
Player A         3             30                63 ## (33 + 30)
Player A         4             12                75 ## (63 + 12)
Player A         4             12                75 ## (63 + 12)
Player B         4             10                80 ## (70 + 10)

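Keep in mind that there are other variables that make rows such as 1 and 3 non-duplicates. I have simply reduced the dataset to the relevant variables.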

I tried using the cumsum function with the data.table library:

library(data.table)
dt[ , CumSum_Shots := cumsum(Shot_Count), by = list(dt$Name, dt$Game_ID)]

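However, this sums Shot_Count over every row within each game (i.e., CumSum_Shots in the third row would be 40). The code does what it says, but I am not sure what data.table syntax would make it consider only the unique values of dt$Game_ID.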
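To make the problem concrete, here is a minimal sketch that rebuilds the example table with data.table() and runs the grouped cumsum on a copy. The name attempt is only for illustration and does not appear in the original post.

```r
library(data.table)

# Rebuild the example data from the question
dt <- data.table(
  Name       = c("Player A", "Player B", "Player A", "Player A", "Player A", "Player B",
                 "Player A", "Player B", "Player A", "Player A", "Player A", "Player B"),
  Game_ID    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
  Shot_Count = c(20L, 15L, 20L, 13L, 13L, 35L, 30L, 20L, 30L, 12L, 12L, 10L)
)

# Work on a copy so dt itself is left untouched
attempt <- copy(dt)

# Grouping by Name and Game_ID restarts the running total in every game
# and adds each repeated row again
attempt[, CumSum_Shots := cumsum(Shot_Count), by = .(Name, Game_ID)]

print(attempt)
```

Row 3 comes out as 40 (20 + 20) rather than 20, and row 4 restarts at 13 rather than continuing to 33, which is why the grouping has to be by Name alone after the repeated rows are dealt with.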

Answer 1

Take the unique rows, compute, then merge:

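# Deduplicate, take the running total per player, then update-join it back onto dt.
# The i. prefix makes sure the joined-in column is used, even if dt already has Cum_Shots.
dt[unique(dt, by = c('Name', 'Game_ID', 'Shot_Count'))
       [, Cum_Shots := cumsum(Shot_Count), by = Name]
   , on = .(Name, Game_ID), Cum_Shots := i.Cum_Shots]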

R is a messy language.

Answer 2

I assume you are already using data.table, so you can do this:

Code:

library(data.table)
merge(dt, 
      dt[, Shot_Count[1], .(Name, Game_ID)][, .(CumSum_Shots = cumsum(V1), Game_ID), Name], 
      sort = FALSE)

Output:

        Name Game_ID Shot_Count CumSum_Shots
 1: Player A       1         20           20
 2: Player B       1         15           15
 3: Player A       1         20           20
 4: Player A       2         13           33
 5: Player A       2         13           33
 6: Player B       2         35           50
 7: Player A       3         30           63
 8: Player B       3         20           70
 9: Player A       3         30           63
10: Player A       4         12           75
11: Player A       4         12           75
12: Player B       4         10           80

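Explanation:

dt[, Shot_Count[1], .(Name, Game_ID)]: take the first Shot_Count value ([1]) for each Name and Game_ID. This is what the OP wants (each value counted only once).
[, .(CumSum_Shots = cumsum(V1), Game_ID), Name]: compute the cumulative sum by Name and keep the Game_ID column.
merge(dt, ..., sort = FALSE): merge with the original data and keep the original row order.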
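The three steps can also be inspected one at a time. The following is a sketch that rebuilds dt with data.table() so it runs on its own; the intermediate names first_shots, running and result are illustrative and not part of the original answer.

```r
library(data.table)

# Rebuild the example data from the question
dt <- data.table(
  Name       = c("Player A", "Player B", "Player A", "Player A", "Player A", "Player B",
                 "Player A", "Player B", "Player A", "Player A", "Player A", "Player B"),
  Game_ID    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
  Shot_Count = c(20L, 15L, 20L, 13L, 13L, 35L, 30L, 20L, 30L, 12L, 12L, 10L)
)

# Step 1: one row per player and game; the unnamed j expression becomes column V1
first_shots <- dt[, Shot_Count[1], by = .(Name, Game_ID)]

# Step 2: running total per player, keeping Game_ID so the result can be joined back
running <- first_shots[, .(CumSum_Shots = cumsum(V1), Game_ID), by = Name]

# Step 3: join on the shared columns (Name, Game_ID) without re-sorting the rows
result <- merge(dt, running, sort = FALSE)

print(first_shots)
print(running)
print(result)
```

Printing first_shots and running before the merge makes it easy to confirm that each game contributes exactly one value per player.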

Input (dt):

structure(list(Name = c("Player A", "Player B", "Player A", "Player A", 
"Player A", "Player B", "Player A", "Player B", "Player A", "Player A", 
"Player A", "Player B"), Game_ID = c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 4L, 4L, 4L), Shot_Count = c(20L, 15L, 20L, 13L, 13L, 
35L, 30L, 20L, 30L, 12L, 12L, 10L)), .Names = c("Name", "Game_ID", 
"Shot_Count"), row.names = c(NA, -12L), class = c("data.table", 
"data.frame"))

Edit:

When chaining long data.table expressions, I prefer magrittr pipes:

library(magrittr)
dt %>%
    .[, Shot_Count[1], .(Name, Game_ID)] %>%
    .[, .(CumSum_Shots = cumsum(V1), Game_ID), Name] %>%
    merge(dt, ., sort = FALSE)

Answer 3

Without a merge, you can cumsum the unique values (per Name, Game and Shots) and then use rep to get the correct length.

dt[, CumSum_Shots2 := rep(cumsum(Shot_Count[!duplicated(Game_ID)]), times = .SD[,.N,by = .(Game_ID, Shot_Count)]$N) , 
   by = .(Name)]

dt
 #      Name Game_ID Shot_Count CumSum_Shots CumSum_Shots2
 #1: PlayerA       1         20           20            20
 #2: PlayerB       1         15           15            15
 #3: PlayerA       1         20           20            20
 #4: PlayerA       2         13           33            33
 #5: PlayerA       2         13           33            33
 #6: PlayerB       2         35           50            50
 #7: PlayerA       3         30           63            63
 #8: PlayerB       3         20           70            70
 #9: PlayerA       3         30           63            63
#10: PlayerA       4         12           75            75
#11: PlayerA       4         12           75            75
#12: PlayerB       4         10           80            80
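Note that the rep approach lines values up by position, so it appears to rely on each player's rows for a game being adjacent. A related variant, offered here as a sketch and not taken from any of the answers, zeroes out the repeated rows with duplicated() before taking the running total. It makes the same ordering assumption, and the column name CumSum_Shots3 is hypothetical.

```r
library(data.table)

# Rebuild the example data from the question
dt <- data.table(
  Name       = c("Player A", "Player B", "Player A", "Player A", "Player A", "Player B",
                 "Player A", "Player B", "Player A", "Player A", "Player A", "Player B"),
  Game_ID    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
  Shot_Count = c(20L, 15L, 20L, 13L, 13L, 35L, 30L, 20L, 30L, 12L, 12L, 10L)
)

# Within each player, keep Shot_Count on the first row of every game,
# count repeated rows as 0, then take the running total
dt[, CumSum_Shots3 := cumsum(Shot_Count * !duplicated(Game_ID)), by = Name]

print(dt)
```

Because repeated rows contribute 0, they simply carry the running total forward, so no rep or merge is needed as long as the rows of a game stay together within each player.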

CumSum based on group only
