I have a data.table in R which looks like this one:
code gruppe proz_grouped
1: 1 2 14.751689
2: 2 2 22.063523
3: 3 2 35.441111
4: 4 2 27.743676
5: 1 3 7.575869
6: 2 3 23.420090
7: 3 3 38.513576
8: 4 3 30.490465
Is there an easy, elegant way to get the sum of proz_grouped for the codes (code) 3 and 4 by group gruppe? The result should look something like this:
code gruppe proz_grouped
1: 1 2 14.751689
2: 2 2 22.063523
3: NA 2 63.18471
5: 1 3 7.575869
6: 2 3 23.420090
7: NA 3 69.0035
Since code cannot be summarized, I would expect an NA for the code column.
Thanks
Answer 0 (score: 2)
We can use recode (from the car package) to change the values and then do the grouped sum:
library(data.table)
library(car)
df1[, code := recode(code, "c(3,4)=NA")
][, list(proz_grouped = sum(proz_grouped)), .(code, gruppe)]
# code gruppe proz_grouped
#1: 1 2 14.751689
#2: 2 2 22.063523
#3: NA 2 63.184787
#4: 1 3 7.575869
#5: 2 3 23.420090
#6: NA 3 69.004041
Or use %in% to change 3 and 4 into NA, group by 'code' and 'gruppe', and get the sum of 'proz_grouped':
df1[code %in% 3:4, code := NA
  ][, .(proz_grouped = sum(proz_grouped)), .(code, gruppe)]
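Note that := changes df1 by reference, so after either snippet the original codes 3 and 4 are gone from df1. If the input needs to stay intact, one option is to work on a copy. A minimal sketch, assuming the sample data from the question (rebuilt here by hand) and an illustrative result name res:

```r
library(data.table)

# Sample data reconstructed from the question
df1 <- data.table(
  code = rep(1:4, times = 2),
  gruppe = rep(2:3, each = 4),
  proz_grouped = c(14.751689, 22.063523, 35.441111, 27.743676,
                   7.575869, 23.420090, 38.513576, 30.490465)
)

# copy() gives := a separate object to overwrite, so df1 keeps codes 3 and 4
res <- copy(df1)[code %in% 3:4, code := NA
  ][, .(proz_grouped = sum(proz_grouped)), by = .(code, gruppe)]

print(res)
print(df1)  # still the original eight rows
```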
Answer 1 (score: 2)
# Recode code inside by, so dt itself is left unmodified
dt[, .(proz_grouped = sum(proz_grouped)),
   by = .(code = replace(code, code > 2, NA), gruppe)]
# code gruppe proz_grouped
#1: 1 2 14.751689
#2: 2 2 22.063523
#3: NA 2 63.184787
#4: 1 3 7.575869
#5: 2 3 23.420090
#6: NA 3 69.004041
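The condition code > 2 works here because the codes to merge happen to be the largest ones. For an arbitrary set of codes, the same idea can be written with %in%. A rough sketch, where sum_merged_codes and merge_codes are hypothetical names introduced only for illustration and dt is rebuilt from the question's data:

```r
library(data.table)

# Sample data reconstructed from the question
dt <- data.table(
  code = rep(1:4, times = 2),
  gruppe = rep(2:3, each = 4),
  proz_grouped = c(14.751689, 22.063523, 35.441111, 27.743676,
                   7.575869, 23.420090, 38.513576, 30.490465)
)

# Hypothetical helper: codes listed in merge_codes are collapsed into a
# single NA row per gruppe; all other codes are summed on their own
sum_merged_codes <- function(x, merge_codes) {
  x[, .(proz_grouped = sum(proz_grouped)),
    by = .(code = replace(code, code %in% merge_codes, NA), gruppe)]
}

out <- sum_merged_codes(dt, c(3L, 4L))
print(out)
```

Because the recoding happens inside by, dt is not modified and the helper can be called repeatedly with different code sets.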