R data.table: summarise values of several rows

Date: 2016-04-25 09:17:11

Tags: r data.table

I have a data.table in R that looks like this:

   code gruppe proz_grouped
1:    1      2    14.751689
2:    2      2    22.063523
3:    3      2    35.441111
4:    4      2    27.743676
5:    1      3     7.575869
6:    2      3    23.420090
7:    3      3    38.513576
8:    4      3    30.490465

Is there an easy, elegant way to get the sum of proz_grouped for codes 3 and 4 (column code) within each group gruppe? The result should look something like this:

   code gruppe proz_grouped
1:    1      2    14.751689
2:    2      2    22.063523
3:    NA     2    63.18471
5:    1      3     7.575869
6:    2      3    23.420090
7:    NA     3    69.0035

Since code cannot be summarized, I would expect an NA for the code column.

Thanks

2 answers:

Answer 0 (score: 2)

We can use recode from the car package to set codes 3 and 4 to NA, then sum proz_grouped grouped by code and gruppe:

library(data.table)
library(car)
df1[, code := recode(code, "c(3,4)=NA")
        ][, list(proz_grouped = sum(proz_grouped)), .(code, gruppe)]
#  code gruppe proz_grouped
#1:    1      2    14.751689
#2:    2      2    22.063523
#3:   NA      2    63.184787
#4:    1      3     7.575869
#5:    2      3    23.420090
#6:   NA      3    69.004041

Or use %in% to set codes 3 and 4 to NA, then group by 'code' and 'gruppe' and take the sum of 'proz_grouped':

 df1[code %in% 3:4, code := NA][,
       .(proz_grouped = sum(proz_grouped)) ,.(code, gruppe)]
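Both snippets above use :=, which updates df1 by reference, so the original codes 3 and 4 are gone from df1 afterwards. If the original table is still needed, one option is to run the same steps on a copy. The following is a minimal sketch, assuming df1 is built from the sample data in the question with integer code and gruppe columns (the real column types are not shown in the question):

```r
library(data.table)

# Sample data copied from the question; integer types are an assumption
df1 <- data.table(
  code         = rep(1:4, times = 2),
  gruppe       = rep(2:3, each = 4),
  proz_grouped = c(14.751689, 22.063523, 35.441111, 27.743676,
                    7.575869, 23.420090, 38.513576, 30.490465)
)

# copy() makes a separate table, so the := below does not touch df1
res <- copy(df1)[code %in% 3:4, code := NA][
  , .(proz_grouped = sum(proz_grouped)), by = .(code, gruppe)]

print(res)
print(df1$code)  # df1 still holds the original codes
```

Without copy(), the result is the same, but df1 itself ends up with NA in place of codes 3 and 4.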

Answer 1 (score: 2)

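The grouping key can also be computed directly inside by, which leaves the original table (here called dt) unmodified:

dt[, .(proz_grouped = sum(proz_grouped))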
   , by = .(code = replace(code, code > 2, NA), gruppe)]
#   code gruppe proz_grouped
#1:    1      2    14.751689
#2:    2      2    22.063523
#3:   NA      2    63.184787
#4:    1      3     7.575869
#5:    2      3    23.420090
#6:   NA      3    69.004041
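The condition code > 2 works for this data because the codes to merge happen to be the largest ones. For an arbitrary set of codes, the same by expression can be written with %in%. This is a sketch only: merge_codes is a hypothetical variable name introduced here, and dt is rebuilt from the question's sample data with assumed integer columns.

```r
library(data.table)

# Sample data copied from the question; integer types are an assumption
dt <- data.table(
  code         = rep(1:4, times = 2),
  gruppe       = rep(2:3, each = 4),
  proz_grouped = c(14.751689, 22.063523, 35.441111, 27.743676,
                    7.575869, 23.420090, 38.513576, 30.490465)
)

# Hypothetical variable: the codes to collapse into a single NA row per group
merge_codes <- c(3L, 4L)

# The grouping key is computed on the fly, so dt is not modified
res <- dt[, .(proz_grouped = sum(proz_grouped)),
          by = .(code = replace(code, code %in% merge_codes, NA), gruppe)]

print(res)
```

Changing merge_codes to, say, c(1L, 4L) would merge those codes instead without touching the rest of the call.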