How to use dcast() to stack/collapse strings by group

Asked: 2018-04-05 11:38:33

Tags: r data.table

library(data.table)

DT <- data.table(id = rep(1:3, 2),
                 class = letters[1:6],
                 des = rep(LETTERS[1:2], 3))

It looks like this:

   id class des
1:  1     a   A
2:  2     b   B
3:  3     c   A
4:  1     d   B
5:  2     e   A
6:  3     f   B

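The problem is that I need to stack the distinct values of the string variables class and des into a single row per id. In other words, how can I reshape the data.table into the following form?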

   id    class      des
1:  1     a, d     A, B
2:  2     b, e     B, A
3:  3     c, f     A, B

I tried something like this, but the result is not what I expected.

DT %>% 
  dcast(id ~ ..., fun = function(x) paste(x, ", "), value.var = c("class", "des"))

   id   class    des
1:  1    d ,    B , 
2:  2    e ,    A , 
3:  3    f ,    B , 

3 answers:

Answer 0 (score: 1)

If you are open to a dplyr solution, you can use the following.

library(dplyr)

# Collapse class and des into one comma-separated string per id
DT %>%
  group_by(id) %>%
  summarise_at(vars(class, des), paste, collapse = ", ")

Answer 1 (score: 1)

You don't really need dcast() here. It is simpler to group the data.table by id, use lapply() to loop over the columns, and collapse each one with paste() and collapse = ", ":

DT[, lapply(.SD, paste, collapse = ", "), by = id]

The result is:

   id class  des
1:  1  a, d A, B
2:  2  b, e B, A
3:  3  c, f A, B

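In the benchmark below, this approach is also considerably faster than dcast(), roughly eight times faster at the median on this small table: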

library(microbenchmark)

microbenchmark(dcast = dcast(DT, id ~ ..., 
                            fun = function(x) paste(x, collapse = ", "), 
                            value.var = c("class", "des")),
               group = DT[, lapply(.SD, paste, collapse = ", "), by = id],
               times = 100)

Unit: microseconds
  expr      min        lq      mean    median       uq      max neval
 dcast 2460.732 2639.4095 3118.5706 2815.3385 3221.251 6942.144   100
 group  305.014  329.2315  374.9927  347.6135  377.440  670.746   100
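
If the table has further columns that should not be collapsed, the grouped approach can be restricted to the string columns with .SDcols, and unique() can be added if repeated values within a group should appear only once. The following is a minimal sketch under those assumptions; the table DT2 and its extra column n are hypothetical and only serve as illustration.

```r
library(data.table)

# Same toy data as in the question, plus a hypothetical numeric column `n`
DT2 <- data.table(id    = rep(1:3, 2),
                  class = letters[1:6],
                  des   = rep(LETTERS[1:2], 3),
                  n     = 1:6)

# Collapse only the string columns; unique() drops repeats within a group
res <- DT2[, lapply(.SD, function(x) paste(unique(x), collapse = ", ")),
           by = id, .SDcols = c("class", "des")]
print(res)
```

Because .SDcols names only class and des, the column n is left out of the result rather than being pasted together as text.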

Answer 2 (score: 1)

Collapsing is the key part: use paste(x, collapse = ", ") as the aggregation function for the strings:

library(data.table)
library(magrittr)

DT %>% 
   dcast(id ~ ..., 
         fun = function(x) paste(x, collapse = ", "), 
         value.var = c("class", "des"))
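
The difference between the two paste() calls can be checked in base R without any reshaping. This small sketch uses a made-up vector x; it shows that passing ", " as a second positional argument pastes it onto each element, which would be consistent with the "d , " style values in the question's output, whereas collapse joins the whole vector into one string.

```r
x <- c("a", "d")

# The second argument is treated as another vector to paste element-wise
# (joined with the default sep = " "), so the result still has length 2
with_sep <- paste(x, ", ")

# collapse joins all elements into a single string of length 1
with_collapse <- paste(x, collapse = ", ")

print(with_sep)
print(with_collapse)
```

An aggregation function used in dcast() is expected to return a single value per group, which is what the collapse form provides.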