library(data.table)

DT <- data.table(id = rep(1:3, 2),
                 class = letters[1:6],
                 des = rep(LETTERS[1:2], 3))
It looks like this:
   id class des
1:  1     a   A
2:  2     b   B
3:  3     c   A
4:  1     d   B
5:  2     e   A
6:  3     f   B
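The problem is that I need to stack the distinct values of the (string-typed) variables class and des into a single row per id. In other words, how can I reshape the data.table into the following form?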
   id class  des
1:  1  a, d A, B
2:  2  b, e B, A
3:  3  c, f A, B
I tried something like this, but the result is not what I expected.
DT %>%
  dcast(id ~ ..., fun = function(x) paste(x, ", "), value.var = c("class", "des"))
id class des
1: 1 d , B ,
2: 2 e , A ,
3: 3 f , B ,
Answer 0 (score: 1)
If you are open to a dplyr solution, the following works.
library(dplyr)

DT %>%
  group_by(id) %>%
  summarise_at(vars(class, des), paste, collapse = ", ")
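Since dplyr 1.0.0, summarise_at() is superseded in favour of across(). As a minimal sketch of the same idea in the newer style (the variable name res is only illustrative, and the result is a tibble rather than a data.table):

```r
library(data.table)
library(dplyr)

DT <- data.table(id = rep(1:3, 2),
                 class = letters[1:6],
                 des = rep(LETTERS[1:2], 3))

# across() applies the same summary to each selected column within each id group
res <- DT %>%
  group_by(id) %>%
  summarise(across(c(class, des), ~ paste(.x, collapse = ", ")))

print(res)
```

Both forms group by id and collapse each column to one string per group, so which one to use is mostly a matter of which dplyr version is installed.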
Answer 1 (score: 1)
You don't really need dcast() here. It is simpler to group by id in data.table, loop over the columns with lapply(), and aggregate each one using paste() with collapse = ", ":
DT[, lapply(.SD, paste, collapse = ", "), by = id]
The result is:
   id class  des
1:  1  a, d A, B
2:  2  b, e B, A
3:  3  c, f A, B
On this small example, this solution is also considerably faster than using dcast(), roughly eight times at the median:
library(microbenchmark)
microbenchmark(dcast = dcast(DT, id ~ ...,
                             fun = function(x) paste(x, collapse = ", "),
                             value.var = c("class", "des")),
               group = DT[, lapply(.SD, paste, collapse = ", "), by = id],
               times = 100)
Unit: microseconds
  expr      min        lq      mean    median       uq      max neval
 dcast 2460.732 2639.4095 3118.5706 2815.3385 3221.251 6942.144   100
 group  305.014  329.2315  374.9927  347.6135  377.440  670.746   100
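The grouped approach also extends to cases where only some columns should be collapsed, or where repeated values within an id should appear only once. The following is a sketch under those assumptions, not part of the original answer; the names cols and res are illustrative, and toString() is used as base R shorthand for paste(x, collapse = ", "):

```r
library(data.table)

DT <- data.table(id = rep(1:3, 2),
                 class = letters[1:6],
                 des = rep(LETTERS[1:2], 3))

# Illustrative variation: .SDcols restricts .SD to the named columns,
# unique() drops repeated values within each id group,
# and toString() joins what remains with ", ".
cols <- c("class", "des")
res <- DT[, lapply(.SD, function(x) toString(unique(x))),
          by = id, .SDcols = cols]

print(res)
```

With the example data no value repeats within an id, so the output should match the result shown above; unique() only makes a difference once duplicates are present.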
Answer 2 (score: 1)
The collapse argument is the important part. Use paste(x, collapse = ", ") to aggregate the strings:
library(data.table)
library(magrittr)
DT %>%
  dcast(id ~ ...,
        fun = function(x) paste(x, collapse = ", "),
        value.var = c("class", "des"))
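To see why collapse matters, it may help to compare the two paste() calls on a plain character vector. This is a small base R sketch; the vector x stands in for the values of one group and is purely illustrative:

```r
x <- c("a", "d")

# Without collapse, ", " is treated as a second vector to paste onto each
# element (joined by the default sep = " "), so the result has length 2.
no_collapse <- paste(x, ", ")

# With collapse, the elements of x are joined into a single string.
collapsed <- paste(x, collapse = ", ")

print(no_collapse)
print(collapsed)
```

An aggregation function needs to return one value per group, which only the collapse form does.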