Question

I am trying to replicate the base function aggregate() using data.table syntax in this particular scenario:

library(data.table)

# make it reproducible
set.seed(16)

# create data.table
DT <- data.table(source = sample(letters, 100, replace = TRUE), target = sample(LETTERS, 100, replace = TRUE))
#     source target
#  1:      j      J
#  2:      d      K
#  3:      w      L
#  4:      g      J
#  ...

# aggregate using base function
aggregate(list(target = DT$target), by = list(source = DT$source), FUN = function(x) paste(x, sep = ", "))
#   source              target
#1       a          L, W, S, W
#2       b V, H, R, J, G, W, N
#3       c          Y, C, I, K
#4       d          K, A, P, V
# ...

I tried a few things with data.table syntax, but I could not get them to work:

DT[, .(target = paste(target, sep = ", ")), by = source]
#     source target
#  1:      r      P
#  2:      r      I
#  3:      r      Y
#  4:      r      G
#  ...

DT[, target := paste(target, sep = ", "), by = source]
#     source target
#  1:      r      P
#  2:      g      C
#  3:      l      U
#  4:      f      J
#  ...

What is the correct way to do this?

Bonus: remove the duplicated LETTERS in the output (i.e. row 1 should be L, W, S rather than L, W, S, W).

Thanks!

Answer 1

If we need a single string containing all the elements of 'target' for each 'source', use the collapse argument of paste, i.e. paste(..., collapse = ", "). This can be written more compactly with toString:

DT[, .(target = toString(target)), by = source]
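To make the difference between sep and collapse concrete, here is a minimal sketch on a hand-written vector (x is a hypothetical example, not taken from DT):

```r
x <- c("L", "W", "S", "W")

# sep only separates the arguments of one paste() call; with a single
# vector argument there is nothing to separate, so x comes back unchanged
paste(x, sep = ", ")
# [1] "L" "W" "S" "W"

# collapse joins the elements of the result into one string
paste(x, collapse = ", ")
# [1] "L, W, S, W"

# toString() is shorthand for the collapse form
identical(toString(x), paste(x, collapse = ", "))
# [1] TRUE
```

This is why both attempts in the question return one row per original element: paste(target, sep = ", ") returns a vector as long as the group.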

We could also keep a list column instead of pasting into a single string:

DT[, .(target = list(target)), by = source]

which is similar to the aggregate output in the OP's post (though the intention there, given the use of paste, seems to be different).
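As a rough illustration of what the list-column form returns, here is a sketch on a small made-up table (toy and res are hypothetical names, not part of the original post):

```r
library(data.table)

# hypothetical toy data, small enough to check by hand
toy <- data.table(source = c("a", "a", "b"),
                  target = c("L", "W", "V"))

# one row per source; each cell of target holds a character vector
res <- toy[, .(target = list(target)), by = source]

is.list(res$target)   # TRUE
res$target[[1]]       # "L" "W"
lengths(res$target)   # 2 1
```

The list form keeps the individual elements available for later processing, whereas toString() gives a plain character column that is easier to print or export.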

Update

If we only need the unique elements, use unique:

DT[, .(target = toString(unique(target))), by = source]

DT[, .(target = list(unique(target))), by = source]

Also, if we need the elements sorted, wrap them with sort:

DT[, .(target = toString(sort(unique(target)))), by = source]
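The effect of unique and sort can be checked by hand on a small made-up table; this is only a sketch, and toy, res, and the column names are hypothetical:

```r
library(data.table)

# hypothetical toy data: source "a" has a duplicated "W"
toy <- data.table(source = c("a", "a", "a", "a", "b", "b"),
                  target = c("L", "W", "S", "W", "V", "H"))

res <- toy[, .(all_values = toString(target),
               uniq       = toString(unique(target)),
               uniq_sort  = toString(sort(unique(target)))),
           by = source]

res$all_values  # "L, W, S, W" "V, H"
res$uniq        # "L, W, S"    "V, H"
res$uniq_sort   # "L, S, W"    "H, V"
```

unique() keeps the first occurrence of each value in its original order, so sorting is a separate step.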

In the OP's aggregate code, sep does not collapse the strings into a single string; instead, we get a list column:

str(aggregate(list(target = DT$target), by = list(source = DT$source), 
         FUN = function(x) paste(x, sep = ", ")))
#'data.frame':   25 obs. of  2 variables:
# $ source: chr  "b" "c" "d" "e" ...
# $ target:List of 25
#  ..$ 01: chr  "U" "Q" "G" "C" ...
#  ..$ 02: chr  "D" "S" "G" "W"
#  ..$ 03: chr  "R" "U" "L"
#  ...
#  ...
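For comparison, a sketch of the base aggregate call with the values actually collapsed, assuming a small made-up data frame (df and agg are hypothetical names):

```r
# hypothetical toy data, small enough to check by hand
df <- data.frame(source = c("a", "a", "a", "b", "b"),
                 target = c("L", "W", "S", "V", "H"))

# toString() returns one string per group, so the result column is a
# plain character vector rather than a list
agg <- aggregate(list(target = df$target),
                 by = list(source = df$source),
                 FUN = toString)

is.character(agg$target)  # TRUE
agg$target                # "L, W, S" "V, H"
```

With FUN returning a single value per group, aggregate can simplify the result to an ordinary column, which matches what the data.table toString version produces.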
