How can I replicate aggregate() functionality using data.table?

Asked: 2017-04-20 11:50:17

Tags: r data.table aggregate

I am trying to replicate the base function aggregate() using data.table syntax in this particular scenario:

# make it reproducible
set.seed(16)

# load the package and create the data.table
library(data.table)
DT <- data.table(source = sample(letters, 100, replace = TRUE), target = sample(LETTERS, 100, replace = TRUE))
#     source target
#  1:      j      J
#  2:      d      K
#  3:      w      L
#  4:      g      J
#  ...

# aggregate using base function
aggregate(list(target = DT$target), by = list(source = DT$source), FUN = function(x) paste(x, sep = ", "))
#   source              target
#1       a          L, W, S, W
#2       b V, H, R, J, G, W, N
#3       c          Y, C, I, K
#4       d          K, A, P, V
# ...

I tried a few things with data.table syntax, but I could not get them to work:

DT[, .(target = paste(target, sep = ", ")), by = source]
#     source target
#  1:      r      P
#  2:      r      I
#  3:      r      Y
#  4:      r      G
#  ...

DT[, target := paste(target, sep = ", "), by = source]
#     source target
#  1:      r      P
#  2:      g      C
#  3:      l      U
#  4:      f      J
#  ...

What is the correct way to do this?

Bonus: remove the duplicate LETTERS in the output (i.e. row 1 should be L, W, S rather than L, W, S, W).

Thanks!

1 answer:

Answer 0: (score: 1)

If we need a single string containing all the elements of 'target' for each 'source', use the collapse argument of paste. This can be written more compactly with toString, which wraps paste(..., collapse = ", "):
DT[, .(target = toString(target)), by = source]
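
To make the equivalence concrete, here is a minimal sketch on a small hand-made table (toy and its values are hypothetical, chosen only for illustration and not taken from the question). Both forms should yield the same collapsed string per group:

```r
library(data.table)

# Hypothetical toy data, used only for illustration
toy <- data.table(source = c("a", "a", "b", "a", "b"),
                  target = c("L", "W", "V", "S", "H"))

# Explicit collapse: one string per group
res_paste <- toy[, .(target = paste(target, collapse = ", ")), by = source]

# toString(x) is shorthand for paste(x, collapse = ", ")
res_tostring <- toy[, .(target = toString(target)), by = source]

print(res_paste)

# Compare the resulting columns
identical(res_paste$target, res_tostring$target)
```

Groups appear in order of first occurrence in the data, so 'a' comes before 'b' here.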

We can also keep the values as a list instead of pasting them into a single string:

DT[, .(target = list(target)), by = source]

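This is similar to the aggregate output in the OP's post (although the intent there, judging by the paste call, seems to have been different).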
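
The difference between the two result shapes can be inspected directly. The following is a rough sketch on hypothetical data (toy is not from the question): a list column stores one character vector per group, while the toString version stores one string per group.

```r
library(data.table)

# Hypothetical toy data, used only for illustration
toy <- data.table(source = c("a", "a", "b"),
                  target = c("L", "W", "V"))

as_list   <- toy[, .(target = list(target)), by = source]
as_string <- toy[, .(target = toString(target)), by = source]

# List column: each cell holds the character vector of that group
class(as_list$target)      # "list"
lengths(as_list$target)    # 2 1

# String column: each cell is a single collapsed string
class(as_string$target)    # "character"
as_string$target           # "L, W" "V"
```

A list column keeps the individual values available for later processing; the string form is mostly useful for display.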

Update

If we need only the unique elements, use unique:

DT[, .(target = toString(unique(target))), by = source]

DT[, .(target = list(unique(target))), by = source]

Also, if we need the elements sorted, wrap the call in sort:
DT[, .(target = toString(sort(unique(target)))), by = source]
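
As a small worked example of the three variants, assuming hypothetical data (toy) in which group 'a' contains a repeated W, mirroring the bonus part of the question:

```r
library(data.table)

# Hypothetical toy data; group "a" has a duplicated "W"
toy <- data.table(source = c("a", "a", "a", "a", "b", "b"),
                  target = c("L", "W", "S", "W", "V", "H"))

# Duplicates kept, in order of appearance
all_vals <- toy[, .(target = toString(target)), by = source]
all_vals$target     # "L, W, S, W" "V, H"

# Duplicates dropped, first occurrence kept
uniq_vals <- toy[, .(target = toString(unique(target))), by = source]
uniq_vals$target    # "L, W, S" "V, H"

# Duplicates dropped, then sorted alphabetically
sorted_vals <- toy[, .(target = toString(sort(unique(target)))), by = source]
sorted_vals$target  # "L, S, W" "H, V"
```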

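In the OP's aggregate code, sep does not collapse the elements into a single string; paste returns each group's vector unchanged, so what we get is a list column: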

str(aggregate(list(target = DT$target), by = list(source = DT$source), 
         FUN = function(x) paste(x, sep = ", ")))
#'data.frame':   25 obs. of  2 variables:
# $ source: chr  "b" "c" "d" "e" ...
# $ target:List of 25
#  ..$ 01: chr  "U" "Q" "G" "C" ...
#  ..$ 02: chr  "D" "S" "G" "W"
#  ..$ 03: chr  "R" "U" "L"
#  ...
#  ...
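
The sep versus collapse distinction can be checked on a plain vector, independent of aggregate or data.table. This is a minimal sketch with a made-up vector x; it also suggests why the data.table attempts in the question returned one row per original row.

```r
x <- c("L", "W", "S")

# sep separates the *arguments* of paste(); with a single vector argument
# there is nothing to separate, so the result still has length 3
with_sep <- paste(x, sep = ", ")
with_sep          # "L" "W" "S"

# sep in action: two arguments pasted element-wise
two_args <- paste(x, 1:3, sep = "-")
two_args          # "L-1" "W-2" "S-3"

# collapse joins the elements of the result into one string
with_collapse <- paste(x, collapse = ", ")
with_collapse     # "L, W, S"
```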