I'm trying to replicate the base aggregate() function with data.table syntax in this particular scenario:
library(data.table)
# make it reproducible
set.seed(16)
# create data.table
DT <- data.table(source = sample(letters, 100, replace = TRUE), target = sample(LETTERS, 100, replace = TRUE))
# source target
# 1: j J
# 2: d K
# 3: w L
# 4: g J
# ...
# aggregate using base function
aggregate(list(target = DT$target), by = list(source = DT$source), FUN = function(x) paste(x, sep = ", "))
# source target
#1 a L, W, S, W
#2 b V, H, R, J, G, W, N
#3 c Y, C, I, K
#4 d K, A, P, V
# ...
I tried a few things with data.table syntax, but I couldn't get them to work:
DT[, .(target = paste(target, sep = ", ")), by = source]
# source target
# 1: r P
# 2: r I
# 3: r Y
# 4: r G
# ...
DT[, target := paste(target, sep = ", "), by = source]
# source target
# 1: r P
# 2: g C
# 3: l U
# 4: f J
# ...
What is the correct way to do this?
Bonus: remove duplicate LETTERS from the output (i.e. row 1 should be L, W, S rather than L, W, S, W).
Thanks!
Answer 0 (score: 1)
If we need a single string containing all the elements of 'target' for each 'source', use the collapse argument of paste. This can also be done with toString, which is a wrapper for paste(..., collapse = ", "):
DT[, .(target = toString(target)), by = source]
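The difference between sep and collapse can be checked on a plain character vector. This is a minimal base R sketch; the vector x is a made-up example, not data from the question:

```r
# Hypothetical example vector (not taken from DT)
x <- c("L", "W", "S", "W")

# sep only separates multiple arguments to paste(); with a single
# vector argument the result is still a vector of the same length
sep_result <- paste(x, sep = ", ")
length(sep_result)

# collapse joins the elements of the result into one string
collapse_result <- paste(x, collapse = ", ")
length(collapse_result)

# toString() gives the same single string for a character vector
identical(toString(x), collapse_result)
```

Here sep_result keeps four elements, while collapse_result is a single string, which is why the data.table attempts with sep returned one row per original row.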
We can also keep a list column instead of pasting everything into a single string:
DT[, .(target = list(target)), by = source]
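This is similar to the aggregate output in the OP's post (although the intention there seems to have been different, given the use of paste).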
If we need only the unique elements, use unique:
DT[, .(target = toString(unique(target))), by = source]
DT[, .(target = list(unique(target))), by = source]
Also, if the elements need to be sorted, wrap them in sort:
DT[, .(target = toString(sort(unique(target)))), by = source]
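To make the effect of unique and sort easy to verify by eye, the same expression can be run on a tiny hand-written table. The table toy below is a hypothetical stand-in for DT, so the values are illustrative only:

```r
library(data.table)

# Hypothetical toy data: "a" has a duplicated target, "b" is unsorted
toy <- data.table(
  source = c("a", "a", "a", "b", "b"),
  target = c("W", "L", "W", "K", "A")
)

# One row per source: drop duplicates, sort, then collapse to one string
res <- toy[, .(target = toString(sort(unique(target)))), by = source]

# Groups come back in order of first appearance ("a", then "b")
res$source
# The duplicated "W" is gone and each string is in alphabetical order
res$target
```

With this input, res has two rows, and the target values are "L, W" for source "a" and "A, K" for source "b".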
In the OP's aggregate code, sep does not collapse the strings into a single string; instead, we get a list column:
str(aggregate(list(target = DT$target), by = list(source = DT$source),
FUN = function(x) paste(x, sep = ", ")))
#'data.frame': 25 obs. of 2 variables:
# $ source: chr "b" "c" "d" "e" ...
# $ target:List of 25
# ..$ 01: chr "U" "Q" "G" "C" ...
# ..$ 02: chr "D" "S" "G" "W"
# ..$ 03: chr "R" "U" "L"
# ...
# ...
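For comparison, the base aggregate call can also return a plain character column if the function collapses each group itself. This is a sketch on a hypothetical data frame df (not the OP's DT), using the formula interface of aggregate:

```r
# Hypothetical toy data frame standing in for DT
df <- data.frame(
  source = c("a", "a", "a", "b", "b"),
  target = c("W", "L", "W", "K", "A"),
  stringsAsFactors = FALSE
)

# FUN returns one string per group, so 'target' is a character
# column rather than a list column
agg <- aggregate(target ~ source, data = df,
                 FUN = function(x) toString(unique(x)))

str(agg)
```

In this toy case agg$target is c("W, L", "K, A"): duplicates are removed, and the order of first appearance within each group is kept because sort is not applied.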