Preserving column names in lapply(.SD, ...) with data.table in R

Time: 2015-04-27 21:56:48

Tags: r data.table

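When I apply a function that returns multiple output variables (for example, a list) to a subset of a data.table, I lose the variable names. Is there a way to keep them?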

library(data.table)

foo <- function(x){
  list(mn = mean(x), sd = sd(x))
}

bar <- data.table(x=1:8, y=c("d","e","f","g"))

# column names "mn" and "sd" are replaced by "V1" and "V2"
bar[, sapply(.SD, foo), by = y, .SDcols="x"]

# column names "mn" and "sd" are retained
bar_split <- split(bar$x, bar$y)
t(sapply(bar_split, foo))
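
To see where the names go, it can help to inspect what sapply returns outside of data.table. The following is a minimal sketch in base R, assuming the same foo as above; the variable res exists only for illustration.

```r
foo <- function(x) {
  list(mn = mean(x), sd = sd(x))
}

# sapply() simplifies the per-column lists into a list-matrix:
# one row per output of foo, one column per input column.
res <- sapply(list(x = 1:8), foo)

is.list(res)    # TRUE: it is a list carrying a dim attribute
dim(res)        # 2 rows (mn, sd) by 1 column (x)
rownames(res)   # "mn" "sd" are stored as dimnames ...
names(res)      # ... so names() is NULL
```

Because the labels live in the row names rather than in names(), data.table presumably finds no element names to use in j and falls back to V1 and V2.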

2 answers:

Answer 0: (score: 11)

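I would go with the following. It is a bit awkward, but it does not require writing the names by hand, no matter how many functions there are: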

set.seed(1)
bar[, z := sample(8)]
bar[, as.list(unlist(lapply(.SD, foo))), by = y, .SDcols = c("x", "z")]
#    y x.mn     x.sd z.mn      z.sd
# 1: d    3 2.828427  2.0 1.4142136
# 2: e    4 2.828427  7.5 0.7071068
# 3: f    5 2.828427  3.0 1.4142136
# 4: g    6 2.828427  5.5 0.7071068

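The biggest advantage of this approach is that it combines the names of the function's outputs with the column names. For example, if you have an additional column, the same code as above still gives an informative result.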
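
The naming step can be reproduced in base R without data.table. This is a minimal sketch in which cols is a hypothetical stand-in for .SD, and its values are made up for illustration.

```r
foo <- function(x) {
  list(mn = mean(x), sd = sd(x))
}

# Hypothetical stand-in for .SD: a named list of columns.
cols <- list(x = c(1, 5), z = c(3, 1))

# unlist() joins the outer names (columns) and the inner names
# (outputs of foo) with a dot.
flat <- unlist(lapply(cols, foo))
names(flat)   # "x.mn" "x.sd" "z.mn" "z.sd"

# as.list() turns the named vector into a named list, so that j
# receives one named element per output column.
str(as.list(flat))
```

One consequence of going through unlist() is that all outputs are coerced to a common type, so this sketch fits best when foo returns only numbers.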


Answer 1: (score: 2)

The setNames function lets you supply the missing names as a character vector:

bar[, setNames( sapply(.SD, foo), c("mn", "sd")), by = y, .SDcols="x"]
#    y mn       sd
# 1: d  3 2.828427
# 2: e  4 2.828427
# 3: f  5 2.828427
# 4: g  6 2.828427
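
When only a single column is involved, as in this example, calling foo directly in j may be enough, because j then evaluates to a named list. The sketch below is not part of the original answer; it assumes the foo and bar from the question, with y spelled out through rep() instead of relying on recycling.

```r
library(data.table)

foo <- function(x) {
  list(mn = mean(x), sd = sd(x))
}

# Same data as in the question, with the grouping column written out in full.
bar <- data.table(x = 1:8, y = rep(c("d", "e", "f", "g"), 2))

# j returns a named list, so its names become the column names.
res <- bar[, foo(x), by = y]
names(res)   # "y" "mn" "sd"
```

This avoids typing the names a second time, but it does not extend to several columns the way the lapply(.SD, ...) forms do.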

The author suggests using another form, as proposed by Arenburg:

DT[, c('x2', 'y2') := list(x / sum(x), y / sum(y)), by = grp]
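
In that line, DT, x, y and grp are placeholders. Applied to the data from the question, the same := pattern could look like the sketch below. It assumes the foo and bar defined in the question and is only one possible reading of the suggestion.

```r
library(data.table)

foo <- function(x) {
  list(mn = mean(x), sd = sd(x))
}

bar <- data.table(x = 1:8, y = rep(c("d", "e", "f", "g"), 2))

# The names on the left-hand side of := are given explicitly.
# Each group's single mn and sd value is recycled to all rows of that group.
bar[, c("mn", "sd") := foo(x), by = y]
names(bar)   # "x" "y" "mn" "sd"
```

Unlike the aggregating forms above, this adds the columns to bar by reference and keeps all eight rows instead of returning one row per group.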