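I have a data frame with many columns, some of which are measure variables. I want to use data.table to extract a set of summary statistics from the latter. My question is: how can I rename the aggregated columns according to the function that was used?
I would like an aggregated data.table whose column names look like this: c("measure1_mean", "measure1_sd", "measure2_mean", "measure2_sd", ...)
My code is as follows: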
library(data.table)
library(stringr)
dt <- data.table(meas1=1:10,
meas2=seq(5,25, length.out = 10),
meas3=rnorm(10),
groupvar=rep(LETTERS[1:5], each=2))
measure_cols <- colnames(dt)[str_detect(colnames(dt), "^meas")]
dt_agg <- dt[, c(lapply(.SD, mean),
lapply(.SD, sd)),
by=groupvar, .SDcols = measure_cols]
# Does not work because of duplicates in rep(measure_cols, 3)
agg_names <- c(measure_cols, paste(rep(c("mean", "sd"), each=length(measure_cols)), measure_cols, sep="_"))
setnames(dt_agg, rep(measure_cols,3), agg_names)
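This chunk does extract the statistics, but it returns columns that share the same names. As a result, I cannot use something like setnames(dt, old, new),
because my "old" vector contains duplicates.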
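I came across this post: Rename aggregated columns using data.table in R. However, I do not like the accepted solution, because it relies on column indices rather than names to rename the columns.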
Answer 0 (score: 0)
library(data.table)
library(stringr)  # needed for str_detect() below
dt <- data.table(meas1=1:10,
meas2=seq(5,25, length.out = 10),
meas3=rnorm(10),
groupvar=rep(LETTERS[1:5], each=2))
measure_cols <- colnames(dt)[str_detect(colnames(dt), "^meas")]
dt_agg <- dt[, c(lapply(.SD, mean),
lapply(.SD, sd)),
by=groupvar, .SDcols = measure_cols]
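You can build a vector holding the new names: paste each function name after the entries of measure_cols, using the each
argument of rep() so that the names come out in the same order as the columns of dt_agg. Calling setnames() with only that vector then replaces all column names by position.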
function.names <- c("mean", "sd")
column.names <- paste0( measure_cols, "_", rep( function.names, each = length( measure_cols ) ) )
setnames( dt_agg, c("groupvar", column.names ))
# groupvar meas1_mean meas2_mean meas3_mean meas1_sd meas2_sd meas3_sd
# 1: A 1.5 6.111111 0.2346044 0.7071068 1.571348 1.6733804
# 2: B 3.5 10.555556 0.5144621 0.7071068 1.571348 0.0894364
# 3: C 5.5 15.000000 -0.5469839 0.7071068 1.571348 2.1689620
# 4: D 7.5 19.444444 -0.3898213 0.7071068 1.571348 1.0007027
# 5: E 9.5 23.888889 0.5569743 0.7071068 1.571348 1.4499413
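As a possible alternative, the names can be attached inside j itself, so that no renaming step is needed afterwards. The following is only a sketch under a few assumptions: the name dt_agg2, the set.seed() call, and the use of grep() in place of str_detect() are mine and not part of the original code. It relies on setNames() from base R and on names(.SD) being available inside j.

```r
library(data.table)

set.seed(42)  # assumption: added only so the example is reproducible
dt <- data.table(meas1 = 1:10,
                 meas2 = seq(5, 25, length.out = 10),
                 meas3 = rnorm(10),
                 groupvar = rep(LETTERS[1:5], each = 2))

# base R equivalent of the str_detect() selection used above
measure_cols <- grep("^meas", names(dt), value = TRUE)

# name each block of results inside j: <column>_<function>
dt_agg2 <- dt[, c(setNames(lapply(.SD, mean), paste0(names(.SD), "_mean")),
                  setNames(lapply(.SD, sd),   paste0(names(.SD), "_sd"))),
              by = groupvar, .SDcols = measure_cols]

names(dt_agg2)
```

Because each name is built next to the function that produces the column, the names cannot drift out of order when columns or functions are added. One trade-off to be aware of: wrapping the lapply() calls this way may prevent data.table from applying its internal optimization of plain lapply(.SD, mean) calls, which could matter on large data.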