Creating multiple new summary columns in a data.table

Time: 2018-02-23 07:47:44

Tags: r data.table

I am trying to create a summary table from an existing data.table in R. Rather than looping over all the variables, I am trying to use the data.table ":=" operator.

raw_df <- cbind.data.frame(1:4,7:10,15:18)
names(raw_df) <- c('A','B','C')
raw_df$sample <- 1

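What I want is the mean and quantiles of each column in a new data.table. Below is the code I am using, but it only updates the existing table. How can I create a new data.table without writing out every variable name in the code?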

setDT(raw_df)[, c("Unique_Count","Mean",paste("Quantile_",seq(0,1,0.05),sep = "")) 
    := lapply(.SD, function(x) c(length(unique(x)),mean(x,na.rm = T),quantile(x,probs = seq(0,1,0.05),na.rm = T))), 
    by = sample]

Thanks in advance!

1 answer:

Answer 0 (score: 3)

A possible solution:

melt(setDT(raw_df), id = 'sample')[, {q <- quantile(value, probs = seq(0,1,0.05), na.rm = TRUE);
                                      c(unique_count = uniqueN(value), means = mean(value, na.rm = TRUE), as.list(q))}
                                   , by = .(sample, variable)]

which gives:

   sample variable unique_count means 0%    5%  10%   15%  20%   25%  30%   35%  40%   45%  50%   55%  60%   65%  70%   75%  80%   85%  90%   95% 100%
1:      1        A            4   2.5  1  1.15  1.3  1.45  1.6  1.75  1.9  2.05  2.2  2.35  2.5  2.65  2.8  2.95  3.1  3.25  3.4  3.55  3.7  3.85    4
2:      1        B            4   8.5  7  7.15  7.3  7.45  7.6  7.75  7.9  8.05  8.2  8.35  8.5  8.65  8.8  8.95  9.1  9.25  9.4  9.55  9.7  9.85   10
3:      1        C            4  16.5 15 15.15 15.3 15.45 15.6 15.75 15.9 16.05 16.2 16.35 16.5 16.65 16.8 16.95 17.1 17.25 17.4 17.55 17.7 17.85   18
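
The attempt in the question only changed raw_df because ":=" adds or updates columns by reference, while a j expression that returns a list produces a new data.table. A minimal sketch of that difference, using a small hypothetical table dt and made-up column names (none of them come from the question):

```r
library(data.table)

# Hypothetical toy table, for illustration only
dt <- data.table(g = c(1, 1, 2, 2), x = c(10, 20, 30, 40))

# ":=" adds the column to dt itself (by reference); dt keeps its 4 rows
dt[, x_mean := mean(x), by = g]

# A list in j returns a new, aggregated data.table with one row per group;
# dt is left as it was
summary_dt <- dt[, .(x_mean = mean(x), n_unique = uniqueN(x)), by = g]
```

Here dt still has one row per observation, while summary_dt is a separate object with one row per value of g.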

Note: it is better not to use the % character in column names. To avoid it, you can use:

melt(setDT(raw_df), id = 'sample')[, {q <- quantile(value, probs = seq(0,1,0.05), na.rm = TRUE);
                                      names(q) <- paste0('p_', gsub('%','',names(q)));
                                      c(unique_count = uniqueN(value), means = mean(value, na.rm = TRUE), as.list(q))}
                                   , by = .(sample, variable)]

which gives:

   sample variable unique_count means p_0   p_5 p_10  p_15 p_20  p_25 p_30  p_35 p_40  p_45 p_50  p_55 p_60  p_65 p_70  p_75 p_80  p_85 p_90  p_95 p_100
1:      1        A            4   2.5   1  1.15  1.3  1.45  1.6  1.75  1.9  2.05  2.2  2.35  2.5  2.65  2.8  2.95  3.1  3.25  3.4  3.55  3.7  3.85     4
2:      1        B            4   8.5   7  7.15  7.3  7.45  7.6  7.75  7.9  8.05  8.2  8.35  8.5  8.65  8.8  8.95  9.1  9.25  9.4  9.55  9.7  9.85    10
3:      1        C            4  16.5  15 15.15 15.3 15.45 15.6 15.75 15.9 16.05 16.2 16.35 16.5 16.65 16.8 16.95 17.1 17.25 17.4 17.55 17.7 17.85    18
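
The renaming step can be checked on its own, outside the data.table call. The sketch below applies the same paste0/gsub expression to the quantiles of a plain vector; the input 1:4 simply mirrors column A and is otherwise an arbitrary choice:

```r
# Quantiles of a plain numeric vector; names come back as "0%", "5%", ..., "100%"
q <- quantile(1:4, probs = seq(0, 1, 0.05), na.rm = TRUE)

# Drop the "%" and add a "p_" prefix so the names are syntactically valid
names(q) <- paste0("p_", gsub("%", "", names(q)))

print(names(q))
```

With names like these, the resulting columns can be referenced without backticks, which is not possible for a column called 5%.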