我正在尝试从R中的现有data.table创建摘要数据表。 为此目的而不是循环遍历所有变量,我试图利用R中的data.table“:=”函数
raw_df <- cbind.data.frame(1:4,7:10,15:18)
names(raw_df) <- c('A','B','C')
raw_df$sample <- 1
我想要的是新data.table中每列的均值和分位数。下面是我正在使用的代码,但这只更新现有的表。 如何在不编写代码中的每个变量名的情况下创建新的data.table。
setDT(raw_df)[, c("Unique_Count","Mean",paste("Quantile_",seq(0,1,0.05),sep = ""))
:= lapply(.SD, function(x) c(length(unique(x)),mean(x,na.rm = T),quantile(x,probs = seq(0,1,0.05),na.rm = T))),
by = sample]
感谢您的期待!
答案 0 :(得分:3)
可能的解决方案:
melt(setDT(raw_df), id = 'sample')[, {q <- quantile(value, probs = seq(0,1,0.05), na.rm = TRUE);
c(unique_count = uniqueN(value), means = mean(value, na.rm = TRUE), as.list(q))}
, by = .(sample, variable)]
给出:
sample variable unique_count means 0% 5% 10% 15% 20% 25% 30% 35% 40% 45% 50% 55% 60% 65% 70% 75% 80% 85% 90% 95% 100% 1: 1 A 4 2.5 1 1.15 1.3 1.45 1.6 1.75 1.9 2.05 2.2 2.35 2.5 2.65 2.8 2.95 3.1 3.25 3.4 3.55 3.7 3.85 4 2: 1 B 4 8.5 7 7.15 7.3 7.45 7.6 7.75 7.9 8.05 8.2 8.35 8.5 8.65 8.8 8.95 9.1 9.25 9.4 9.55 9.7 9.85 10 3: 1 C 4 16.5 15 15.15 15.3 15.45 15.6 15.75 15.9 16.05 16.2 16.35 16.5 16.65 16.8 16.95 17.1 17.25 17.4 17.55 17.7 17.85 18
注意:最好不要在列名中使用 %
字符。为防止这种情况,您可以使用:
melt(setDT(raw_df), id = 'sample')[, {q <- quantile(value, probs = seq(0,1,0.05), na.rm = TRUE);
names(q) <- paste0('p_', gsub('%','',names(q)));
c(unique_count = uniqueN(value), means = mean(value, na.rm = TRUE), as.list(q))}
, by = .(sample, variable)]
给出:
sample variable unique_count means p_0 p_5 p_10 p_15 p_20 p_25 p_30 p_35 p_40 p_45 p_50 p_55 p_60 p_65 p_70 p_75 p_80 p_85 p_90 p_95 p_100 1: 1 A 4 2.5 1 1.15 1.3 1.45 1.6 1.75 1.9 2.05 2.2 2.35 2.5 2.65 2.8 2.95 3.1 3.25 3.4 3.55 3.7 3.85 4 2: 1 B 4 8.5 7 7.15 7.3 7.45 7.6 7.75 7.9 8.05 8.2 8.35 8.5 8.65 8.8 8.95 9.1 9.25 9.4 9.55 9.7 9.85 10 3: 1 C 4 16.5 15 15.15 15.3 15.45 15.6 15.75 15.9 16.05 16.2 16.35 16.5 16.65 16.8 16.95 17.1 17.25 17.4 17.55 17.7 17.85 18