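Sorry if this has already been answered and I just couldn't find it.
I have a table in R (see the sample below, copied from a .txt file; the real table has more rows and contains NAs). I need to calculate the mean and SD of columns c, e and f, grouped by the values in column b.
I can calculate the mean and SD for each group separately, one call at a time: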
mean(c[b == 1], na.rm=TRUE)
sd(e[b == 2], na.rm=TRUE)
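Writing one call per group gets tedious once there are more than a couple of groups. A minimal base-R sketch with `tapply()` returns one value per group in a single call. It assumes a data frame `d` holding the sample rows below, with `b` set to 1 for rows 5 to 8 (as the answer does) so that there are two groups; this is illustrative data, not the real table.

```r
# Illustrative data: the sample rows from the question, with b[5:8] changed to 1
# so there are two groups (an assumption, mirroring the answer below).
d <- data.frame(
  id = 1:8,
  a  = c(78, 72, 72, 73, 73, 73, 65, 68),
  b  = c(2, 2, 2, 2, 1, 1, 1, 1),
  c  = c(83, 117, 132, 102, 114, 88, 213, 153),
  d  = c(4, 4, 4, 4, 4, 43, 4, 4),
  e  = c(2.53, 2.50, 2.43, 2.48, 2.33, 2.13, 2.55, 2.45),
  f  = c(1.07, 1.16, 1.13, 0.81, 1.13, 0.84, 1.26, 1.23),
  g  = c(3, 2, 2, 2, 2, 2, 1, 1)
)

# tapply() splits the first argument by the levels of b and applies the function
# to each piece; extra arguments (na.rm = TRUE) are passed on to mean()/sd().
mean_c_by_b <- tapply(d$c, d$b, mean, na.rm = TRUE)
sd_e_by_b   <- tapply(d$e, d$b, sd,   na.rm = TRUE)

print(mean_c_by_b)
print(sd_e_by_b)
```

Each result is a one-dimensional array named by group ("1", "2"), so it still handles only one column and one statistic at a time.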
I can also calculate the mean and standard deviation of all columns by group and produce a table of the results:
library(data.table)
new <- data.table(project2016)
wide <- setnames(new[, sapply(.SD, function(x) list(mean = round(mean(x, na.rm = TRUE), 3), sd = round(sd(x, na.rm = TRUE), 3))), by = b],
                 c("b", sapply(setdiff(names(new), "b"), paste0, c(".mean", ".SD"))))
wide
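But I can't work out how to do this for only the columns I need (c, e and f) while still splitting by group.
Thanks in advance, NIMBY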
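One way to restrict the data.table approach to specific columns is the `.SDcols` argument, which controls which columns end up in `.SD`. The sketch below is an assumption-laden illustration rather than a drop-in fix: it rebuilds the sample rows as `dt` (with `b[5:8]` set to 1 to get two groups) and uses the common `as.list(unlist(lapply(...)))` idiom so that each column/statistic pair becomes its own output column; the names `dt`, `cols` and `wide` are hypothetical.

```r
library(data.table)

# Illustrative data: the sample rows, with b[5:8] changed to 1 (assumption).
dt <- data.table(
  id = 1:8,
  a  = c(78, 72, 72, 73, 73, 73, 65, 68),
  b  = c(2, 2, 2, 2, 1, 1, 1, 1),
  c  = c(83, 117, 132, 102, 114, 88, 213, 153),
  d  = c(4, 4, 4, 4, 4, 43, 4, 4),
  e  = c(2.53, 2.50, 2.43, 2.48, 2.33, 2.13, 2.55, 2.45),
  f  = c(1.07, 1.16, 1.13, 0.81, 1.13, 0.84, 1.26, 1.23),
  g  = c(3, 2, 2, 2, 2, 2, 1, 1)
)

# Only these columns are placed in .SD.
cols <- c("c", "e", "f")

# For each column in .SD, compute mean and sd (ignoring NAs); unlist() flattens
# the nested list into a named vector (c.mean, c.sd, e.mean, ...) and as.list()
# turns each element into a separate output column, one row per value of b.
wide <- dt[, as.list(unlist(lapply(.SD, function(x)
              list(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))))),
           by = b, .SDcols = cols]

print(wide)
```

With `by = b` the groups appear in order of first appearance (here b = 2 first); using `keyby = b` instead sorts the result by group.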
"id" "a" "b" "c" "d" "e" "f" "g"
1 78 2 83 4 2.53 1.07 3
2 72 2 117 4 2.50 1.16 2
3 72 2 132 4 2.43 1.13 2
4 73 2 102 4 2.48 .81 2
5 73 2 114 4 2.33 1.13 2
6 73 2 88 43 2.13 .84 2
7 65 2 213 4 2.55 1.26 1
8 68 2 153 4 2.45 1.23 1
Answer (score: 0)
library(dplyr)
# Some reproducible data
d <- matrix(c(1, 78, 2, 83, 4, 2.53, 1.07, 3, 2, 72, 2, 117, 4, 2.50, 1.16, 2, 3, 72, 2, 132, 4, 2.43, 1.13, 2, 4, 73, 2, 102, 4, 2.48, .81, 2, 5, 73, 2, 114, 4, 2.33, 1.13, 2, 6, 73, 2, 88, 43, 2.13, .84, 2, 7, 65, 2, 213, 4, 2.55, 1.26, 1, 8, 68, 2, 153, 4, 2.45, 1.23, 1),
ncol = 8, byrow = TRUE) %>%
as.data.frame
names(d) <- c("id", "a", "b", "c", "d", "e", "f", "g")
# Your data only included one group in column b
d$b[5:8] <- 1
# Calc mean and sd for the 3 columns, grouped by b
d %>%
group_by(b) %>%
summarise(mean_c = mean(c, na.rm = TRUE), sd_c = sd(c, na.rm = TRUE),
mean_e = mean(e, na.rm = TRUE), sd_e = sd(e, na.rm = TRUE),
mean_f = mean(f, na.rm = TRUE), sd_f = sd(f, na.rm = TRUE))
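Listing every mean/sd pair by hand grows quickly as more columns are added. A more compact sketch, assuming dplyr 1.0.0 or later (where `across()` is available), applies both functions to a chosen set of columns; `.names = "{.fn}_{.col}"` reproduces the `mean_c`, `sd_c`, ... naming used above. The data frame `d` is again the sample rows with `b[5:8]` set to 1, and `cols` and `res` are illustrative names.

```r
library(dplyr)

# Illustrative data: the sample rows, with b[5:8] changed to 1 (assumption).
d <- data.frame(
  id = 1:8,
  a  = c(78, 72, 72, 73, 73, 73, 65, 68),
  b  = c(2, 2, 2, 2, 1, 1, 1, 1),
  c  = c(83, 117, 132, 102, 114, 88, 213, 153),
  d  = c(4, 4, 4, 4, 4, 43, 4, 4),
  e  = c(2.53, 2.50, 2.43, 2.48, 2.33, 2.13, 2.55, 2.45),
  f  = c(1.07, 1.16, 1.13, 0.81, 1.13, 0.84, 1.26, 1.23),
  g  = c(3, 2, 2, 2, 2, 2, 1, 1)
)

cols <- c("c", "e", "f")

# across() applies each function in the named list to each selected column;
# the output columns are named "<function>_<column>", e.g. mean_c, sd_c.
res <- d %>%
  group_by(b) %>%
  summarise(across(all_of(cols),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE)),
                   .names = "{.fn}_{.col}"))

print(res)
```

Adding a column to the summary then only means adding its name to `cols`.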
This produces:
# A tibble: 2 × 7
b mean_c sd_c mean_e sd_e mean_f sd_f
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 142.0 54.35071 2.365 0.18064699 1.1150 0.1915724
2 2 108.5 20.95233 2.485 0.04203173 1.0425 0.1594522
There are also non-dplyr ways to do this.
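As one non-dplyr option, here is a minimal base-R sketch using the formula method of `aggregate()`. It assumes the same illustrative `d` as above (sample rows with `b[5:8]` set to 1). Two details are worth flagging: `FUN` returns a named vector, so each result column is a two-column matrix that `do.call(data.frame, ...)` flattens into `c.mean`, `c.sd`, and so on; and `na.action = na.pass` is set because the formula method's default (`na.omit`) would drop a whole row whenever any of c, e or f is NA, whereas here each column's NAs are handled separately by `na.rm = TRUE`.

```r
# Illustrative data: the sample rows, with b[5:8] changed to 1 (assumption).
d <- data.frame(
  id = 1:8,
  a  = c(78, 72, 72, 73, 73, 73, 65, 68),
  b  = c(2, 2, 2, 2, 1, 1, 1, 1),
  c  = c(83, 117, 132, 102, 114, 88, 213, 153),
  d  = c(4, 4, 4, 4, 4, 43, 4, 4),
  e  = c(2.53, 2.50, 2.43, 2.48, 2.33, 2.13, 2.55, 2.45),
  f  = c(1.07, 1.16, 1.13, 0.81, 1.13, 0.84, 1.26, 1.23),
  g  = c(3, 2, 2, 2, 2, 2, 1, 1)
)

# cbind(c, e, f) ~ b: summarise columns c, e and f within each value of b.
# na.pass keeps rows with NAs so that na.rm = TRUE can handle them per column.
res <- aggregate(cbind(c, e, f) ~ b, data = d,
                 FUN = function(x) c(mean = mean(x, na.rm = TRUE),
                                     sd   = sd(x, na.rm = TRUE)),
                 na.action = na.pass)

# Flatten the matrix columns into ordinary columns: b, c.mean, c.sd, e.mean, ...
flat <- do.call(data.frame, res)
print(flat)
```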