Question

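Apologies if this has already been answered and I just couldn't find it.

I have a table in R (see the sample below, copied from a txt file; the real table has more data and contains NAs). I need to compute the mean and sd of columns c, e and f, grouped by the values in column b.

For individual cases, I can compute these statistics one group and one column at a time: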

mean(c[b == 1], na.rm=TRUE) 
var(e[b == 2], na.rm=TRUE)

I can also compute the mean and standard deviation for all columns and produce a table with the results:

library(data.table)
new <- data.table(project2016)

wide <- setnames(new[, sapply(.SD, function(x) list(mean = round(mean(x), 3), sd = round(sd(x), 3))), by = b], c("b", sapply(names(new)[-1], paste0, c(".mean", ".SD"))))
wide

But I can't work out how to do this for only the columns I need while still splitting by group.

Thanks in advance, Nimby

  "id" "a" "b" "c" "d"    "e"     "f"  "g"
    1   78  2   83  4   2.53    1.07    3
    2   72  2   117 4   2.50    1.16    2
    3   72  2   132 4   2.43    1.13    2
    4   73  2   102 4   2.48    .81     2
    5   73  2   114 4   2.33    1.13    2
    6   73  2   88  43  2.13    .84     2
    7   65  2   213 4   2.55    1.26    1
    8   68  2   153 4   2.45    1.23    1

Answer 1

library(dplyr)


# Some reproducible data

d <- matrix(c(1, 78, 2, 83, 4, 2.53, 1.07, 3, 2, 72, 2, 117, 4, 2.50, 1.16, 2, 3, 72, 2, 132, 4, 2.43, 1.13, 2, 4, 73, 2, 102, 4, 2.48, .81, 2, 5, 73, 2, 114, 4, 2.33, 1.13, 2, 6, 73, 2, 88, 43, 2.13, .84, 2, 7, 65, 2, 213, 4, 2.55, 1.26, 1, 8, 68, 2, 153, 4, 2.45, 1.23, 1),
      ncol = 8, byrow = TRUE) %>% 
        as.data.frame

names(d) <- c("id", "a", "b", "c", "d", "e", "f", "g")

# Your data only included one group in column b
d$b[5:8] <- 1

# Calc mean and sd for the 3 columns, grouped by b
d %>%
    group_by(b) %>%
    summarise(mean_c = mean(c), sd_c = sd(c), 
                        mean_e = mean(e), sd_e = sd(e), 
                        mean_f = mean(f), sd_f = sd(f))

d

This produces

# A tibble: 2 × 7
      b mean_c     sd_c mean_e       sd_e mean_f      sd_f
  <dbl>  <dbl>    <dbl>  <dbl>      <dbl>  <dbl>     <dbl>
1     1  142.0 54.35071  2.365 0.18064699 1.1150 0.1915724
2     2  108.5 20.95233  2.485 0.04203173 1.0425 0.1594522
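The question mentions that the real table contains NAs, which `mean()` and `sd()` propagate by default. A minimal sketch of the same summary with `na.rm = TRUE`, assuming dplyr 1.0.0 or later (for `across()`); the NA injected below is purely illustrative and is not part of the sample data:

```r
library(dplyr)

# Same values as the example above, restricted to the columns of interest
d <- data.frame(
  b = c(2, 2, 2, 2, 1, 1, 1, 1),
  c = c(83, 117, 132, 102, 114, 88, 213, 153),
  e = c(2.53, 2.50, 2.43, 2.48, 2.33, 2.13, 2.55, 2.45),
  f = c(1.07, 1.16, 1.13, 0.81, 1.13, 0.84, 1.26, 1.23)
)

# Illustrative assumption: pretend one value is missing
d$c[1] <- NA

# Apply mean and sd to each selected column within each group of b,
# dropping NAs; .names yields mean_c, sd_c, mean_e, sd_e, mean_f, sd_f
res <- d %>%
  group_by(b) %>%
  summarise(across(all_of(c("c", "e", "f")),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE)),
                   .names = "{.fn}_{.col}"))
res
```

Listing the columns once in `all_of()` also avoids repeating a `mean_x = ..., sd_x = ...` pair for every column.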

There are also non-dplyr approaches.
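One such approach, since the question already uses data.table, is to restrict the computation with `.SDcols`. This is only a sketch under the same example data; the names `dt`, `cols` and `res` are chosen here for illustration:

```r
library(data.table)

# Same values as the example above, restricted to the columns of interest
dt <- data.table(
  b = c(2, 2, 2, 2, 1, 1, 1, 1),
  c = c(83, 117, 132, 102, 114, 88, 213, 153),
  e = c(2.53, 2.50, 2.43, 2.48, 2.33, 2.13, 2.55, 2.45),
  f = c(1.07, 1.16, 1.13, 0.81, 1.13, 0.84, 1.26, 1.23)
)

cols <- c("c", "e", "f")  # only these columns are summarised

# .SDcols limits .SD to the chosen columns; by = b splits by group.
# setNames() labels each result so the two lists do not share names.
res <- dt[, c(setNames(lapply(.SD, mean, na.rm = TRUE), paste0("mean_", cols)),
              setNames(lapply(.SD, sd,   na.rm = TRUE), paste0("sd_", cols))),
          by = b, .SDcols = cols]
res
```

Here the output columns come out grouped by statistic (all means, then all sds) rather than by variable; `setcolorder()` can rearrange them if needed.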

R: Produce a table of means and SDs for certain columns by group