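Producing the mean, standard deviation, and standard error of the mean from a large data frame

Time: 2021-04-16 00:45:56

Tags: r

Suppose I have a data frame named "Data" that looks like this: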

View(Data)
Ball Day Expansion
Red  1   5
Red  1   8
Red  1   3
Red  2   7
Red  2   9
Blue 1   5
Blue 1   3
Blue 2   7
Blue 2   5
Blue 2   4
...

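For each Ball/Day group, I want to get the mean, the standard deviation (SD), and the standard error of the mean (SE), so that the final result looks like this:

#Note: the 'Expansion' value shown is the mean of the group; 'X' and 'Y' stand for the SE and SD results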

Ball Day Expansion SE SD
Red  1    7        X  Y
Red  2    5        X  Y
Red  3    6        X  Y
Red  4    5        X  Y
Blue 1    4        X  Y
Blue 2    8        X  Y
Blue 3    6        X  Y
...

Does anyone know how to do this?

2 answers:

Answer 0: (score: 5)

I hope this is what you have in mind:

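library(dplyr)

# Summarise Expansion within each Ball/Day group (df is defined under "Data" below)
df %>%
  group_by(Ball, Day) %>%
  summarise(across(Expansion, list(Mean = mean, 
                                SD = sd, 
                                # standard error of the mean: sqrt(variance / n)
                                SE = function(x) sqrt(var(x)/length(x))), 
                   .names = "{.fn}.{.col}"))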

# A tibble: 4 x 5
# Groups:   Ball [2]
  Ball    Day Mean.Expansion SD.Expansion SE.Expansion
  <chr> <dbl>          <dbl>        <dbl>        <dbl>
1 Blue      1           4            1.41        1    
2 Blue      2           5.33         1.53        0.882
3 Red       1           5.33         2.52        1.45 
4 Red       2           8            1.41        1 

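As dear @www suggested, the output of summarise is more concise (one row per group). The mutate output, however, keeps every original row and is closer to what you have in your question: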

# A tibble: 10 x 6
# Groups:   Ball, Day [4]
   Ball    Day Expansion Mean.Expansion SD.Expansion SE.Expansion
   <chr> <dbl>     <dbl>          <dbl>        <dbl>        <dbl>
 1 Red       1         5           5.33         2.52        1.45 
 2 Red       1         8           5.33         2.52        1.45 
 3 Red       1         3           5.33         2.52        1.45 
 4 Red       2         7           8            1.41        1    
 5 Red       2         9           8            1.41        1    
 6 Blue      1         5           4            1.41        1    
 7 Blue      1         3           4            1.41        1    
 8 Blue      2         7           5.33         1.53        0.882
 9 Blue      2         5           5.33         1.53        0.882
10 Blue      2         4           5.33         1.53        0.882
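
The mutate call that produces the row-level table above is not shown in the answer. A minimal sketch, assuming it is the same pipeline with summarise swapped for mutate (the name df_stats is hypothetical, introduced only for illustration):

```r
library(dplyr)

# Same example data as in this answer
df <- tribble(
  ~Ball, ~Day, ~Expansion,
  "Red",  1,   5,
  "Red",  1,   8,
  "Red",  1,   3,
  "Red",  2,   7,
  "Red",  2,   9,
  "Blue", 1,   5,
  "Blue", 1,   3,
  "Blue", 2,   7,
  "Blue", 2,   5,
  "Blue", 2,   4
)

# mutate() keeps all original rows and appends the per-group statistics,
# whereas summarise() collapses each group to a single row
df_stats <- df %>%
  group_by(Ball, Day) %>%
  mutate(across(Expansion, list(Mean = mean,
                                SD = sd,
                                SE = function(x) sqrt(var(x) / length(x))),
                .names = "{.fn}.{.col}"))

df_stats
```

The result is still grouped by Ball and Day; adding ungroup() at the end of the pipeline would drop the grouping if later steps should not operate per group.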

Data:

df <- tribble(
  ~Ball, ~Day, ~Expansion,
  "Red",  1,   5,
  "Red",  1,   8,
  "Red",  1,   3,
  "Red",  2,   7,
  "Red",  2,   9,
  "Blue", 1,   5,
  "Blue", 1,   3,
  "Blue", 2,   7,
  "Blue", 2,   5,
  "Blue", 2,   4
)

Answer 1: (score: 3)

Here is one way. We can use the dplyr package for this kind of calculation:

library(dplyr)

Data2 <- Data %>%
  group_by(Ball, Day) %>%
  summarize(Mean = mean(Expansion),
            SE = sd(Expansion)/sqrt(n()),
            SD = sd(Expansion)) %>%
  rename(Expansion = Mean) %>%
  ungroup() 

Data2
# # A tibble: 4 x 5
#   Ball    Day Expansion    SE    SD
#   <chr> <int>     <dbl> <dbl> <dbl>
# 1 Blue      1      4    1      1.41
# 2 Blue      2      5.33 0.882  1.53
# 3 Red       1      5.33 1.45   2.52
# 4 Red       2      8    1      1.41
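
The two answers write the standard error differently, sqrt(var(x)/length(x)) in one and sd(Expansion)/sqrt(n()) in the other, but the expressions are algebraically the same. As a quick check, with n the group size and s² the sample variance (the n − 1 denominator that R's var and sd use):

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
\mathrm{SE} = \sqrt{\frac{s^2}{n}} = \frac{s}{\sqrt{n}}

% Worked example: Ball = Blue, Day = 1, x = (5, 3), n = 2
\bar{x} = 4,
\qquad
s^2 = \frac{(5-4)^2 + (3-4)^2}{2-1} = 2,
\qquad
s = \sqrt{2} \approx 1.41,
\qquad
\mathrm{SE} = \sqrt{\frac{2}{2}} = 1
```

These values agree with the Blue, Day 1 row in the output above (mean 4, SD 1.41, SE 1).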

Data:

Data <- read.table(
  text = "Ball Day Expansion
Red  1   5
Red  1   8
Red  1   3
Red  2   7
Red  2   9
Blue 1   5
Blue 1   3
Blue 2   7
Blue 2   5
Blue 2   4", header = TRUE
)