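Calculating the standard deviation for a stacked bar chart

Date: 2020-04-27 08:48:45

Tags: r standard-deviation standard-error

I want to calculate the standard deviation and standard error so that I can show error bars on a stacked bar chart.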

 Management    Habitat   Intensity     Var2   
   A           Urban        High        6   
   A          Farmland      High        9   
   A          Farmland      Medium     10 
   B          Forest        Medium     17 
   B          Peatland      Medium     23     
   C          Peatland      Low        22    
   C          Urban         Low        10     

My code for the stacked bar chart is

 ggplot(df, aes(fill=Habitat, y= Var1, x=Intensity)) + 
  geom_bar(position="stack", stat="identity")+
  labs(y = "Area of habitat (hectares)")+
  theme(legend.title = element_text())

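I tried using the ddply function to calculate the standard deviation and standard error of Var2 by Intensity, which should give the total error for each bar, and then set the ymin and ymax limits from that, but I get this error:

Error: Aesthetics must be either length 1 or the same as the data (96): ymax and ymin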

EB<-ddply(Mean_PFB, c("Intensity"), summarise,
      N    = length(Var2),
      mean = mean(Var2),
      sd   = sd(Var2),
      se   = sd / sqrt(N))

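1 Answer:

Answer 0: (score: 0)

Is this your complete dataset? Without proper replication, the standard deviation and standard error cannot be calculated: in the data shown, each combination of Intensity and Habitat has only a single observation. See below.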

library(tidyverse)
#> Warning: package 'tidyr' was built under R version 3.6.2
#> Warning: package 'dplyr' was built under R version 3.6.2

df <- read.table(text = "Management    Habitat   Intensity     Var2   
           A          Urban         High        6   
           A          Farmland      High        9   
           A          Farmland      Medium     10 
           B          Forest        Medium     17 
           B          Peatland      Medium     23     
           C          Peatland      Low        22    
           C          Urban         Low        10", header=T)

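# Standard deviation calculation per group.
# mean_sdl() needs the Hmisc package and defaults to mult = 2,
# so ymin and ymax are the mean minus/plus 2 standard deviations.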
df %>% 
  group_by(Habitat) %>% 
  summarise(new = list(mean_sdl(Var2))) %>% 
  unnest(new)
#> # A tibble: 4 x 4
#>   Habitat      y   ymin  ymax
#>   <fct>    <dbl>  <dbl> <dbl>
#> 1 Farmland   9.5   8.09  10.9
#> 2 Forest    17   NaN    NaN  
#> 3 Peatland  22.5  21.1   23.9
#> 4 Urban      8     2.34  13.7

df %>% 
  group_by(Management) %>% 
  summarise(new = list(mean_sdl(Var2))) %>% 
  unnest(new)
#> # A tibble: 3 x 4
#>   Management     y   ymin  ymax
#>   <fct>      <dbl>  <dbl> <dbl>
#> 1 A           8.33  4.17   12.5
#> 2 B          20    11.5    28.5
#> 3 C          16    -0.971  33.0

df %>% 
  group_by(Intensity) %>% 
  summarise(new = list(mean_sdl(Var2))) %>% 
  unnest(new)
#> # A tibble: 3 x 4
#>   Intensity     y   ymin  ymax
#>   <fct>     <dbl>  <dbl> <dbl>
#> 1 High        7.5  3.26   11.7
#> 2 Low        16   -0.971  33.0
#> 3 Medium     16.7  3.65   29.7

# Standard deviation calculation for data grouped by Intensity and Habitat.
# This gives NaN because each combination has only one observation (no replication).
df %>% 
  group_by(Intensity, Habitat) %>% 
  summarise(new = list(mean_sdl(Var2))) %>% 
  unnest(new)
#> # A tibble: 7 x 5
#> # Groups:   Intensity [3]
#>   Intensity Habitat      y  ymin  ymax
#>   <fct>     <fct>    <dbl> <dbl> <dbl>
#> 1 High      Farmland     9   NaN   NaN
#> 2 High      Urban        6   NaN   NaN
#> 3 Low       Peatland    22   NaN   NaN
#> 4 Low       Urban       10   NaN   NaN
#> 5 Medium    Farmland    10   NaN   NaN
#> 6 Medium    Forest      17   NaN   NaN
#> 7 Medium    Peatland    23   NaN   NaN
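
The NaN values above follow from how the sample standard deviation is defined: it needs at least two observations. A minimal base R sketch of that point, rebuilding the example data by hand (the `cell_counts` name is only illustrative):

```r
# A single value has no spread to estimate, so sd() returns NA
sd(22)

# Two or more values give a defined standard deviation
sd(c(9, 10))

# Example data from the question, typed in by hand
df <- data.frame(
  Intensity = c("High", "High", "Medium", "Medium", "Medium", "Low", "Low"),
  Habitat   = c("Urban", "Farmland", "Farmland", "Forest", "Peatland", "Peatland", "Urban"),
  Var2      = c(6, 9, 10, 17, 23, 22, 10)
)

# Count the rows in each Intensity x Habitat cell
cell_counts <- table(df$Intensity, df$Habitat)

# No cell holds more than one row, so no per-cell SD can be computed
max(cell_counts)
```

If the full dataset has several rows per Intensity and Habitat combination, the same grouped summary should return finite limits.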

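The same approach works for the standard error: just use mean_se instead of mean_sdl (mean_se returns the mean minus/plus 1 standard error by default).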
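
As for the "Aesthetics must be either length 1 or the same as the data" error in the question, it most likely arises because ymin and ymax came from a summary with one row per Intensity while the layer was still using the full data. One possible way around it, sketched below with dplyr rather than ddply, is to give the error-bar layer its own summary data frame. The column names `total`, `n_obs`, `sd_var2` and `se_var2` are made up for this illustration, and centring the bars on the stacked total with plus/minus one SE is an assumption; whether that is the right quantity depends on the study design.

```r
library(dplyr)
library(ggplot2)

# Example data from the question, typed in by hand
df <- data.frame(
  Habitat   = c("Urban", "Farmland", "Farmland", "Forest", "Peatland", "Peatland", "Urban"),
  Intensity = c("High", "High", "Medium", "Medium", "Medium", "Low", "Low"),
  Var2      = c(6, 9, 10, 17, 23, 22, 10)
)

# One row per Intensity: stacked-bar total plus SD and SE of the values in that bar
eb <- df %>%
  group_by(Intensity) %>%
  summarise(
    total   = sum(Var2),
    n_obs   = n(),
    sd_var2 = sd(Var2),
    se_var2 = sd_var2 / sqrt(n_obs)
  )

# The error-bar layer gets its own data and aesthetics, so its three rows
# are never matched against the rows of df
p <- ggplot(df, aes(x = Intensity, y = Var2, fill = Habitat)) +
  geom_col(position = "stack") +
  geom_errorbar(
    data = eb,
    aes(x = Intensity, ymin = total - se_var2, ymax = total + se_var2),
    inherit.aes = FALSE,
    width = 0.2
  ) +
  labs(y = "Area of habitat (hectares)")

print(p)
```

Setting `inherit.aes = FALSE` keeps the `fill = Habitat` mapping out of the error-bar layer, since the summary frame has no Habitat column.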

Created on 2020-04-27 by the reprex package (v0.3.0)