Create a summary table of means by the levels of a factor (with a total column)

Time: 2019-03-24 09:08:30

Tags: r dplyr

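I would like code that creates a summary table in which several means are calculated based on two conditions (i.e. the levels of a factor variable). The levels currently sit in a single column, but I would like to split them into their own columns of the table and add a total column (i.e. the sum of the means of both levels). I have the following example code:

I would like to use the table as a tidy data summary in R Markdown, possibly converted to Word.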

Depth <- c('0', '0.1-2.0', '2.1-10.0', '10.1-20.0', '20.1-50.0', '50.1-100.0',
           '0', '0.1-2.0', '2.1-10.0', '10.1-20.0', '20.1-50.0', '50.1-100.0')
Tag <- c('Tag.1', 'Tag.1', 'Tag.1', 'Tag.1', 'Tag.1', 'Tag.1',
         'Tag.2', 'Tag.2', 'Tag.2', 'Tag.2', 'Tag.2', 'Tag.2')
Proportion <- c(2.287356322, 5.896551724, 9.528735632, 7.229885057, 73.54022989, 1.517241379,
                0.5, 86.3, 13.2, 0.1, 0.1, 0.1)
Season <- c('Autumn', 'Autumn', 'Autumn', 'Autumn', 'Autumn', 'Autumn',
            'Summer', 'Summer', 'Summer', 'Summer', 'Summer', 'Summer')

df<-data.frame(Depth, Tag, Proportion, Season)

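I can create the following table:

library(dplyr)   # provides %>%, group_by() and summarise()
library(knitr)   # provides kable()
df$Proportion <- as.numeric(df$Proportion)
df$Depth <- as.factor(df$Depth)

# Mean proportion for each Season and Depth combination
tt1 <- df %>%
  group_by(Season, Depth) %>%
  summarise(Mean = mean(Proportion))

kable(tt1)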


|Season |Depth      |      Mean|
|:------|:----------|---------:|
|Autumn |0          |  2.287356|
|Autumn |0.1-2.0    |  5.896552|
|Autumn |10.1-20.0  |  7.229885|
|Autumn |2.1-10.0   |  9.528736|
|Autumn |20.1-50.0  | 73.540230|
|Autumn |50.1-100.0 |  1.517241|
|Summer |0          |  0.500000|
|Summer |0.1-2.0    | 86.300000|
|Summer |10.1-20.0  |  0.100000|
|Summer |2.1-10.0   | 13.200000|
|Summer |20.1-50.0  |  0.100000|
|Summer |50.1-100.0 |  0.100000|

But the reader would benefit from a further summary (i.e. a table with only four columns: 1 Depth, 2 MeanAut, 3 MeanSum and 4 Total).

I have tried:

ttt1 <- df %>%
  group_by(Depth) %>%
  mutate(meanAut = case_when(Season == 'Autumn' ~ summarise(mean(Proportion)))) %>%
  mutate(meanSum = case_when(Season == 'Summer' ~ summarise(mean(Proportion)))) %>%
  bind_rows(summarise_all(., funs(if (is.numeric(.)) sum(.) else "Total")))

But I get the error: Error in mutate_impl(.data, dots) : Evaluation error: no applicable method for 'summarise_' applied to an object of class "c('double', 'numeric')".

Expected output:

Depth       meanAut meanSum Total
0           2.2     NA      2.2
0.1-2.0     5.8     86.3    46.05
10.1-20.0   7.2     0.1     3.65
2.1-10.0    9.5     13.2    11.35
20.1-50.0   73.5    0.1     36.8
50.1-100.0  1.5     0.1     0.8

Any advice on how to format the table would be much appreciated!

1 Answer:

Answer 0: (score: 0)

One tidyverse possibility could be:

library(dplyr)
library(tidyr)   # provides spread()

df %>%
  group_by(Depth, Season) %>%
  summarise(mean_season = mean(Proportion, na.rm = TRUE)) %>%
  mutate(Season = paste("Mean", Season, sep = "_")) %>%
  spread(Season, mean_season) %>%
  left_join(df %>%
              group_by(Depth) %>%
              summarise(Mean_Total = mean(Proportion, na.rm = TRUE)),
            by = c("Depth" = "Depth"))

  Depth      Mean_Autumn Mean_Summer Mean_Total
  <fct>            <dbl>       <dbl>      <dbl>
1 0                 2.29         0.5      1.39 
2 0.1-2.0           5.90        86.3     46.1  
3 10.1-20.0         7.23         0.1      3.66 
4 2.1-10.0          9.53        13.2     11.4  
5 20.1-50.0        73.5          0.1     36.8  
6 50.1-100.0        1.52         0.1      0.809

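Here, first, it calculates the mean for every Depth and Season. Second, it creates new variable names containing "Mean". Third, it spreads the new variable names into columns, with the means as values. Fourth, it calculates the overall mean for each Depth. Finally, it joins the overall and seasonal means together on "Depth".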
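The same steps can also be sketched with `pivot_wider()`, which newer tidyr releases recommend over `spread()`. This is only an illustrative variant, not the code the answer ran: it assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the `.groups` argument), and the intermediate names `season_means`, `total_means` and `result` are made up for the example. The data frame is rebuilt from the question so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt so the sketch is self-contained
df <- data.frame(
  Depth = rep(c("0", "0.1-2.0", "2.1-10.0", "10.1-20.0", "20.1-50.0", "50.1-100.0"),
              times = 2),
  Tag = rep(c("Tag.1", "Tag.2"), each = 6),
  Proportion = c(2.287356322, 5.896551724, 9.528735632, 7.229885057, 73.54022989,
                 1.517241379, 0.5, 86.3, 13.2, 0.1, 0.1, 0.1),
  Season = rep(c("Autumn", "Summer"), each = 6)
)

# Steps 1-3: mean per Depth and Season, then one "Mean_<Season>" column per season
season_means <- df %>%
  group_by(Depth, Season) %>%
  summarise(mean_season = mean(Proportion, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = Season, values_from = mean_season, names_prefix = "Mean_")

# Step 4: overall mean per Depth, computed over all rows of that Depth
total_means <- df %>%
  group_by(Depth) %>%
  summarise(Mean_Total = mean(Proportion, na.rm = TRUE))

# Step 5: join the seasonal and overall means on Depth
result <- left_join(season_means, total_means, by = "Depth")
result
```

Note that `Mean_Total` is the mean of all rows for a Depth. With this data it equals the average of the two seasonal means only because each Depth and Season combination has exactly one observation.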

The spread() pipeline above with kable() from knitr added at the end:

library(knitr)   # provides kable()

df %>%
  group_by(Depth, Season) %>%
  summarise(mean_season = mean(Proportion, na.rm = TRUE)) %>%
  mutate(Season = paste("Mean", Season, sep = "_")) %>%
  spread(Season, mean_season) %>%
  left_join(df %>%
              group_by(Depth) %>%
              summarise(Mean_Total = mean(Proportion, na.rm = TRUE)),
            by = c("Depth" = "Depth")) %>%
  kable()

|Depth      | Mean_Autumn| Mean_Summer| Mean_Total|
|:----------|-----------:|-----------:|----------:|
|0          |    2.287356|         0.5|  1.3936782|
|0.1-2.0    |    5.896552|        86.3| 46.0982759|
|10.1-20.0  |    7.229885|         0.1|  3.6649425|
|2.1-10.0   |    9.528736|        13.2| 11.3643678|
|20.1-50.0  |   73.540230|         0.1| 36.8201149|
|50.1-100.0 |    1.517241|         0.1|  0.8086207|
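
Since the table is meant for an R Markdown report, it may help to round the values and relabel the columns in the `kable()` call itself. The sketch below uses the `digits` and `col.names` arguments of `kable()`. The data frame `summary_tbl` is a hypothetical stand-in holding the first two rows of the table above, and the column labels are only a suggestion.

```r
library(knitr)

# Hypothetical stand-in for the summary table (first two rows of the output above)
summary_tbl <- data.frame(
  Depth = c("0", "0.1-2.0"),
  Mean_Autumn = c(2.287356, 5.896552),
  Mean_Summer = c(0.5, 86.3),
  Mean_Total = c(1.3936782, 46.0982759)
)

# digits rounds the numeric columns; col.names replaces the header labels
md_table <- kable(
  summary_tbl,
  digits = 2,
  col.names = c("Depth", "Mean Autumn", "Mean Summer", "Total")
)
md_table
```

In the full pipeline the same arguments could be passed to the final `kable()` step instead of calling it with no arguments.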