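I'm trying to calculate, for each record in a data frame, its fraction (%) of the group total. My data looks like this:
Here Station, Month and Phylum are the grouping factors, followed by the total. I want to express the totals as relative percentages, so basically I'd sum the totals by Station and Month and then apply those sums back to the original table.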
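Put as a formula, using notation introduced here only for illustration: for a row i whose count (SumOfTotal Caught) is x_i and whose Station/Month group is g(i), the wanted share and percentage are

$$
p_i = \frac{x_i}{\sum_{j \in g(i)} x_j}, \qquad \text{percent}_i = 100\, p_i
$$

With the sample data from the answer below, Station A / Feb-18 sums to 20 + 20 + 30 + 40 = 110, so the Annelida row gets 20/110 ≈ 0.182, i.e. about 18.2 %.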
In R, I got as far as:
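library(dplyr)
bn_phyla %>%
  group_by(Station, Month) %>%
  summarise(total = sum(`SumOfTotal Caught`)) %>%
  # summarise() leaves one row per Station/Month with only Station, Month and
  # total, so `SumOfTotal Caught` (and Phylum) no longer exist at this point
  mutate(prop = `SumOfTotal Caught` / total)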
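This gets me the group totals, but how do I then map them back onto the original data while keeping the Phylum column?
Thanks
P.S. Is there no way to insert a table on Stack Overflow other than as an image?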
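Answer 0 (score: 1)
You can do this without summarising: use mutate() on the grouped data instead. I doubled your data sample so that there are 2 groups to show how it works.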
library(dplyr)

bn_phyla %>%
  group_by(Station, Month) %>%
  # mutate() keeps every row; sum() is evaluated within each Station/Month group
  mutate(prop = SumOfTotal_Caught / sum(SumOfTotal_Caught))
# A tibble: 8 x 5
# Groups:   Station, Month [2]
  Station Month  Phylum     SumOfTotal_Caught  prop
  <chr>   <chr>  <chr>                  <dbl> <dbl>
1 A       Feb-18 Annelida                  20 0.182
2 A       Feb-18 Arthropoda                20 0.182
3 A       Feb-18 Mollusca                  30 0.273
4 A       Feb-18 Nemertea                  40 0.364
5 B       Mar-18 Annelida                  40 0.333
6 B       Mar-18 Arthropoda                30 0.25
7 B       Mar-18 Mollusca                  30 0.25
8 B       Mar-18 Nemertea                  20 0.167
Data:
# tibble() is re-exported by dplyr; the older data_frame() is deprecated
bn_phyla <- tibble(Station = c(rep("A", 4), rep("B", 4)),
                   Month = c(rep("Feb-18", 4), rep("Mar-18", 4)),
                   Phylum = c("Annelida", "Arthropoda", "Mollusca", "Nemertea",
                              "Annelida", "Arthropoda", "Mollusca", "Nemertea"),
                   SumOfTotal_Caught = c(20, 20, 30, 40, 40, 30, 30, 20))
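If you'd rather keep the summarise() step from the question, the other route is to compute one total per Station/Month and join it back onto the original rows with left_join(). Because the join matches on Station and Month, every Phylum row picks up its group's total. This is a minimal sketch that rebuilds the example data so it runs on its own, and multiplies by 100 because the question asked for percentages. The names totals, total, result and pct are my own choices, not anything from the question.

```r
library(dplyr)

# Example data from above (two Station/Month groups of four phyla each)
bn_phyla <- tibble(
  Station = rep(c("A", "B"), each = 4),
  Month = rep(c("Feb-18", "Mar-18"), each = 4),
  Phylum = rep(c("Annelida", "Arthropoda", "Mollusca", "Nemertea"), times = 2),
  SumOfTotal_Caught = c(20, 20, 30, 40, 40, 30, 30, 20)
)

# Step 1: one row per Station/Month holding the group total
totals <- bn_phyla %>%
  group_by(Station, Month) %>%
  summarise(total = sum(SumOfTotal_Caught), .groups = "drop")

# Step 2: attach each group's total to every original row (Phylum is kept),
# then express each row as a percentage of its group total
result <- bn_phyla %>%
  left_join(totals, by = c("Station", "Month")) %>%
  mutate(pct = 100 * SumOfTotal_Caught / total)

print(result)
```

The grouped mutate() in the answer avoids the intermediate table entirely; the join is mainly handy when you also want the per-group totals as a table of their own.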