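Why does dplyr::mutate give the wrong answer?

Asked: 2021-01-26 06:22:17

Tags: r dplyr mutate

I can't work out what is wrong with my code. I want to create a column containing total/sum(total). However, it contains the value 1 in every row. I wrote this code: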

eth_3 <- pop %>% 
  filter(ETHNICITY == 2) %>% 
  group_by(PROVINCE, ETHNICITY) %>% 
  summarise(total = sum(WEIGHT)) %>% 
  select(PROVINCE, total) %>% 
  mutate(S = total/sum(total))

and got this result:

PROVINCE   total     S
     <int>   <dbl> <dbl>
1       11 93925.      1
2       12  2016.      1
3       13    40       1
4       14   255.      1
5       16    10       1
6       18    58.3     1

The output should be:

   PROVINCE    total         S
      <int>    <dbl>     <dbl>
 1       11 93925.   0.968    
 2       12  2016.   0.0208   
 3       13    40    0.000412 
 4       14   255.   0.00263  
 5       16    10    0.000103 
 6       18    58.3  0.000601 
 7       19     9.67 0.0000997
 8       21    50.3  0.000519 
 9       31    34.7  0.000358 
10       32   142.   0.00147 

Here is the dput output:

structure(list(PROVINCE = c(11L, 12L, 13L, 14L, 16L, 18L, 19L, 
21L, 31L, 32L, 33L, 34L, 35L, 36L, 52L, 62L, 63L, 64L, 74L, 81L, 
91L), total = c(93925.4300413131, 2015.98999500274, 40, 255.349998474121, 
10, 58.3199987411499, 9.6700000762939, 50.340000152588, 34.6899995803834, 
142.189999580384, 30.0199995040892, 48.5600004196165, 160.789996147154, 
60.8100004196172, 9.8800001144409, 52.199997901915, 21.60000038147, 
19.7199993133544, 10.130000114441, 9.8999996185303, 28.1400003433227
), S = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1)), row.names = c(NA, -21L), groups = structure(list(PROVINCE = c(11L, 
12L, 13L, 14L, 16L, 18L, 19L, 21L, 31L, 32L, 33L, 34L, 35L, 36L, 
52L, 62L, 63L, 64L, 74L, 81L, 91L), .rows = structure(list(1L, 
    2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 
    15L, 16L, 17L, 18L, 19L, 20L, 21L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, 21L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

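1 answer:

Answer 0 (score: 3)

By default, summarise drops only the last level of grouping. After summarise, your data are therefore still grouped by PROVINCE, so each group holds a single row and total/sum(total) evaluates to 1. You should ungroup before computing the proportions: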

library(dplyr)

eth_3  <- pop %>% 
           filter(ETHNICITY == 2) %>% 
           group_by(PROVINCE, ETHNICITY) %>% 
           summarise(total = sum(WEIGHT)) %>% 
           select(PROVINCE, total) %>%
           ungroup() %>%
           mutate(S = total/sum(total))
           #mutate(S = prop.table(total))
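
To see the leftover grouping for yourself, here is a minimal sketch on made-up data. The `toy` tibble and its values are hypothetical stand-ins for `pop`, which the question does not show; only the column names are taken from the question.

```r
library(dplyr)

# Hypothetical toy data standing in for `pop` (values are invented)
toy <- tibble(
  PROVINCE  = c(11L, 11L, 12L, 12L, 13L),
  ETHNICITY = c(2L, 2L, 2L, 2L, 2L),
  WEIGHT    = c(10, 30, 20, 20, 20)
)

grouped <- toy %>%
  group_by(PROVINCE, ETHNICITY) %>%
  summarise(total = sum(WEIGHT)) %>%
  select(PROVINCE, total)

# summarise() dropped ETHNICITY only; PROVINCE is still a grouping variable
group_vars(grouped)

# Each PROVINCE group has one row, so total / sum(total) is total / total
still_grouped <- grouped %>% mutate(S = total / sum(total))
still_grouped$S

# After ungroup(), sum(total) runs over the whole column
fixed <- grouped %>% ungroup() %>% mutate(S = total / sum(total))
fixed$S
```

Calling `group_vars()` on an intermediate result is a quick way to check which grouping is still in effect before a `mutate`.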

If you have dplyr >= 1.0.0, you can specify .groups = 'drop' in summarise instead of using ungroup:

pop %>% 
  filter(ETHNICITY == 2) %>% 
  group_by(PROVINCE, ETHNICITY) %>% 
  summarise(total = sum(WEIGHT), .groups = 'drop') %>% 
  select(PROVINCE, total) %>%
  mutate(S = total/sum(total))
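
As a rough check that `.groups = 'drop'` leaves nothing grouped, the sketch below runs the same pipeline on invented data. The `toy` tibble is hypothetical, not the asker's `pop`, and the example assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data standing in for `pop` (values are invented)
toy <- tibble(
  PROVINCE  = c(11L, 11L, 12L, 12L, 13L),
  ETHNICITY = c(2L, 2L, 2L, 2L, 2L),
  WEIGHT    = c(10, 30, 20, 20, 20)
)

res <- toy %>%
  filter(ETHNICITY == 2) %>%
  group_by(PROVINCE, ETHNICITY) %>%
  summarise(total = sum(WEIGHT), .groups = "drop") %>%
  select(PROVINCE, total) %>%
  mutate(S = total / sum(total))

# The result is a plain tibble, so sum(total) covered every row
is_grouped_df(res)

# The proportions should add up to 1 (up to floating-point error)
all.equal(sum(res$S), 1)
```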