Summarise all columns based on conditions in 2 columns

Time: 2019-11-16 13:01:56

Tags: dplyr

This is my input data:

structure(list(exp_sal = c(1, 1, NA, NA), curr_sal = c(1, NA, 
1, NA), `1` = c(59L, 33L, 237L, 244L), `2` = c(98L, 199L, 127L, 
178L), `3` = c(75L, 283L, 53L, 141L), `4` = c(26L, 151L, 23L, 
111L), `5` = c(8L, 77L, 20L, 29L), `6` = c(4L, 57L, 5L, 25L), 
    `7` = c(1L, 30L, 1L, NA), `8` = c(32L, 21L, 47L, NA)), row.names = c(NA, 
-4L), class = "data.frame")

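I want a conditional sum for every column: where exp_sal is not NA, add up each column; where curr_sal is not NA, add up each column.

Result:

For exp_sal I want to sum rows 1 and 2, and for curr_sal I want to sum rows 1 and 3. Row 4 is dropped entirely.

My desired result: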

result <- structure(list(exp_sal = c(1, NA), curr_sal = c(NA, 1), 
                     `1` = c(92L, 296L), `2` = c(297L, 225L), 
                     `3` = c(358L, 128L), `4` = c(177L, 49L), 
                     `5` = c(85L, 28L), `6` = c(61L, 9L), 
                     `7` = c(31L, 2L), `8` = c(53L, 79L)), 
                     row.names = c(NA, -2L), class = "data.frame")

I have already looked at this answer:

Sum Values of Every Column in Data Frame with Conditional For Loop

But I don't know whether I should use mutate and summarise_at, or summarise_if, or case_when.

Sorry for posting such a basic question. Any help or advice would be appreciated.

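1 answer:

Answer 0: (score: 1)

Your data is messy. I suggest reshaping it into long format so that it is easier to aggregate. One way to do it is shown here (see the comments in the code):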

mydf <- structure(list(
  exp_sal = c(1, 1, NA, NA), curr_sal = c(1, NA, 1, NA),
  `1` = c(59L, 33L, 237L, 244L), `2` = c(98L, 199L, 127L, 178L),
  `3` = c(75L, 283L, 53L, 141L), `4` = c(26L, 151L, 23L, 111L),
  `5` = c(8L, 77L, 20L, 29L), `6` = c(4L, 57L, 5L, 25L),
  `7` = c(1L, 30L, 1L, NA), `8` = c(32L, 21L, 47L, NA)
), row.names = c(NA, -4L), class = "data.frame")

library(tidyverse) #also to load tidyr

mydf %>% gather(key, value, -exp_sal,-curr_sal) %>% # crucial step to make data long
  mutate(curr_val = ifelse(curr_sal == 1,value,NA),
         exp_val = ifelse(exp_sal == 1,value,NA)) %>% #this step actually cleans up the data and assigns a value to each new column for 'exp' and 'curr'
  group_by(key) %>% #for your summary, because you want to sum up your previous rows which are now assigned a key in a new column
  summarise_at( .vars = vars(curr_val, exp_val), .funs = sum, na.rm = TRUE)
#> # A tibble: 8 x 3
#>   key   curr_val exp_val
#>   <chr>    <int>   <int>
#> 1 1          296      92
#> 2 2          225     297
#> 3 3          128     358
#> 4 4           49     177
#> 5 5           28      85
#> 6 6            9      61
#> 7 7            2      31
#> 8 8           79      53

Created on 2019-11-17 by the reprex package (v0.2.1)
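In newer tidyverse releases gather() and summarise_at() are superseded, so the same reshape-then-sum idea can also be sketched with pivot_longer() and across(). This is only an illustration, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed; the name long_sums is made up here, and !is.na() is used in place of the == 1 test on the assumption that the two flag columns only ever hold 1 or NA.

```r
library(dplyr)
library(tidyr)

# Same input as in the question; check.names = FALSE keeps the numeric column names
mydf <- data.frame(
  exp_sal  = c(1, 1, NA, NA),
  curr_sal = c(1, NA, 1, NA),
  `1` = c(59L, 33L, 237L, 244L), `2` = c(98L, 199L, 127L, 178L),
  `3` = c(75L, 283L, 53L, 141L), `4` = c(26L, 151L, 23L, 111L),
  `5` = c(8L, 77L, 20L, 29L),    `6` = c(4L, 57L, 5L, 25L),
  `7` = c(1L, 30L, 1L, NA),      `8` = c(32L, 21L, 47L, NA),
  check.names = FALSE
)

long_sums <- mydf %>%
  # long format: one row per (original row, value column)
  pivot_longer(cols = -c(exp_sal, curr_sal),
               names_to = "key", values_to = "value") %>%
  # keep a value only where the corresponding flag is not NA
  mutate(
    curr_val = if_else(!is.na(curr_sal), value, NA_integer_),
    exp_val  = if_else(!is.na(exp_sal),  value, NA_integer_)
  ) %>%
  group_by(key) %>%
  # sum both helper columns per original column, ignoring NA
  summarise(across(c(curr_val, exp_val), ~ sum(.x, na.rm = TRUE)))

long_sums
```

The result has one row per original column (key) and the two sums side by side, the same layout as the output above.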

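You can inspect each intermediate step by cutting the pipeline off after the corresponding %>%. If you really need the data in the shape of your desired result, try t(). Honestly, though, I don't think that shape will help with further analysis.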
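If the two-row layout from the question is really needed, one possible base R sketch (not part of the answer above) is to sum the value columns over the rows where each flag is not NA and then bind the two results together. The names value_cols, exp_sum, curr_sum and wide are made up for illustration, and !is.na() is assumed to be equivalent to the == 1 test because the flag columns only hold 1 or NA.

```r
# Same input as in the question; check.names = FALSE keeps the numeric column names
mydf <- data.frame(
  exp_sal  = c(1, 1, NA, NA),
  curr_sal = c(1, NA, 1, NA),
  `1` = c(59L, 33L, 237L, 244L), `2` = c(98L, 199L, 127L, 178L),
  `3` = c(75L, 283L, 53L, 141L), `4` = c(26L, 151L, 23L, 111L),
  `5` = c(8L, 77L, 20L, 29L),    `6` = c(4L, 57L, 5L, 25L),
  `7` = c(1L, 30L, 1L, NA),      `8` = c(32L, 21L, 47L, NA),
  check.names = FALSE
)

# every column except the two flag columns
value_cols <- setdiff(names(mydf), c("exp_sal", "curr_sal"))

# column sums over the rows where each flag is set, ignoring NA values
exp_sum  <- colSums(mydf[!is.na(mydf$exp_sal),  value_cols], na.rm = TRUE)
curr_sum <- colSums(mydf[!is.na(mydf$curr_sal), value_cols], na.rm = TRUE)

# one row per condition, with the flag columns in front
wide <- cbind(
  data.frame(exp_sal = c(1, NA), curr_sal = c(NA, 1)),
  as.data.frame(rbind(exp_sum, curr_sum))
)
rownames(wide) <- NULL
wide
```

Note that colSums() returns doubles, so the sums in wide are numeric rather than integer as in the question's structure() call.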