R: sum of multiple rows with multiple variables

Time: 2018-11-01 15:12:24

Tags: r dplyr tidyverse

I have some treatments (A, B, C, D, etc.) with 4 conditions (z, y, x, v, etc.), and the total number of times each was used for patients (rows).

Example:

library(dplyr)
treatments = tibble(treatment = rep(c("A","B","AB"), 4), 
       condition = rep(c("z","y","x","v"),3), 
       n_times_used = 10:21) %>% 
  arrange(treatment)

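Sometimes a combined treatment AB was also used. I would like to write a function that:
1. checks whether the combined treatment AB exists in the current dataset;
2. if it does, adds the AB count to both the "A" and "B" counts, but only within the same condition. After adding, AB should be removed from the dataset.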

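For example: last month I had 100 patients treated with Az (treatment A, condition z), 150 Bz patients, 40 Cz patients, and 70 ABz patients. So the numbers I want in the summary table are

Az = 170; Bz = 220, Cz = 40

I tried to construct something like

treatments %>% 
  {stopifnot(any(.$treatment == "AB", na.rm = T))} %>% 
  group_by(condition) %>% 
  mutate(n_times_used = if_else(treatment=="A", 
                                true = sum(n_times_used[which(.$treatment== "A")], 
                                           n_times_used[which(.$treatment== "AB")]), 
                                false = n_times_used))

and the same for B + AB, then filter to remove AB from the table. There are still errors in the code...

Update 1. An example with treatment C

I added another example because the first one contains only treatments A and B (and their combination AB). If we have a treatment C, I do not need to add AB to it.

treatments_ABC = tibble(treatment = rep(c("A","B","AB","C"), 3), 
                        condition = rep(c("z","y","x"), 4), 
                        n_times_used = round(abs(rnorm(n = 12, mean = 10, sd = 30)))) %>% 
  arrange(treatment)

Update 2. An example where treatment A or B is missing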

1 answer:

Answer 0 (score: 0)

We can use an if/else condition

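library(dplyr)
treatments %>% 
   group_by(condition) %>% 
   # if the condition has an "AB" row, add its count to every row of that condition
   mutate(n_times_used = if("AB" %in% treatment) 
                           n_times_used + n_times_used[treatment == "AB"] 
                         else n_times_used) %>% 
   # then drop the combined treatment
   filter(treatment != "AB")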

Here we have to assume that there is a single "AB" for each "condition" (as in the example).
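If that assumption is in doubt, one possible variation is to sum the "AB" rows within each condition, so that a condition without an "AB" row adds 0 and a condition with several adds their total. The sketch below wraps this in a function, as the question asks; the name add_combined and its arguments are made up for illustration (they are not part of dplyr), and the column names are assumed to be those from the question.

```r
library(dplyr)

# Hypothetical helper: fold a combined treatment into its component treatments.
# Assumes columns named treatment, condition and n_times_used, as in the question.
add_combined <- function(data, combined = "AB", parts = c("A", "B")) {
  # Step 1: nothing to do if the combined treatment does not occur at all
  if (!(combined %in% data$treatment)) return(data)

  data %>%
    group_by(condition) %>%
    mutate(
      # sum() over zero rows is 0, so conditions without the combined
      # treatment keep their original counts
      combined_total = sum(n_times_used[treatment %in% combined]),
      # Step 2: add the total only to the component treatments
      n_times_used = if_else(treatment %in% parts,
                             n_times_used + combined_total,
                             n_times_used)
    ) %>%
    ungroup() %>%
    # Step 3: drop the combined rows and the helper column
    filter(!(treatment %in% combined)) %>%
    select(-combined_total)
}
```

With the data from the question this would be called as add_combined(treatments). Other treatments such as C are left untouched because they are not listed in parts.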


If "treatment" contains other elements that should not be affected, we restrict the assignment so that those elements are excluded

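treatments_ABC %>%
    group_by(condition) %>%
    # only A, B and AB rows are changed, and only when the condition has an "AB" row
    mutate(n_times_used = ifelse(treatment %in% c("A", "B", "AB") & 
                                   "AB" %in% treatment, 
                                 n_times_used + n_times_used[treatment == "AB"], 
                                 n_times_used)) %>% 
    # drop the combined treatment
    filter(treatment != "AB")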
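As a quick check against the numbers quoted in the question (100 Az, 150 Bz, 40 Cz, 70 ABz), the ifelse-based approach can be run on a small table built from those counts. The object names last_month and result are made up for illustration.

```r
library(dplyr)

# Counts quoted in the question for condition z (object name is illustrative)
last_month <- tibble(
  treatment    = c("A", "B", "C", "AB"),
  condition    = "z",
  n_times_used = c(100, 150, 40, 70)
)

result <- last_month %>%
  group_by(condition) %>%
  mutate(n_times_used = ifelse(treatment %in% c("A", "B", "AB") &
                                 "AB" %in% treatment,
                               n_times_used + n_times_used[treatment == "AB"],
                               n_times_used)) %>%
  filter(treatment != "AB") %>%
  ungroup()

result
# n_times_used is 170 for A, 220 for B and 40 for C
```

The 70 AB uses are added to both A (100 + 70) and B (150 + 70), while C keeps its 40, which matches the summary the question asks for.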

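Or with data.table:

library(data.table)
# setDT() converts in place; := updates the A, B and AB rows by reference within each condition
setDT(treatments_ABC)[treatment %chin% c("A", "B", "AB"), 
   n_times_used := n_times_used + n_times_used[treatment == "AB"], by = condition]
# drop the combined treatment
treatments_ABC[treatment != "AB"]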
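The data.table version also relies on every condition having an "AB" row. A hedged alternative, sketched below on a small made-up table in which condition y has no AB row, first stores the per-condition AB total in a helper column and then adds it to A and B only. The names dt, ab_total and result are illustrative and not part of data.table.

```r
library(data.table)

# Small illustrative table: condition "y" has no AB row
dt <- data.table(
  treatment    = c("A", "B", "AB", "C", "A", "B", "C"),
  condition    = c("z", "z", "z",  "z", "y", "y", "y"),
  n_times_used = c(100, 150, 70,   40,  10,  20,  30)
)

# Total AB usage per condition; sum() over zero rows gives 0
dt[, ab_total := sum(n_times_used[treatment == "AB"]), by = condition]

# Add that total to the component treatments only
dt[treatment %chin% c("A", "B"), n_times_used := n_times_used + ab_total]

# Drop the combined rows and the helper column
result <- dt[treatment != "AB", .(treatment, condition, n_times_used)]
result
```

Here A and B in condition z become 170 and 220, while every row of condition y and all C rows keep their original counts.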