Compute a sum based on custom rules in a data frame

Time: 2018-10-31 05:06:51

Tags: r dplyr data.table

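Preferably using data.table in R: I would like to compute the SUM of DIAM by ID and CYCLE according to the following rules:

  1. If any DIAM within a given subject's cycle is recorded as NE, the SUM cannot be computed (NA must be returned)
  2. If any DIAM is NA, compute the sum ignoring the NA (i.e. as if it were 0)
  3. If none are NA, compute the sum as usual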
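
The three rules above can be sketched for a single group of DIAM values in base R. The function name `rule_sum` is hypothetical and not part of the question; it assumes DIAM arrives as character (or something coercible to character) because "NE" is mixed in with the numbers.

```r
# Hypothetical helper: sum one group's DIAM values under the three rules.
rule_sum <- function(diam) {
  diam <- as.character(diam)
  # Rule 1: any "NE" means the sum cannot be computed.
  if ("NE" %in% diam) return(NA_real_)
  # Rules 2 and 3: drop missing values, then sum whatever is left.
  sum(as.numeric(diam), na.rm = TRUE)
}

rule_sum(c("8", "4"))   # 12
rule_sum(c("6", "NE"))  # NA
rule_sum(c("6", NA))    # 6
```

Note that a group whose values are all NA yields 0 under this reading of rule 2; whether that or NA is wanted is a judgment call.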

I would also like to replace the CYCLE labels with numbers, with BASELINE represented as 0.

This needs to be applied to every subject. There are many cycles; this is just an example.
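
The relabeling of CYCLE asked for in the question could be sketched in base R as follows. The helper name `cycle_to_num` is hypothetical, and the sketch assumes labels are exactly "BASELINE" or "CYCLE n" with a single space; anything else maps to NA.

```r
# Hypothetical helper: map "BASELINE" to 0 and "CYCLE n" to n.
cycle_to_num <- function(cycle) {
  out <- rep(NA_integer_, length(cycle))
  out[cycle %in% "BASELINE"] <- 0L
  # Only convert labels that match the assumed "CYCLE n" pattern.
  is_cycle <- grepl("^CYCLE [0-9]+$", cycle)
  out[is_cycle] <- as.integer(sub("^CYCLE ", "", cycle[is_cycle]))
  out
}

cycle_to_num(c("BASELINE", "CYCLE 1", "CYCLE 12"))  # 0 1 12
```

Parsing the number from the label, rather than using its position, keeps multi-digit cycles correct and does not depend on row order.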

2 answers:

Answer 0: (score: 3)

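Here is one option. Group by 'ID' and the match index of 'CYCLE' (as in the expected output), change the 'DIAM' values to NA if any 'DIAM' in the group is "NE", and then summarise by taking the sum of 'DIAM', making sure that NA is returned when all values are NA.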

library(tidyverse)
dfin %>% 
  group_by(ID, CYCLE = match(CYCLE, unique(CYCLE))-1) %>% 
  mutate(DIAM = as.numeric(replace(DIAM, any(DIAM== "NE"), NA))) %>%
  summarise(SUM = NA^all(is.na(DIAM)) * sum(DIAM, na.rm = TRUE))
# A tibble: 4 x 3
# Groups:   ID [?]
#     ID CYCLE   SUM
#  <int> <dbl> <dbl>
#1     1     0    12
#2     1     1     8
#3     1     2    NA
#4     1     3     6
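
The `NA^all(is.na(DIAM))` factor may look cryptic. A small base R sketch of what it does, with made-up vectors (`all_na`, `some_na`) that are not from the question: in R, `NA^0` is 1 and `NA^1` is NA, so the factor is NA only when every value in the group is missing.

```r
NA^TRUE   # NA
NA^FALSE  # 1

# Made-up example vectors for illustration.
all_na  <- c(NA_real_, NA_real_)
some_na <- c(6, NA)

# All missing: NA * 0 gives NA instead of a misleading 0.
res_all  <- NA^all(is.na(all_na)) * sum(all_na, na.rm = TRUE)
# Partly missing: 1 * 6 leaves the sum untouched.
res_some <- NA^all(is.na(some_na)) * sum(some_na, na.rm = TRUE)

res_all   # NA
res_some  # 6
```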

Or use an if/else condition after the group_by step

dfin %>%
  group_by(ID, CYCLE = match(CYCLE, unique(CYCLE))-1)  %>% 
  summarise(SUM = if("NE" %in% DIAM) NA else sum(as.numeric(DIAM), na.rm = TRUE))

Or use the same logic with data.table

library(data.table)
setDT(dfin)[, .(SUM = if("NE" %in% DIAM) NA_real_ else 
   sum(as.numeric(DIAM), na.rm = TRUE)), .(ID, CYCLE = rleid(CYCLE)-1)]
#   ID CYCLE SUM
#1:  1     0  12
#2:  1     1   8
#3:  1     2  NA
#4:  1     3   6
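
One caveat when there is more than one subject: `rleid()` numbers runs over the whole column, so the index keeps increasing across IDs. A minimal sketch with made-up two-subject data (the table `dt` and its values are hypothetical), using `match()` against the unique labels instead; this assumes the labels first appear in chronological order.

```r
library(data.table)

# Hypothetical two-subject data, made up to illustrate the point.
dt <- data.table(
  ID    = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  CYCLE = c("BASELINE", "BASELINE", "CYCLE 1", "CYCLE 1",
            "BASELINE", "BASELINE", "CYCLE 1", "CYCLE 1"),
  DIAM  = c("8", "4", "6", "NE", "5", NA, "3", "2")
)

# rleid() counts runs over the whole column, so subject 2 starts at 2.
run_index <- rleid(dt$CYCLE) - 1L
run_index  # 0 0 1 1 2 2 3 3

# match() gives the same index for the same label, whatever the subject.
res <- dt[, .(SUM = if ("NE" %in% DIAM) NA_real_ else
                sum(as.numeric(DIAM), na.rm = TRUE)),
          by = .(ID, CYCLE = match(CYCLE, unique(CYCLE)) - 1L)]
res
```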

Data

dfin <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), 
  CYCLE = c("BASELINE", 
 "BASELINE", "CYCLE 1", "CYCLE 1", "CYCLE 2", "CYCLE 2", "CYCLE 3", 
 "CYCLE 3"), NUM = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), DIAM = c("8", 
 "4", "6", "2", "6", "NE", "6", NA)), row.names = c(NA, -8L), 
 class = "data.frame")

Answer 1: (score: 1)

# Data created
dfin<-data.table("ID" = rep(x = 1,times = 8),"CYCLE" = c("BASELINE","BASELINE","CYCLE 1","CYCLE 1","CYCLE 2","CYCLE 2","CYCLE 3","CYCLE 3"),
                 "NUM" = rep(x = c(1,2),times = 4),"DIAM" = c(8,4,6,2,6,"NE",6,NA))

# CYCLE transformed
dfin[,CYCLE := as.numeric(ifelse(CYCLE == "BASELINE","0",
                     substr(x = CYCLE,start = 7,stop = 7)))]

# SUM computed
dfin2<-dfin[,.(SUM = if(CYCLE == 0){
  NA_real_
} else if("NE" %in% DIAM){
  NA_real_
} else {
  sum(as.numeric(DIAM),na.rm = TRUE)
}),by = c("ID","CYCLE")]

# IDs with CYCLE = 0 present have SUM updated to NA 
dfin2[ID %in% ID[which(CYCLE == 0)],SUM := NA]

Hope this helps!