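Preferably using data.table in R.

I would like to compute the sum of DIAM by ID and CYCLE according to the following rules:

- If any DIAM value in the group is "NE", the SUM cannot be computed (it must return NA).
- If a DIAM value is NA, ignore it when computing the sum (i.e., treat it as 0).
- Otherwise, compute the sum as usual.

I would also like to replace the CYCLE labels with numbers, with BASELINE represented by 0.

This needs to be applied to every subject. There are many cycles; this is just an example.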
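The three rules can be sketched for a single ID/CYCLE group as a small base R helper. The name sum_diam is hypothetical (it does not appear in the question), and the sketch assumes DIAM is stored as character so that "NE" and NA can both occur.

```r
# Hypothetical helper: applies the three rules to the DIAM values of one group.
sum_diam <- function(diam) {
  # Rule 1: any "NE" means the sum cannot be computed.
  if ("NE" %in% diam) return(NA_real_)
  # Rules 2 and 3: drop NA values and add up whatever is left.
  sum(as.numeric(diam), na.rm = TRUE)
}

sum_diam(c("8", "4"))   # 12
sum_diam(c("6", "NE"))  # NA
sum_diam(c("6", NA))    # 6
```

With this sketch, a group whose values are all NA sums to 0; whether that case should return NA instead is a separate choice.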
Answer 0 (score: 3)
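Here is one option. Group by 'ID' and the match() index of 'CYCLE' (as shown in the expected output), change the 'DIAM' values to NA if any() of them is "NE", then summarise() by taking the sum() of 'DIAM', making sure that NA is returned when all the values are NA.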
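library(tidyverse)
dfin %>%
  # number the cycles in order of first appearance, starting at 0 for BASELINE
  group_by(ID, CYCLE = match(CYCLE, unique(CYCLE))-1) %>%
  # if any value in the group is "NE", blank the whole group before converting
  mutate(DIAM = as.numeric(replace(DIAM, any(DIAM== "NE"), NA))) %>%
  # NA^TRUE is NA and NA^FALSE is 1, so an all-NA group returns NA instead of 0
  summarise(SUM = NA^all(is.na(DIAM)) * sum(DIAM, na.rm = TRUE))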
# A tibble: 4 x 3
# Groups: ID [?]
# ID CYCLE SUM
# <int> <dbl> <dbl>
#1 1 0 12
#2 1 1 8
#3 1 2 NA
#4 1 3 6
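The NA^all(is.na(DIAM)) factor in the summarise() step can look cryptic, so here is a small base R sketch of how it behaves. The vectors x and y are made up for illustration.

```r
# Anything raised to the power 0 is 1 in R, even NA; NA raised to 1 stays NA.
NA^FALSE  # 1
NA^TRUE   # NA

x <- c(NA_real_, NA_real_)  # a group where every value is missing
y <- c(6, NA)               # a group with at least one number

# The factor turns the result into NA only when the whole group is missing.
NA^all(is.na(x)) * sum(x, na.rm = TRUE)  # NA
NA^all(is.na(y)) * sum(y, na.rm = TRUE)  # 6
```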
Or use an if/else condition after the group_by step:
dfin %>%
group_by(ID, CYCLE = match(CYCLE, unique(CYCLE))-1) %>%
summarise(SUM = if("NE" %in% DIAM) NA else sum(as.numeric(DIAM), na.rm = TRUE))
Or use the same logic with data.table:
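library(data.table)
# rleid() numbers runs of CYCLE over the whole table; with several IDs,
# match(CYCLE, unique(CYCLE))-1 as above keeps BASELINE at 0 for each ID
setDT(dfin)[, .(SUM = if("NE" %in% DIAM) NA_real_ else
    sum(as.numeric(DIAM), na.rm = TRUE)), .(ID, CYCLE = rleid(CYCLE)-1)]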
# ID CYCLE SUM
#1: 1 0 12
#2: 1 1 8
#3: 1 2 NA
#4: 1 3 6
Data:
dfin <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
CYCLE = c("BASELINE",
"BASELINE", "CYCLE 1", "CYCLE 1", "CYCLE 2", "CYCLE 2", "CYCLE 3",
"CYCLE 3"), NUM = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), DIAM = c("8",
"4", "6", "2", "6", "NE", "6", NA)), row.names = c(NA, -8L),
class = "data.frame")
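Since the rule has to hold for every subject, it may be worth checking the grouping on more than one ID. The following is a minimal sketch, assuming a made-up two-subject table dt2 (not from the question) and assuming the cycle labels first appear in the order BASELINE, CYCLE 1, and so on. It shows that rleid() keeps counting into the second subject, while match() against the unique labels gives the same number to the same label for every ID.

```r
library(data.table)

# Hypothetical two-subject input; the values are made up for illustration.
dt2 <- data.table(
  ID    = rep(1:2, each = 4),
  CYCLE = rep(c("BASELINE", "BASELINE", "CYCLE 1", "CYCLE 1"), times = 2),
  DIAM  = c("8", "4", "6", "NE", "5", NA, "3", "2")
)

# rleid() numbers consecutive runs over the whole column: 1 1 2 2 3 3 4 4
rleid(dt2$CYCLE)

# match() maps each label to its position among the unique labels,
# so BASELINE becomes 0 and CYCLE 1 becomes 1 for both IDs.
res <- dt2[, .(SUM = if ("NE" %in% DIAM) NA_real_ else
                 sum(as.numeric(DIAM), na.rm = TRUE)),
           by = .(ID, CYCLE = match(CYCLE, unique(CYCLE)) - 1L)]
res
```

Here res has one row per ID and cycle, with SUM equal to 12 and NA for ID 1, and 5 and 5 for ID 2.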
Answer 1 (score: 1)
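library(data.table)

# Data created (the "NE" entry makes the whole DIAM column character)
dfin <- data.table("ID" = rep(x = 1, times = 8),
                   "CYCLE" = c("BASELINE", "BASELINE", "CYCLE 1", "CYCLE 1",
                               "CYCLE 2", "CYCLE 2", "CYCLE 3", "CYCLE 3"),
                   "NUM" = rep(x = c(1, 2), times = 4),
                   "DIAM" = c(8, 4, 6, 2, 6, "NE", 6, NA))

# CYCLE transformed: BASELINE becomes 0 and "CYCLE n" becomes n
# (sub() instead of a one-character substr(), so "CYCLE 10" and above also work)
dfin[, CYCLE := as.numeric(ifelse(CYCLE == "BASELINE", "0",
                                  sub("CYCLE ", "", CYCLE, fixed = TRUE)))]

# SUM computed
dfin2 <- dfin[, .(SUM = if (CYCLE == 0) {
  NA_real_
} else if ("NE" %in% DIAM) {
  NA_real_
} else {
  sum(as.numeric(DIAM), na.rm = TRUE)
}), by = c("ID", "CYCLE")]

# IDs with CYCLE = 0 present have SUM updated to NA
# (this blanks every SUM for such IDs, and the first branch above already
# returns NA for the baseline; drop both if the baseline total should be kept)
dfin2[ID %in% ID[which(CYCLE == 0)], SUM := NA]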
Hope this helps!