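I have code that computes the Kaplan-Meier (product-limit) mean and standard deviation for data containing nondetects (NDs):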
library(dplyr)  # provides tibble(), filter(), pull() and %>%; load once, outside the function
km_mean <- function(x,nd) {
# first remove any missing data
df <- tibble(x,nd) %>% filter(!is.na(x))
x <- df %>% pull(x); nd <- df %>% pull(nd)
# handle cases of all detects or all nondetects; in these situations, no Kaplan-Meier
# estimate is possible or necessary; instead treat all detects as actual concentration estimates
# and all NDs as imputed at half their reporting limits
if (all(nd==0)) return(tibble(mean=mean(x),sd=sd(x)))
if (all(nd==1)) return(tibble(mean=mean(x/2),sd=sd(x/2)))
# mixed detects and NDs: tabulate counts by nd status over the unique x values.
# First subtract a small epsilon from each nondetect, so that a detect tied with an
# ND at the same reporting limit gets the larger rank
eps <- 1e-6
x <- x - nd*eps
nn <- nlevels(factor(x))
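# count the values at risk (observations <= each level) and build the Kaplan-Meier
# CDF and survival function; <tab> is augmented with a dummy detect above max(x)
# whose count is then zeroed, so the reversed cumulative product below returns
# the CDF at every observed level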
km.lev <- as.numeric(levels(factor(x)))
xa <- c(x,max(x)+1); nda <- c(nd,0)
tab <- table(xa,nda)
tab[nn+1,1] <- 0
km.rsk <- cumsum(tab[,1] + tab[,2])
km.cdf <- rev(cumprod(1 - rev(tab[,1])/rev(km.rsk)))[-1]
names(km.cdf) <- as.character(km.lev)
km.surv <- 1 - km.cdf
# per-level KM table for inspection (not returned); tibbles have no row names to reset
km.out <- tibble(km.lev,km.rsk=km.rsk[-length(km.rsk)],km.cdf,km.surv)
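# KM mean = lowest level + area under the step survival function;
# SD = square root of the KM-probability-weighted squared deviations from the mean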
xm <- km.lev[1] + sum(diff(km.lev)*km.surv[-length(km.surv)])
dif <- diff(c(0,km.cdf))
xsd <- sqrt(sum(dif*(km.lev - xm)^2))
names(xm) <- NULL; names(xsd) <- NULL
tibble(mean=xm,sd=xsd)
}
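For reference, the mixed-case branch computes the following. The notation is introduced here for illustration only: x_1 < x_2 < ... < x_m are the unique values after the epsilon shift (km.lev), d_j is the number of detects at x_j (tab[,1]), and n_j is the number of observations <= x_j (km.rsk). Because NDs are left-censored ("< reporting limit"), the product runs over the levels above x_j:

```latex
\begin{align}
\hat F(x_j) &= \prod_{k=j+1}^{m}\left(1 - \frac{d_k}{n_k}\right), \qquad \hat F(x_m) = 1, \\
\hat\mu &= x_1 + \sum_{j=1}^{m-1} (x_{j+1} - x_j)\bigl(1 - \hat F(x_j)\bigr), \\
\hat\sigma &= \sqrt{\sum_{j=1}^{m} \bigl(\hat F(x_j) - \hat F(x_{j-1})\bigr)(x_j - \hat\mu)^2}, \qquad \hat F(x_0) = 0.
\end{align}
```

Here km.cdf is \hat F and km.surv is 1 - \hat F. Note that \hat\sigma weights by the KM probability masses (which sum to 1), whereas the all-detect branch uses sd(), which divides by n - 1, so the two branches are not on exactly the same footing.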
My data have three columns: sample ID, value (x), and a detect/nondetect flag (nd, where 1 = nondetect).
ID  x     nd
a1  0.23  0
a1  2.3   0
a1  1.6   0
a2  3.0   1
a2  3.1   0
a2  2.76  0
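As a hand check of what the mixed-case branch of km_mean() returns, here is group a2 worked through the same steps, using the code's own names. The ND at 3.0 is shifted to 3.0 - epsilon, so it sorts just below the 3.1 detect, and epsilon cancels out of the mean:

```latex
\begin{align}
\text{km.lev} &= (2.76,\; 3.0-\varepsilon,\; 3.1), \quad \text{detects} = (1, 0, 1), \quad \text{km.rsk} = (1, 2, 3), \\
\text{km.cdf}(3.1) &= 1, \\
\text{km.cdf}(3.0-\varepsilon) &= 1 \cdot \left(1 - \tfrac{1}{3}\right) = \tfrac{2}{3}, \\
\text{km.cdf}(2.76) &= \tfrac{2}{3} \cdot \left(1 - \tfrac{0}{2}\right) = \tfrac{2}{3}, \\
\text{mean} &= 2.76 + \tfrac{1}{3}\bigl[(3.0-\varepsilon-2.76) + (3.1-3.0+\varepsilon)\bigr] = 2.76 + \tfrac{0.34}{3} \approx 2.8733, \\
\text{sd} &= \sqrt{\tfrac{2}{3}\left(\tfrac{0.34}{3}\right)^2 + 0 + \tfrac{1}{3}\left(\tfrac{0.68}{3}\right)^2} = 0.34\sqrt{\tfrac{2}{9}} \approx 0.1603.
\end{align}
```

Group a1 has no NDs, so the function simply returns mean(x) = 4.13/3 ≈ 1.3767 and sd(x) ≈ 1.0529 for it.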
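How can I adapt the function so that it runs on all of the a1 samples, then on the a2 samples, and so on, in a single call?
I tried group_by, but couldn't get it to work.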
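group_by() on its own only marks the groups; a verb such as group_modify() or summarise() then has to call km_mean() once per group. Below is a minimal sketch, assuming dplyr >= 1.0 and hypothetical column names id, x and nd for the three columns described above. So that the block runs on its own, it repeats a condensed version of km_mean() that follows the same steps (epsilon shift, reversed cumulative product, area under the survival curve) but uses match()/tabulate() in place of the augmented table(); it is an illustration, not the original function.

```r
library(dplyr)

# Condensed restatement of km_mean() for illustration only: same steps as the
# function above, with match()/tabulate() instead of the augmented table().
km_mean <- function(x, nd) {
  keep <- !is.na(x)
  x <- x[keep]
  nd <- nd[keep]
  # all detects: ordinary mean/SD; all NDs: impute half the reporting limit
  if (all(nd == 0)) return(tibble(mean = mean(x), sd = sd(x)))
  if (all(nd == 1)) return(tibble(mean = mean(x / 2), sd = sd(x / 2)))
  x <- x - nd * 1e-6                                 # ND sorts just below a tied detect
  lev <- sort(unique(x))                             # unique levels, ascending
  idx <- match(x, lev)
  d <- tabulate(idx[nd == 0], nbins = length(lev))   # detects at each level
  n <- cumsum(tabulate(idx, nbins = length(lev)))    # at risk: count <= level
  # left-censored KM: F(lev[j]) = prod over k > j of (1 - d[k] / n[k]); F at top = 1
  cdf <- c(rev(cumprod(rev(1 - d / n)))[-1], 1)
  xm <- lev[1] + sum(diff(lev) * (1 - cdf[-length(cdf)]))
  xsd <- sqrt(sum(diff(c(0, cdf)) * (lev - xm)^2))
  tibble(mean = xm, sd = xsd)
}

# Sample data from the question; the column names id, x, nd are assumed
dat <- tibble(
  id = c("a1", "a1", "a1", "a2", "a2", "a2"),
  x  = c(0.23, 2.3, 1.6, 3.0, 3.1, 2.76),
  nd = c(0, 0, 0, 1, 0, 0)
)

# Option 1: group_modify() calls km_mean() once per group;
# .x holds that group's rows (without the id column)
res <- dat %>%
  group_by(id) %>%
  group_modify(~ km_mean(.x$x, .x$nd)) %>%
  ungroup()

# Option 2 (dplyr >= 1.0): an unnamed data-frame result in summarise()
# is unpacked into columns, giving the same id / mean / sd table
res2 <- dat %>%
  group_by(id) %>%
  summarise(km_mean(x, nd))

print(res)
```

Both calls return one row per ID with columns id, mean and sd: a1 (all detects) gets mean(x) and sd(x), and a2 gets the KM estimates (mean ≈ 2.873, sd ≈ 0.160). group_modify() is the more general of the two if km_mean() is ever changed to return more than one row per group.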