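How do I run a function over groups of data?

Time: 2019-10-09 22:52:21

Tags: r

I have code that computes Kaplan-Meier (product-limit) estimates of the mean and standard deviation.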

library(tidyverse)

km_mean <- function(x, nd) {
  # first remove any missing data
  df <- tibble(x, nd) %>% filter(!is.na(x))
  x <- df %>% pull(x)
  nd <- df %>% pull(nd)

  # handle cases of all detects or all nondetects; in these situations, no Kaplan-Meier
  # estimate is possible or necessary; instead treat all detects as actual concentration
  # estimates and all NDs as imputed at half their reporting limits
  if (all(nd == 0)) return(tibble(mean = mean(x), sd = sd(x)))
  if (all(nd == 1)) return(tibble(mean = mean(x / 2), sd = sd(x / 2)))

  # for cases with mixed detects and NDs, table by nd status;
  # determine unique x values; first subtract epsilon from each nondetect so that
  # detects tied with NDs at the same reporting limit get the larger rank
  eps <- 1e-6
  x <- x - nd * eps
  nn <- nlevels(factor(x))

  # determine number of at-risk values; build Kaplan-Meier CDF and survival function;
  # note: need to augment and adjust <tab> for the calculation below to work correctly
  km.lev <- as.numeric(levels(factor(x)))
  xa <- c(x, max(x) + 1)
  nda <- c(nd, 0)
  tab <- table(xa, nda)
  tab[nn + 1, 1] <- 0
  km.rsk <- cumsum(tab[, 1] + tab[, 2])
  km.cdf <- rev(cumprod(1 - rev(tab[, 1]) / rev(km.rsk)))[-1]
  names(km.cdf) <- as.character(km.lev)
  km.surv <- 1 - km.cdf

  km.out <- tibble(km.lev, km.rsk = km.rsk[-length(km.rsk)], km.cdf, km.surv)
  row.names(km.out) <- NULL

  # estimate adjusted mean and SD
  xm <- km.lev[1] + sum(diff(km.lev) * km.surv[-length(km.surv)])
  dif <- diff(c(0, km.cdf))
  xsd <- sqrt(sum(dif * (km.lev - xm)^2))
  names(xm) <- NULL
  names(xsd) <- NULL
  tibble(mean = xm, sd = xsd)
}
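
For anyone following the mixed detect/nondetect branch, the quantities it builds can be written compactly. This is only a reading of the code above, not something stated in the question, and the symbols are introduced here for illustration: $x_1 < \dots < x_m$ are the distinct adjusted values (`km.lev`), $d_j$ is the number of detects at $x_j$, and $n_j$ is the number of observations (detects and nondetects) with value at most $x_j$ (`km.rsk`).

```latex
\begin{align*}
\hat F(x_j) &= \prod_{k=j+1}^{m} \left(1 - \frac{d_k}{n_k}\right), \qquad \hat F(x_m) = 1, \\
\hat\mu &= x_1 + \sum_{j=1}^{m-1} (x_{j+1} - x_j)\,\bigl(1 - \hat F(x_j)\bigr), \\
\hat\sigma &= \sqrt{\sum_{j=1}^{m} \bigl(\hat F(x_j) - \hat F(x_{j-1})\bigr)\,(x_j - \hat\mu)^2},
\qquad \hat F(x_0) = 0.
\end{align*}
```

Here $\hat F$ corresponds to `km.cdf`, $1 - \hat F$ to `km.surv`, and $\hat\mu$, $\hat\sigma$ to the returned `mean` and `sd`.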

My data has three columns: sample ID, value (x), and a detect/nondetect flag (nd).

a1  0.23  0
a1  2.3   0
a1  1.6   0
a2  3.0   1
a2  3.1   0
a2  2.76  0

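How can I adapt this function so that it runs on all the a1 samples, then on a2, and so on, in a single call?

I tried group_by, but I could not get it to work.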
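
One possible approach, sketched under some assumptions: since the function takes two vectors and returns a one-row tibble, it may not need to change at all. `group_by()` followed by `group_modify()` (available in dplyr 0.8.1 and later) can call it once per sample ID. The column names `id`, `x`, `nd`, the helper `summarise_by_id()`, and the stand-in `demo_summary()` are all made up for this illustration; `demo_summary()` is there only so the snippet runs on its own, and `km_mean` could be passed in its place.

```r
library(dplyr)
library(tibble)

# Sample data from the question; the column names id, x, nd are assumed.
dat <- tribble(
  ~id,  ~x,   ~nd,
  "a1", 0.23, 0,
  "a1", 2.3,  0,
  "a1", 1.6,  0,
  "a2", 3.0,  1,
  "a2", 3.1,  0,
  "a2", 2.76, 0
)

# Hypothetical stand-in with the same interface as km_mean(x, nd):
# two vectors in, a one-row tibble out. It ignores nd on purpose.
demo_summary <- function(x, nd) {
  tibble(mean = mean(x), sd = sd(x))
}

# Hypothetical helper: call `fn` once per sample ID and stack the results.
# Inside group_modify(), .x is the data for one group without the id column.
summarise_by_id <- function(data, fn) {
  data %>%
    group_by(id) %>%
    group_modify(~ fn(.x$x, .x$nd)) %>%
    ungroup()
}

res <- summarise_by_id(dat, demo_summary)
print(res)
```

With the real function the call would be `summarise_by_id(dat, km_mean)`, giving one row per sample ID with `mean` and `sd` columns. A base R alternative along the same lines would be to `split()` the data by ID and `lapply()` over the pieces.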

0 answers:

No answers yet.