Does anyone know how to use ggplot or lattice for survival analysis? It would be nice to make trellis- or facet-style survival plots.
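So in the end I played around a bit and found a solution for a Kaplan-Meier plot. I apologize for the messy code that puts the list elements into a data frame, but I couldn't come up with another way.
Note: it only works with two levels of stratum. If anyone knows how to do this using x<-length(stratum), please let me know (in Stata I could append to a macro; I'm not sure how that works in R).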
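library(survival)
library(ggplot2)

ggkm <- function(time, event, stratum) {
  m2s <- Surv(time, as.numeric(event))
  fit <- survfit(m2s ~ stratum)
  # Collect the survfit components into a data frame
  # (the strata labels below only handle two strata)
  f <- data.frame(
    time   = fit$time,
    surv   = fit$surv,
    strata = c(rep(names(fit$strata[1]), fit$strata[1]),
               rep(names(fit$strata[2]), fit$strata[2])),
    upper  = fit$upper,
    lower  = fit$lower
  )
  # "+" must end a line: a "+" at the start of a line begins a new expression
  r <- ggplot(f, aes(x = time, y = surv, fill = strata, group = strata)) +
    geom_line() +
    geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3)
  return(r)
}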
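On the question of handling more than two strata: a minimal sketch, assuming `fit` is a stratified `survfit` object. `fit$strata` is a named vector holding the number of time points in each stratum, so one vectorized `rep()` call yields a label for every row, however many strata there are. The helper name `km_frame` and the use of the `lung` data with `ph.ecog` are illustrative choices, not part of the original code.

```r
library(survival)

# Hypothetical helper: flatten a stratified survfit object into a data frame.
# fit$strata gives the number of rows per stratum, named by stratum, so
# rep(names, times = counts) labels every row for any number of strata.
km_frame <- function(fit) {
  data.frame(
    time   = fit$time,
    surv   = fit$surv,
    upper  = fit$upper,
    lower  = fit$lower,
    strata = rep(names(fit$strata), times = fit$strata)
  )
}

# Example with a three-level stratum from the built-in lung data
# (ph.ecog restricted to 0-2; lung$status uses 1 = censored, 2 = dead,
# which Surv() accepts directly)
lung3 <- subset(lung, ph.ecog %in% 0:2)
fit3 <- survfit(Surv(time, status) ~ ph.ecog, data = lung3)
km3 <- km_frame(fit3)
print(table(km3$strata))
```

The resulting data frame can be passed to `ggplot()` exactly like `f` in `ggkm`.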
Answer 0 (score: 4)
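I have been using the following code in lattice. The first function draws the KM curve for one group and is normally used as the panel.groups function, while the second adds the log-rank test p-value for the whole panel: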
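# 'type' is accepted here so that it is not passed on a second time via ...
km.panel <- function(x, y, type, mark.time = TRUE, ...) {
  # Drop observations with a missing time (x) or censoring indicator (y)
  na.part <- is.na(x) | is.na(y)
  x <- x[!na.part]
  y <- y[!na.part]
  if (length(x) == 0) return()
  # Kaplan-Meier fit for this group only
  fit <- survfit(Surv(x, y) ~ 1)
  if (mark.time) {
    # Mark the time points at which observations were censored (y == 0)
    cens <- which(fit$time %in% x[y == 0])
    panel.xyplot(fit$time[cens], fit$surv[cens], type = "p", ...)
  }
  # Draw the curve as a step function starting at (0, 1)
  panel.xyplot(c(0, fit$time), c(1, fit$surv), type = "s", ...)
}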
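logrank.panel <- function(x, y, subscripts, groups, ...) {
  # Log-rank test comparing the groups shown in this panel
  lr <- survdiff(Surv(x, y) ~ groups[subscripts])
  otmp <- lr$obs
  etmp <- lr$exp
  # Degrees of freedom: number of groups with expected events, minus one
  df <- (sum(1 * (etmp > 0))) - 1
  p <- 1 - pchisq(lr$chisq, df)
  p.text <- paste("p=", signif(p, 2))
  # Print the p-value in the bottom-right corner of the panel
  grid.text(p.text, 0.95, 0.05, just = c("right", "bottom"))
  # Let panel.superpose draw each group using panel.groups (e.g. km.panel)
  panel.superpose(x = x, y = y, subscripts = subscripts, groups = groups, ...)
}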
The censoring indicator must be coded 0/1 for this code to work. Usage is as follows:
library(survival)
library(lattice)
library(grid)
data(colon) #built-in example data set
xyplot(status~time, data=colon, groups=rx, panel.groups=km.panel, panel=logrank.panel)
If you just use 'panel = panel.superpose' instead, you won't get the p-value.
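The p-value printed by `logrank.panel` can be checked outside lattice. A minimal sketch, assuming the `colon` data restricted to the death records (`etype == 2`, since `colon` holds one row per patient per event type): `survdiff()` gives the log-rank statistic in `chisq`, and the upper tail of the chi-squared distribution gives the p-value. It also shows one way to recode 1/2-coded status data (such as `lung$status`) into the 0/1 indicator these panel functions expect; the variable names `colon.death` and `lung.event` are illustrative.

```r
library(survival)

# Death records only: colon has one row per patient per event type
colon.death <- subset(colon, etype == 2)

# colon$status is already coded 0 = censored, 1 = event
lr <- survdiff(Surv(time, status) ~ rx, data = colon.death)

# Degrees of freedom as in logrank.panel: groups with expected events, minus one
df <- sum(lr$exp > 0) - 1

# Upper-tail chi-squared probability; equivalent to 1 - pchisq(lr$chisq, df)
p <- pchisq(lr$chisq, df, lower.tail = FALSE)
print(paste("p=", signif(p, 2)))

# Data coded 1 = censored, 2 = dead (e.g. lung$status) need recoding to 0/1
lung.event <- as.integer(lung$status == 2)
print(table(lung.event))
```

Using `lower.tail = FALSE` avoids the loss of precision that `1 - pchisq()` suffers when the p-value is very small.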
Answer 1 (score: 1)
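I started out following almost exactly the approach you use in your updated answer. The annoying thing is that it only records the changes, not every tick: for example, it gives you 0: 100%, 3: 88% instead of 0: 100%, 1: 100%, 2: 100%, 3: 88%. If you feed that into ggplot, your line slopes from 0 to 3 instead of staying flat and then dropping straight down at 3. That may be fine depending on your application and assumptions, but it isn't the classic KM plot. Here's how I handled varying numbers of strata: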
# x is the survfit object; build one stratum label per row of its output
groupvec <- c()
for (i in seq_along(x$strata)) {
  groupvec <- append(groupvec, rep(x = names(x$strata[i]), times = x$strata[i]))
}
f$strata <- groupvec
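The sloping-line problem can also be handled at plotting time rather than by adding a row for every tick. A sketch, assuming ggplot2 and the `colon` death records (`etype == 2`): `geom_step()` keeps each survival value flat until the next time point and then drops vertically, which gives the classic KM shape, and an explicit starting row at time 0 with survival 1 is added for each stratum. The stratum labels use the vectorized `rep(names(fit$strata), times = fit$strata)`, which produces the same labels as the loop above.

```r
library(survival)
library(ggplot2)

# Death records only (colon has separate rows for recurrence and death)
colon.death <- subset(colon, etype == 2)
fit <- survfit(Surv(time, status) ~ rx, data = colon.death)

# One row per distinct event/censoring time within each stratum
km <- data.frame(time   = fit$time,
                 surv   = fit$surv,
                 strata = rep(names(fit$strata), times = fit$strata))

# Explicit starting point (time 0, survival 1) for every stratum
start <- data.frame(time = 0, surv = 1, strata = names(fit$strata))
km <- rbind(start, km)
km <- km[order(km$strata, km$time), ]

# geom_step() stays flat until the next time point, then drops vertically
p <- ggplot(km, aes(x = time, y = surv, colour = strata)) +
  geom_step()
print(p)
```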
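For what it's worth, this is how I ended up doing it, but this isn't really a KM plot either, because I'm not calculating the KM estimate itself (although I have no censoring, so this is equivalent... I believe).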
survcurv <- function(surv.time, group = NA) {
  # Must be able to coerce surv.time and group to vectors
  if (!is.vector(as.vector(surv.time)) | !is.vector(as.vector(group))) {
    stop("surv.time and group must be coercible to vectors.")
  }
  # Make sure that surv.time is numeric
  if (!is.numeric(surv.time)) {stop("Survival times must be numeric.")}
  # group can be just about anything, but must be the same length as surv.time
  if (length(surv.time) != length(group)) {
    stop("The vectors passed to the surv.time and group arguments must be of equal length.")
  }
  # What is the maximum number of ticks recorded?
  max.time <- max(surv.time)
  # Which groups are in the data?
  groups <- unique(group)
  # One row per (group, tick) pair, with ticks from t = 0 to max.time.
  # rep(..., each =) pairs every tick with every group; plain recycling of
  # unique(group) alternates groups row by row and can miss pairs.
  curves <- data.frame(tick = rep(0:max.time, length(groups)),
                       group = rep(groups, each = length(0:max.time)),
                       surv.prop = NA)
  # For each row, the proportion of group[i] still surviving after tick[i]
  for (i in seq_len(nrow(curves))) {
    in.group <- group %in% curves$group[i]
    curves$surv.prop[i] <- sum(surv.time[in.group] > curves$tick[i]) /
      sum(in.group)
  }
  # Return the results, ordered by group and tick, which is easier to read
  return(curves[order(curves$group, curves$tick), ])
}
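If a value at every tick is wanted and the data are censored, `summary()` on a `survfit` object accepts a `times` argument and returns the KM estimate at each requested time, repeating the previous value at ticks where nothing happens. A minimal sketch on made-up, uncensored toy data (`surv.time`, `grp` and `status` are hypothetical); without censoring the estimate should agree with the simple proportion surviving past each tick, which is what `survcurv` computes.

```r
library(survival)

# Hypothetical toy data: whole-number survival times, no censoring
surv.time <- c(3, 5, 5, 8, 2, 4, 6, 6)
grp <- rep(c("A", "B"), each = 4)
status <- rep(1, length(surv.time))   # every subject has the event

fit <- survfit(Surv(surv.time, status) ~ grp)

# Evaluate the KM step function at every tick from 0 to the largest time;
# extend = TRUE keeps ticks after a group's last observed time
s <- summary(fit, times = 0:max(surv.time), extend = TRUE)
km.ticks <- data.frame(tick = s$time, group = s$strata, surv = s$surv)
print(km.ticks)
```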