Question

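I am trying to understand the function indeptCoxph in the spBayesSurv package. This function fits a Bayesian proportional hazards model. I am confused about parts of the R code as well as the theory behind the Cox model.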

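I am working through the authors' example (below). They first simulate survival time data, and I cannot follow their code for doing so. It seems to me that they simulate survival times from an exponential distribution with CDF F(t) = 1 - exp(-lambda * t), except that lambda is exp(sum(xi * betaT)) rather than just a constant. To simulate the data, the parameter betaT is given a fixed constant value, which is its true value, and xi is the predictor data.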

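Question 1 - Does this definition/form of lambda come from the Cox hazard model? Do the authors make a special assumption about the survival distribution in this example?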

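Question 2 - I am having trouble understanding the following key piece of code, which generates the survival time data (of course, it depends on the earlier code, given at the end):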

## Generate survival times t

u = pnorm(z);
t = rep(0, ntot);
for (i in 1:ntot){
t[i] = Finv(u[i], x[i]);
}
tTrue = t; #plot(x,t);

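The function Finv(u, xi) returns the survival time t that satisfies F(t) = u, where I believe xi is the predictor variable. I really don't understand why u has to come from the normal CDF. They generate "z" as a single draw from a multivariate normal distribution (with 3 components), and u is the vector of normal CDF values, u = pnorm(z). Again, I am not sure why "u" has to be generated this way; it would be very helpful if the relationship between u, z, t and lambda could be clarified. The covariance matrix of "z" is also generated by the authors from the two row vectors s1 and s2 in the code, but the role of s1 and s2 is confusing if I am just fitting a model using the survival time data "t" and the predictor "x".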

The authors' code:

###############################################################
# A simulated data: Cox PH
###############################################################

rm(list=ls())
library(survival)
library(spBayesSurv)
library(coda)
library(MASS)
## True parameters
betaT = c(-1);
theta1 = 0.98; theta2 = 100000;
## generate coordinates:
## npred is the # of locations for prediction
n = 100; npred = 30; ntot = n + npred;
ldist = 100; wdist = 40;
s1 = runif(ntot, 0, wdist); s2 = runif(ntot, 0, ldist);
s = rbind(s1,s2); #plot(s[1,], s[2,]);
## Covariance matrix
corT = matrix(1, ntot, ntot);
for (i in 1:(ntot-1)){
for (j in (i+1):ntot){
dij = sqrt(sum( (s[,i]-s[,j])^2 ));
corT[i,j] = theta1*exp(-theta2*dij);
corT[j,i] = theta1*exp(-theta2*dij);
}
}
## Generate x
x = runif(ntot,-1.5,1.5);
## Generate transformed log of survival times
z = mvrnorm(1, rep(0, ntot), corT);
## The CDF of Ti: Lambda(t) = t;
Fi = function(t, xi){
res = 1-exp(-t*exp(sum(xi*betaT)));
res[which(t<0)] = 0;
res
}
## The pdf of Ti:
fi = function(t, xi){
res=(1-Fi(t,xi))*exp(sum(xi*betaT));
res[which(t<0)] = 0;
res
}
#integrate(function(x) fi(x, 0), -Inf, Inf)
## true plot
xx = seq(0, 10, 0.1)
#plot(xx, fi(xx, -1), "l", lwd=2, col=2)
#lines(xx, fi(xx, 1), "l", lwd=2, col=3)

## The inverse for CDF of Ti
Finvsingle = function(u, xi) {
res = uniroot(function (x) Fi(x, xi)-u, lower=0, upper=5000);
res$root
}
Finv = function(u, xi) {sapply(u, Finvsingle, xi)};

## Generate survival times t
u = pnorm(z);
t = rep(0, ntot);
for (i in 1:ntot){
t[i] = Finv(u[i], x[i]);
}
tTrue = t; #plot(x,t);

Answer 1

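In fact, the data are generated under the framework of a spatial copula Cox PH model. Reading Section 4.1 of the supplemental material of Zhou et al. (2015) is helpful. When you fit a non-spatial PH model, the data-generating process can be simulated without using s1 and s2; see the new example at https://stats.stackexchange.com/questions/253368/bayesian-survival-analysis.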
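As a rough illustration of why u = pnorm(z) appears in the question's code: when the correlation matrix has ones on its diagonal, each z[i] is marginally N(0, 1), so pnorm(z[i]) is marginally Uniform(0, 1), while the values at nearby locations stay dependent. The sketch below assumes hypothetical coordinates and a much smaller decay rate (theta2 = 0.1) than the question's theta2 = 100000, for which the off-diagonal entries theta1 * exp(-theta2 * dij) are numerically zero at most distances. It also takes many draws of z instead of one, purely to make the marginal behaviour visible.

```r
library(MASS)  # provides mvrnorm

set.seed(1)
n <- 50
# Hypothetical coordinates and decay rate, chosen only for illustration
s <- cbind(runif(n, 0, 40), runif(n, 0, 100))
theta1 <- 0.98; theta2 <- 0.1
corT <- theta1 * exp(-theta2 * as.matrix(dist(s)))
diag(corT) <- 1   # unit variances, so each z[i] is marginally N(0, 1)

# 2000 independent draws of the n-vector z (one row per draw)
Z <- mvrnorm(2000, rep(0, n), corT)
U <- pnorm(Z)     # Gaussian-copula uniforms

# Marginally, each column of U behaves like a Uniform(0, 1) sample
summary(U[, 1])

# Location 1 and its most strongly correlated neighbour remain dependent
j <- which.max(corT[1, -1]) + 1
cor(U[, 1], U[, j])
```

In a non-spatial setting the same role is played by independent draws from runif, which is why s1 and s2 are not needed there.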

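In this new example, f0oft(t) and S0oft(t) are the baseline density and survival functions, respectively. Given a subject with covariates x, Sioft(t,x) and fioft(t,x) are that subject's survival and density functions. Finv(u,x) is the inverse of Fioft(t,x)=1-Sioft(t,x); that is, Finv(u,x) is the solution of Fioft(t,x)=u with respect to t.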
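For the code in the question, whose comment states that the baseline cumulative hazard is Lambda(t) = t, these quantities can be written out explicitly. The derivation below is a sketch under that assumption; the symbols h_0, \Lambda_0 and \lambda are notation introduced here, with \beta standing for betaT.

```latex
\begin{aligned}
h(t \mid x) &= h_0(t)\, e^{x^\top \beta}
  && \text{(proportional hazards)} \\
S(t \mid x) &= \exp\!\bigl\{-\Lambda_0(t)\, e^{x^\top \beta}\bigr\},
  \qquad \Lambda_0(t) = \int_0^t h_0(s)\, ds \\
F(t \mid x) &= 1 - \exp\!\bigl\{-t\, e^{x^\top \beta}\bigr\}
  && \text{(taking } \Lambda_0(t) = t \text{)} \\
F^{-1}(u \mid x) &= -\frac{\log(1 - u)}{e^{x^\top \beta}},
  \qquad 0 < u < 1
\end{aligned}
```

So with a constant baseline hazard h_0(t) = 1, T given x is exponential with rate \lambda = e^{x^\top \beta}. The exponential form comes from that choice of baseline, not from the Cox model itself, which leaves h_0 unspecified.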

To generate survival data, we can first generate the covariates:

    x1 = rbinom(ntot, 1, 0.5); x2 = rnorm(ntot, 0, 1); X = cbind(x1, x2);

Given each covariate vector in X, the true survival times tT can be generated:

    u = runif(ntot);
    tT = rep(0, ntot);
    for (i in 1:ntot){
      tT[i] = Finv(u[i], X[i,]);
    }

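The rationale here is that if T | x ~ F(t, x), then F(T, x) ~ Uniform(0, 1). Conversely, solving F(t, x) = u for a uniform draw u yields a draw of T.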
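A minimal sketch of this inverse-transform step, using the single-covariate setup from the question (betaT = -1, baseline cumulative hazard Lambda(t) = t). The names Finv_num and Finv_exact are hypothetical helpers introduced here: the first mirrors the uniroot approach in the question's code, the second uses the closed-form inverse that exists for this particular baseline.

```r
set.seed(42)
betaT <- c(-1)                 # true coefficient, as in the question's code
ntot  <- 130
x <- runif(ntot, -1.5, 1.5)    # one covariate per subject

# CDF of T given x when the baseline cumulative hazard is Lambda0(t) = t
Fi <- function(t, xi) 1 - exp(-t * exp(sum(xi * betaT)))

# Numerical inverse via root finding (hypothetical helper name)
Finv_num <- function(u, xi) {
  uniroot(function(t) Fi(t, xi) - u, lower = 0, upper = 5000, tol = 1e-10)$root
}

# Closed-form inverse, valid only for this exponential case
Finv_exact <- function(u, xi) -log(1 - u) / exp(sum(xi * betaT))

u <- runif(ntot)                       # independent uniforms (non-spatial case)
t_num   <- mapply(Finv_num,   u, x)    # survival times by root finding
t_exact <- mapply(Finv_exact, u, x)    # survival times by formula

max(abs(t_num - t_exact))              # the two inverses should agree closely
```

The root-finding version is the more general one, since it still works when the baseline has no closed-form inverse.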

Stuck with package example code in R - simulating data to fit a model
