I cannot replicate the results from the BUGS code below using PyMC. The BUGS model is an Andersen-Gill multiplicative intensity Cox PH model.
model
{
    # Set up data
    for(i in 1:Nsubj) {
        for(j in 1:T) {
            # risk set = 1 if obs.t >= t
            Y[i,j] <- step(obs.t[i] - t[j] + eps)
            # counting process jump = 1 if obs.t in [ t[j], t[j+1] )
            # i.e. if t[j] <= obs.t < t[j+1]
            dN[i, j] <- Y[i, j] * step(t[j + 1] - obs.t[i] - eps) * FAIL[i]
        }
        Useless[i] <- pscenter[i] + hhcenter[i] + ncomact[i]
            + rleader[i] + dleader[i] + inter1[i] + inter2[i]
    }
    # Model
    for(j in 1:T) {
        for(i in 1:Nsubj) {
            dN[i, j] ~ dpois(Idt[i, j]) # Likelihood
            Idt[i, j] <- Y[i, j] * exp(beta[1]*pscenter[i] + beta[2]*hhcenter[i]
                + beta[3]*ncomact[i] + beta[4]*rleader[i] + beta[5]*dleader[i]
                + beta[6]*inter1[i] + beta[7]*inter2[i]) * dL0[j] # Intensity
        }
        dL0[j] ~ dgamma(mu[j], c)
        mu[j] <- dL0.star[j] * c # prior mean hazard
    }
    c ~ dgamma(0.0001, 0.00001)
    r ~ dgamma(0.001, 0.0001)
    for (j in 1 : T) { dL0.star[j] <- r * (t[j + 1] - t[j]) }
    # next line indicates number of covariates and is for the corresponding betas
    for(i in 1:7) {beta[i] ~ dnorm(0.0,0.00001)}
}
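The data block of the BUGS model can be mirrored in plain Python to see what `Y` and `dN` look like. This is a minimal sketch on made-up toy data: the `t`, `obs_t` and `fail` values and the default `eps` are assumptions for illustration, not the study data. `step` follows the BUGS definition and returns 1 when its argument is non-negative.

```python
def step(x):
    """BUGS step(): 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def counting_process(t, obs_t, fail, eps=1e-10):
    """Build the risk-set indicator Y and the jump indicator dN.

    t holds T + 1 interval endpoints; Y and dN are Nsubj x T nested lists.
    """
    T = len(t) - 1
    Y = [[step(obs - t[j] + eps) for j in range(T)] for obs in obs_t]
    dN = [[Y[i][j] * step(t[j + 1] - obs_t[i] - eps) * fail[i]
           for j in range(T)]
          for i in range(len(obs_t))]
    return Y, dN


# Toy data (hypothetical): three subjects, interval endpoints 1, 2, 3, 4.
t = [1, 2, 3, 4]
obs_t = [1, 2, 3]
fail = [1, 0, 1]  # the second subject is censored
Y, dN = counting_process(t, obs_t, fail)
```

On this toy input each failing subject jumps exactly once, in the interval that starts at its observed time. If the upper bound were non-strict (`obs_t <= t[j+1]`), the third subject would also register a jump in the preceding interval, so the strictness of that comparison is worth checking in any port of the model.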
I use the following initial values:
list(beta=c(-.36,-.26,-.29,-.22,-.61,-9.73,-.23), c=0.01, r=0.01, dL0=c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))
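I use a single chain (for now) with 5000 iterations of burn-in. I then run the estimation for 10000 additional iterations and obtain the same point estimates as those reported in the paper. These are also close to the earlier frequentist estimates.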
OpenBUGS> samplesStats('beta')
mean sd MC_error val2.5pc median val97.5pc start sample
beta[1] 3.466 0.8906 0.03592 1.696 3.48 5.175 501 9500
beta[2] -0.04155 0.06253 0.002487 -0.1609 -0.04355 0.08464 501 9500
beta[3] -0.009709 0.07353 0.002008 -0.1544 -0.01052 0.1365 501 9500
beta[4] 0.3535 0.1788 0.004184 -0.01523 0.3636 0.6724 501 9500
beta[5] 0.08454 0.1652 0.004261 -0.2464 0.08795 0.3964 501 9500
beta[6] -4.109 1.325 0.05224 -6.617 -4.132 -1.479 501 9500
beta[7] 0.1413 0.08594 0.003381 -0.03404 0.1423 0.3031 501 9500
OpenBUGS> samplesStats('c')
mean sd MC_error val2.5pc median val97.5pc start sample
c 4.053 1.08 0.02896 2.202 3.974 6.373 1001 10000
OpenBUGS> samplesStats('r')
mean sd MC_error val2.5pc median val97.5pc start sample
r 0.01162 0.002929 7.846E-5 0.007387 0.01119 0.01848 1001 10000
I tried to replicate this in PyMC 2.3.2 with the following code. The full replication code is available here.
import numpy as np
from pymc import Gamma, Normal, Poisson, MCMC, deterministic

def cox_model(dta=None):
    # dta is unused: the data come from load_data_cox() (in the full replication code)
    (t, obs_t, pscenter, hhcenter, ncomact, rleader,
     dleader, inter1, inter2, fail) = load_data_cox()
    T = len(t) - 1
    nsubj = len(obs_t)
    # risk set equals one if obs_t >= t
    Y = np.array([[int(obs >= time) for time in t] for obs in obs_t])
    # counting process. jump = 1 if obs_t \in [t[j], t[j+1])
    # NB: the BUGS code uses a strict upper bound (obs.t < t[j+1]);
    # `>=` here also matches obs_t == t[j+1]
    dN = np.array([[Y[i,j]*int(t[j+1] >= obs_t[i])*fail[i] for i in range(nsubj)] for j in range(T)])
    c = Gamma('c', .0001, .00001, value=.1)
    r = Gamma('r', .001, .0001, value=.1)
    dL0_star = r*np.array([t[j+1] - t[j] for j in range(T)])
    mu = dL0_star * c  # prior mean hazard
    dL0 = Gamma('dL0', mu, c, value=np.ones(T))
    beta = Normal('beta', np.zeros(7), np.ones(7)*.00001,
                  value=np.array([-.36, -.26, -.29, -.22, -.61, -9.73, -.23]))

    @deterministic
    def idt(b1=beta, dl0=dL0):
        # intensity, indexed [j][i] (time interval, subject) to match dN
        mu_ = [[Y[i,j] * np.exp(b1[0]*pscenter[i] + b1[1]*hhcenter[i] +
                                b1[2]*ncomact[i] + b1[3]*rleader[i] +
                                b1[4]*dleader[i] + b1[5]*inter1[i] +
                                b1[6]*inter2[i])*dl0[j] for i in range(nsubj)]
               for j in range(T)]
        return mu_

    dn_like = Poisson('dn_like', idt, value=dN, observed=True)
    return locals()

m = MCMC(cox_model())
m.sample(15000)
However, my estimates are nowhere near these. I get something like
beta:
Mean SD MC Error 95% HPD interval
------------------------------------------------------------------
-0.537 1.094 0.099 [-2.549 1.492]
0.276 0.048 0.004 [ 0.184 0.36 ]
-1.092 0.385 0.038 [-1.559 -0.371]
-1.461 0.746 0.073 [-2.986 -0.496]
-1.865 0.382 0.038 [-2.471 -1.329]
3.778 1.539 0.133 [ 1.088 6.623]
-0.449 0.109 0.01 [-0.661 -0.26 ]
Posterior quantiles:
2.5 25 50 75 97.5
|---------------|===============|===============|---------------|
-2.892 -1.274 -0.385 0.268 1.253
0.191 0.244 0.278 0.305 0.374
-1.553 -1.434 -1.179 -0.793 -0.258
-3.132 -1.856 -1.196 -0.904 -0.526
-2.471 -2.199 -1.864 -1.632 -1.201
1.287 2.685 3.601 4.72 7.262
-0.714 -0.519 -0.445 -0.368 -0.273
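Most worrying, the signs differ. I thought it might just be a convergence problem, so I ran 50,000 iterations overnight, without much change. Perhaps there is an error or a difference in my PyMC model, particularly in the dL0 specification?
I have tried different starting values. I have tried letting the model run for many iterations. I have tried centering the priors on the point estimates from BUGS.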
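The `dL0` prior can be checked by hand. BUGS `dgamma(a, b)` takes a shape and a rate, so `dL0[j] ~ dgamma(mu[j], c)` with `mu[j] = dL0.star[j] * c` has prior mean `dL0.star[j]` and prior variance `dL0.star[j] / c`. The sketch below only does that arithmetic: `dt` is a made-up interval width, and `r` and `c` are the starting values from the PyMC code. PyMC 2's `Gamma(alpha, beta)` appears to use the same shape/rate convention, but that is an assumption worth confirming against its documentation.

```python
def gamma_moments(shape, rate):
    """Mean and variance of a Gamma distribution in shape/rate form."""
    return shape / rate, shape / rate ** 2


r, c = 0.1, 0.1  # starting values used in the PyMC code
dt = 2.0         # hypothetical interval width t[j+1] - t[j]

dL0_star = r * dt   # prior guess at the hazard increment
mu = dL0_star * c   # shape parameter passed to the Gamma prior
mean, var = gamma_moments(mu, c)
print(mean, var)
```

If the two libraries disagreed on rate versus scale, the implied prior mean would no longer equal `dL0_star`, which is a quick way to spot a parameterization mismatch.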
Answer 0 (score: 3)
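I think the problem is non-convergence, as you suspected, and the only substantive differences between the PyMC2 and BUGS implementations are the step methods and the burn-in period. To investigate, I changed idt so that it runs faster but returns the same values: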
# inside cox_model(), replacing the original idt
X = np.array([pscenter, hhcenter, ncomact, rleader, dleader, inter1, inter2]).T

@deterministic
def idt(beta=beta, dL0=dL0):
    intensity = np.exp(np.dot(X, beta))  # one linear predictor per subject
    return np.transpose(Y[:,:T] * np.outer(intensity, dL0))
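That the vectorized `idt` returns the same values as the nested-loop version can be checked on random stand-in data. The sizes and the random inputs below are made up for the check; only the two expressions being compared come from the code in this thread.

```python
import numpy as np

rng = np.random.RandomState(0)
nsubj, T, k = 5, 4, 7                        # toy sizes (hypothetical)
X = rng.normal(size=(nsubj, k))              # stand-in covariate matrix
Y = rng.randint(0, 2, size=(nsubj, T + 1))   # stand-in risk-set indicators
beta = rng.normal(size=k)
dL0 = rng.gamma(1.0, 1.0, size=T)

# Loop version, as in the question: rows index time, columns index subjects.
loop = np.array([[Y[i, j] * np.exp(np.dot(X[i], beta)) * dL0[j]
                  for i in range(nsubj)]
                 for j in range(T)])

# Vectorized version, as in the answer.
intensity = np.exp(np.dot(X, beta))
vectorized = np.transpose(Y[:, :T] * np.outer(intensity, dL0))

print(loop.shape, np.allclose(loop, vectorized))
```

The slice `Y[:, :T]` matters because `Y` has one column per entry of `t`, that is `T + 1` columns, while `dL0` has only `T` entries.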
With this change, a trace plot of beta shows that the MCMC had not converged within 50,000 iterations.
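A trace plot can be backed up with a rough numerical check. The sketch below is a simplified Geweke-style comparison of the mean of the first 10% of a trace with the mean of the last 50%. It uses plain sample variances and so ignores autocorrelation, which makes it a crude screen and not the full diagnostic. The two traces are made up for illustration.

```python
from statistics import mean, variance


def naive_geweke_z(trace, first=0.1, last=0.5):
    """Z-score for the difference in means between the start and end of a trace.

    Simplification: plain sample variances are used, so autocorrelation is ignored.
    """
    n = len(trace)
    a = trace[:int(first * n)]
    b = trace[n - int(last * n):]
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


# Hypothetical traces: one drifts upward, one oscillates around a fixed level.
drifting = [i / 1000 for i in range(1000)]
stable = [i % 2 for i in range(1000)]

z_drifting = naive_geweke_z(drifting)
z_stable = naive_geweke_z(stable)
```

A large absolute z-score, as for the drifting trace, suggests the chain is still moving; a value near zero is consistent with, but does not prove, convergence.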
I recommend a few things: different starting values, a different step method, and a burn-in interval comparable to the one used in BUGS:
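import pymc as pm

vars = cox_model(dta)
# start the chain from the MAP estimate instead of hand-picked values
pm.MAP(vars).fit(method='fmin_powell')
m = pm.MCMC(vars)
# update all of beta jointly with an adaptive Metropolis step
m.use_step_method(pm.AdaptiveMetropolis, m.beta)
# discard the first half as burn-in and keep every 25th draw
m.sample(iter=50000, burn=25000, thin=25)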
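The effect of `burn` and `thin` on the number of retained draws is simple arithmetic. The helper below is hypothetical (not part of PyMC) and assumes that the sampler drops the first `burn` iterations and then keeps every `thin`-th one.

```python
def retained_samples(n_iter, burn=0, thin=1):
    """Number of draws kept after dropping `burn` and keeping every `thin`-th."""
    return len(range(burn, n_iter, thin))


kept_answer = retained_samples(50000, burn=25000, thin=25)
kept_question = retained_samples(15000)        # m.sample(15000), assuming no burn-in by default
kept_bugs = retained_samples(15000, burn=5000) # 5000 burn-in + 10000 further iterations
print(kept_answer, kept_question, kept_bugs)
```

Under that assumption, the run in the question summarizes every draw, including the early transient, whereas the BUGS run and the run suggested here both discard it first.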
In this case the trace plots look more promising.
This yields point estimates that are very similar to the BUGS estimates above:
beta:
Mean SD MC Error 95% HPD interval
------------------------------------------------------------------
3.436 0.861 0.035 [ 1.827 5.192]
-0.039 0.063 0.002 [-0.155 0.081]
-0.028 0.073 0.003 [-0.159 0.119]
0.338 0.174 0.007 [ 0.009 0.679]
0.069 0.164 0.007 [-0.263 0.371]
-4.022 1.29 0.055 [-6.552 -1.497]
0.136 0.085 0.003 [-0.027 0.307]
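To put a number on "very similar", the posterior means from this run can be compared with the BUGS means, scaled by the BUGS posterior standard deviations. All values are copied from the two tables in this thread; nothing else is assumed.

```python
bugs_mean = [3.466, -0.04155, -0.009709, 0.3535, 0.08454, -4.109, 0.1413]
bugs_sd = [0.8906, 0.06253, 0.07353, 0.1788, 0.1652, 1.325, 0.08594]
pymc_mean = [3.436, -0.039, -0.028, 0.338, 0.069, -4.022, 0.136]

# Difference in posterior means, in units of the BUGS posterior sd.
scaled_diff = [(p - b) / s for p, b, s in zip(pymc_mean, bugs_mean, bugs_sd)]
# Whether each coefficient has the same sign in both runs.
same_sign = [(p > 0) == (b > 0) for p, b in zip(pymc_mean, bugs_mean)]

for k, (d, ok) in enumerate(zip(scaled_diff, same_sign), start=1):
    print("beta[%d]: %+.2f sd, same sign: %s" % (k, d, ok))
```

On these numbers every coefficient has the same sign in both runs, and the largest gap is about a quarter of a posterior standard deviation (for beta[3]).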