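I am analyzing an experiment on abstinence rates among participants in a clinical drug and alcohol trial. There are two groups: people who received a new treatment and people who received a placebo. Abstinence in the previous week was measured at week 0 (baseline) and at weeks 4, 8, 12, and 24. The rate of attrition/missing data is high: of the 128 participants who started the trial, only 55 provided data at follow-up.
Here are the data.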
df <- data.frame(id = factor(rep(1:128,each=5)),
time = rep(c(0,4,8,12,24),times=128),
group = c(rep("placebo",335),rep("treatment",305)),
abs = c(0, 0, 0, 0, 0, 0, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0, 1, 1, NA, 1, 0, 0, 0, NA, NA, 0, 0, 1, NA, 0, 0, 0, NA, NA, 1, 0, NA, NA, NA, 0, 0, 0, NA, NA, NA, 0, NA, NA, NA, 1, 0, 0, NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, NA, NA, 0, NA, NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, 0, 0, 0, 0, NA, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, 0, 0, NA, 0, 0, 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, 0, 0, NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, 0, 0, NA, 0, 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, 0, NA, NA, NA, NA, 0, 0, 0, NA, NA, 0, 1, 1, 1, NA, 0, 0, 0, 0, NA, 0, 0, 0, NA, NA, 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, 0, 0, 1, 1, NA, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, NA, 0, 0, 1, 1, 1, 0, 0, 0, NA, NA, 0, 0, 0, 0, NA, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, NA, 0, 0, NA, NA, NA, 0, 0, 0, 0, 0, 0, NA, NA, NA, NA, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, NA, NA, NA, 0, NA, NA, NA, NA, 0, 0, 0, 0, NA, 0, 0, 0, 0, NA, 0, NA, NA, NA, NA, 0, 0, 0, 0, 1, 0, 1, NA, NA, 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, NA, 0, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, NA, 0, NA, 0, NA, NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, NA, NA, NA, 0, 0, NA, 0, NA, 0, NA, 1, NA, 1, 0, 0, NA, NA, NA, 0, 1, 1, 1, 1, 0, 0, 0, NA, NA, 0, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0, 0, 0, 0, NA, 0, 0, 0, 0, NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, 0, 0, NA, NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, 0, 0, 0, 0, NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, NA, 0, 0, 0, 0, NA, 0, 0, 0, 0, NA, 0, NA, NA, NA, NA, 0, 1, 1, 1, 1, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, NA, NA, NA, 0, NA, NA, NA, NA, 0, 0, 0, 0, 1, 0, 0, NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, NA, NA, NA, 
0, 1, 1, 1, 1, 0, 0, 1, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, NA, 0, NA, 0, 0, NA, 0, NA, 0, 0, 0, 0, NA, 0, 0, NA, NA, NA))
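The proportion of observed cases who were abstinent over time looks like this:
[figure: proportion abstinent among observed cases, by group and week]
On the face of it, this looks like a decent treatment effect, especially at week 24.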
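For reference, the observed-case proportions described above could be tabulated roughly as follows. This is only a sketch: `toy` is a small made-up data frame with the same column layout as `df`, and `observed_props` is a helper name invented for this example.

```r
# Toy stand-in for the trial data (hypothetical values, same columns as `df`)
toy <- data.frame(
  id    = factor(rep(1:4, each = 3)),
  time  = rep(c(0, 4, 8), times = 4),
  group = rep(c("placebo", "treatment"), each = 6),
  abs   = c(0, 0, NA,  0, 1, 0,  0, 1, 1,  0, NA, 1)
)

# Proportion abstinent among observed (non-missing) cases, by group and time.
# `observed_props` is a hypothetical helper, not part of any package.
observed_props <- function(d) {
  obs <- d[!is.na(d$abs), ]                                   # drop missing follow-ups
  out <- aggregate(abs ~ group + time, data = obs, FUN = mean)
  out$n <- aggregate(abs ~ group + time, data = obs, FUN = length)$abs
  names(out)[names(out) == "abs"] <- "prop_abstinent"
  out[order(out$group, out$time), ]
}

props <- observed_props(toy)
print(props)
```

Calling `observed_props(df)` on the real data should give one row per group and week, along with the number of participants still observed in each cell, which makes the shrinking denominators visible next to the proportions.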
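I used the mixed_model function from the GLMMadaptive package to run a longitudinal logistic regression model, because that package allows marginal (population-averaged) coefficients to be calculated. When I run the model with time as a continuous variable, it converges fine: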
library(GLMMadaptive)
m <- mixed_model(abs ~ group*time,
random = ~1|id,
data = df,
family = binomial())
summary(m)
# model output
# Call:
# mixed_model(fixed = abs ~ group * time, random = ~1 | id, data = df,
# family = binomial())
#
# Data Descriptives:
# Number of Observations: 440
# Number of Groups: 128
#
# Model:
# family: binomial
# link: logit
#
# Fit statistics:
#  log.Lik AIC BIC
#     -137 285 299
#
# Random effects covariance matrix:
#             StdDev
# (Intercept)   2.68
#
# Fixed effects:
#                     Estimate Std.Err z-value p-value
# (Intercept)          -4.9350  0.8431  -5.854  <1e-04
# grouptreatment        0.0913  0.8971   0.102   0.919
# time                  0.1070  0.0362   2.956   0.003
# grouptreatment:time   0.0952  0.0530   1.798   0.072
#
# Integration:
# method: adaptive Gauss-Hermite quadrature rule
# quadrature points: 11
#
# Optimization:
# method: hybrid EM and quasi-Newton
# converged: TRUE
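To make the continuous-time fit easier to read, the fixed effects above can be turned into implied log-odds and odds ratios at each assessment week. The sketch below simply redoes that arithmetic with the printed estimates; it assumes a participant with a random intercept of zero, so the results are subject-specific (conditional) quantities, not the population-averaged ones.

```r
# Fixed-effect estimates copied from the summary() output above
b <- c(intercept = -4.9350, group = 0.0913, time = 0.1070, group_time = 0.0952)

weeks <- c(0, 4, 8, 12, 24)

# Log-odds of abstinence for a participant whose random intercept is zero
logit_placebo   <- b[["intercept"]] + b[["time"]] * weeks
logit_treatment <- b[["intercept"]] + b[["group"]] +
  (b[["time"]] + b[["group_time"]]) * weeks

# Treatment-vs-placebo difference on the log-odds scale at each week
log_or <- b[["group"]] + b[["group_time"]] * weeks

implied <- data.frame(
  week        = weeks,
  p_placebo   = plogis(logit_placebo),    # inverse logit
  p_treatment = plogis(logit_treatment),
  odds_ratio  = exp(log_or)
)
print(round(implied, 3))
```

Under this linear-in-time model the week-24 group difference is 0.0913 + 24 × 0.0952 ≈ 2.38 on the log-odds scale, an odds ratio of roughly 10.8. That contrast is forced to follow the straight line in time, which is why it is not a substitute for a separate estimate at each time point.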
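However, what I really want to do is compute a contrast at each time point: the group difference in the odds of abstinence, controlling for the other time points. I am particularly interested in the group difference in the odds of abstinence at week 24 (controlling for all other time points). So I want to treat time as a factor. However, when I run this model: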
m <- mixed_model(abs ~ group*factor(time),
random = ~1|id,
data = df,
family = binomial())
summary(m)
it fails to converge and I get the following error message.
Error in mixed_fit(y, X, Z, X_zi, Z_zi, id, offset, offset_zi, family, :
A large coefficient value has been detected during the optimization.
Please re-scale you covariates and/or try setting the control argument
'iter_EM = 0'. Alternatively, this may due to a
divergence of the optimization algorithm, indicating that an overly
complex model is fitted to the data. For example, this could be
caused when including random-effects terms (e.g., in the
zero-inflated part) that you do not need. Otherwise, adjust the
'max_coef_value' control argument.
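I have been following the suggestions in this error message, as well as the advice on optimization provided by the package's author here, but to no avail. I feel like a baboon trying to conduct the Vienna Philharmonic, and the results are what you would expect: all random and a bit chaotic.
Is the model inappropriate? If not, does anyone have any tips on how to get it to converge? If so, does anyone have suggestions for alternative approaches?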
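One thing that may be worth checking before tuning the optimizer: with time as a factor, the model has to estimate a separate log-odds for every group-by-time cell. If all observed outcomes in a cell are 0 (or all 1), the maximum-likelihood estimate for that cell drifts toward minus or plus infinity, which would be consistent with a "large coefficient value" message. Whether that is what is happening here is an assumption to verify. The sketch below uses a made-up `toy` data frame with the same columns as `df`.

```r
# Toy stand-in (hypothetical values) with the same columns as `df`;
# nobody is abstinent at baseline, mimicking an all-zero cell.
toy <- data.frame(
  id    = factor(rep(1:6, each = 3)),
  time  = rep(c(0, 4, 24), times = 6),
  group = rep(c("placebo", "treatment"), each = 9),
  abs   = c(0, 0, 0,  0, NA, 1,  0, 0, NA,
            0, 1, 1,  0, 1, NA,  0, NA, 1)
)

# Count non-missing outcomes in each group x time cell, split by outcome value.
# factor(abs, levels = 0:1) keeps a slice for an outcome that never occurs.
cell_counts <- with(toy, table(group, time, abs = factor(abs, levels = 0:1)))
print(ftable(cell_counts))

# Flag cells in which every observed outcome is identical (all 0 or all 1)
events     <- cell_counts[, , "1"]
nonevents  <- cell_counts[, , "0"]
degenerate <- (events == 0 | nonevents == 0) & (events + nonevents > 0)
print(degenerate)
```

Replacing `toy` with `df` gives the same table for the real data. Any cell flagged `TRUE` is a candidate explanation for the divergence, independent of the optimizer settings.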