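I am trying to run an MLE with the mle2 command for an infectious disease compartmental transmission model (SEIR, in my case SSEIR), fitting the curve of predicted weekly deaths to the observed weekly deaths, similar to this: plot of predicted vs observed weekly deaths. However, the parameter estimates always seem to land on the (reasonable) bounds I supply, and the SE, z value, and p value are all NA.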
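I built the SEIR model and solve it with the ode solver. From the model output and the observed data I compute the negative log-likelihood, which I then pass to the mle2 function. When I first set this up there were several errors that stopped the script from running, but those are now resolved and I cannot find the root cause of why the fit is not working. I am confident that the bounds I set for the parameter estimates are reasonable. The parameters are transition rates between compartments, defined as (for example) delta = 1 / duration of infection, so there are very real biological limits for them.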
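I know I am trying to fit a lot of parameters to not much data (17 weekly observations), but the same problem persists when I try to fit only one parameter, so that cannot be the root cause.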
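To make the bounds concrete, here is a small sketch of the duration-to-rate conversion behind the starting values in the code below (time is in weeks, durations are in days). The helper names `rate_per_week` and `days_from_rate` are made up for illustration and are not part of the script.

```r
# Hypothetical helpers (not part of the original script):
# convert a mean duration in days to a rate per week, and back.
rate_per_week  <- function(days) 1 / (days / 7)
days_from_rate <- function(rate) 7 / rate

start_delta <- rate_per_week(4.1)   # same value as delta = 1/(4.1/7) in the script
start_gamma <- rate_per_week(4.68)  # same value as gamma = 1/(4.68/7) in the script

# Shortest mean duration permitted by an upper bound of 15 per week
min_days <- days_from_rate(15)      # 7/15 of a day
```

Read this way, an upper bound of 15 per week allows mean durations down to roughly half a day, while a lower bound of 0 corresponds to an infinitely long duration.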
library(deSolve)
library(bbmle)
#data
gdta <- c(0, 36.2708172419082, 1.57129615346629, 28.1146409459558, 147.701669719614, 311.876708482584, 512.401145459178, 563.798275104372, 470.731269976821, 292.716043742125, 153.604156195608, 125.760068922451, 198.755685044427, 143.847282793854, 69.2693867232681, 42.2093135487066, 17.0200426587424)
#build seir function
seir <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    dS0 <- - beta0 * S0 * (I/N)
    dS1 <- - beta1 * S1 * (I/N)
    dE <- beta0 * S0 * (I/N) + beta1 * S1 * (I/N) - delta * E
    dI <- delta * E - gamma * I
    dR <- gamma * I
    return(list(c(dS0, dS1, dE, dI, dR)))
  })
}
# build function to run seir, include ode solver
run_seir <- function(time, state, beta0, beta1, delta, gamma, sigma, N, startInf) {
  # N is passed along with the rates because seir() refers to it inside with()
  parameters <- c(beta0, beta1, delta, gamma, N)
  names(parameters) <- c("beta0", "beta1", "delta", "gamma", "N")
  init <- c(S0 = (N - startInf) * sigma,
            S1 = (N - startInf) * (1 - sigma),
            E = 0,
            I = startInf,
            R = 0)
  # use the time argument rather than the global times vector
  state_est <- as.data.frame(ode(y = init, times = time, func = seir, parms = parameters))
  return(state_est)
}
times <- seq(0, 16, by = 1) # weekly time points, weeks 0 to 16
states <- c("S0", "S1", "E", "I", "R")
# run the run_seir function to see if it works
run_seir(time = times, state = states, beta0 = 1/(1.9/7), beta1 = 0.3*(1/(1.9/7)), delta = 1/(4.1/7), gamma = 1/(4.68/7), sigma = 0.7, N = 1114100, startInf = 100)
#build calc likelihood function
# times and states are read from the workspace, so the arguments are exactly
# the free and fixed parameters handed to mle2
calc_likelihood <- function(beta0, beta1, delta, gamma, sigma, N, startInf, CFR) {
  model.output <- run_seir(times, states, beta0, beta1, delta, gamma, sigma, N, startInf)
  # Poisson log-likelihood of the observed weekly deaths
  LL <- sum(dpois(round(as.numeric(gdta)), (model.output$I) / (1 / delta) * CFR, log = TRUE))
  print(LL)
  return(LL)
}
# run calc_likelihood function
calc_likelihood(beta0 = 1/(1.9/7), beta1 = 0.3*(1/(1.9/7)), delta = 1/(4.1/7), gamma = 1/(4.68/7), sigma = 0.7, N = 1114100, startInf = 100, CFR = 0.02)
#MLE
#parameters that are supposed to be fixed
fixed.pars <- c(N=1114100, startInf=100, CFR = 0.02)
#parameters that mle2 is supposed to estimate
free.pars <- c(beta0 = 1/(1.9/7), beta1 = 0.3*(1/(1.9/7)),
               delta = 1/(4.1/7), gamma = 1/(4.68/7), sigma = 0.7)
#lower bound
lower_v <- c(beta0 = 0, beta1 = 0, delta = 0, gamma = 0, sigma = 0)
#upper bound
upper_v <- c(beta0 = 15, beta1 = 15, delta = 15, gamma = 15, sigma = 1)
#sigma = 1, this is not a typo
#mle function - need to use L-BFGS-B since we need to include boundaries
test2 <- mle2(calc_likelihood, start = as.list(free.pars), fixed = as.list(fixed.pars),
              method = "L-BFGS-B", lower = lower_v, upper = upper_v)
summary(test2)
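Before running the fit, it may be worth checking how the likelihood behaves at the lower bounds. With delta = 0 the expected deaths in calc_likelihood are 0 in every week, and a Poisson density with mean 0 gives a log-probability of -Inf for any positive count. The base R sketch below shows this, along with two possible workarounds. The names `safe_pois_ll` and `lower_v_pos`, and the values `1e-8` and `1e-4`, are illustrative assumptions and not part of the original script.

```r
obs <- c(0, 36, 2, 28)              # first few weekly deaths from gdta, rounded
mu_at_bound <- rep(0, length(obs))  # expected deaths when delta = 0

ll_at_bound <- sum(dpois(obs, mu_at_bound, log = TRUE))
is.finite(ll_at_bound)              # FALSE

# Assumed workaround 1: floor the Poisson mean at a tiny positive value
safe_pois_ll <- function(obs, mu, eps = 1e-8) {
  sum(dpois(obs, pmax(mu, eps), log = TRUE))
}
ll_floored <- safe_pois_ll(obs, mu_at_bound)

# Assumed workaround 2: keep the rates strictly positive through the bounds
lower_v_pos <- c(beta0 = 1e-4, beta1 = 1e-4, delta = 1e-4, gamma = 1e-4, sigma = 0)
```

This matters for L-BFGS-B in particular, because optim stops with an error when the objective is not finite at a point it evaluates.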
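After running mle2, I get a warning that says:
Warning message:
In mle2(calc_likelihood, start = as.list(free.pars), fixed = as.list(fixed.pars), :
some parameters are on the boundary: variance-covariance calculations based on Hessian may be unreliable
and if I look at summary(test2):
Warning message:
In sqrt(diag(object@vcov)) : NaNs produced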
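From the research I have done so far, I understand that the second warning is probably a consequence of the estimates being on the boundary, so my question is really how to resolve the first one. If I run mle2 with only the lower bounds, I get parameter estimates in the millions, which cannot be right.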
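One thing worth checking here is the sign of the objective. mle2 minimises the function it is given (its first argument is documented as a negative log-likelihood), whereas calc_likelihood above returns sum(dpois(..., log = TRUE)), which is the log-likelihood itself. The toy example below is a sketch in base R, using optim (the default optimiser behind mle2) on made-up Poisson counts rather than the data from this post. It illustrates how minimising the wrong sign drives the estimate onto a bound.

```r
# Made-up data: Poisson counts with a true mean of 5 (not the data in this post)
set.seed(1)
y <- rpois(50, lambda = 5)

# Log-likelihood of a single Poisson mean, and its negative
loglik    <- function(lambda) sum(dpois(y, lambda, log = TRUE))
negloglik <- function(lambda) -loglik(lambda)

# optim() minimises; the lower bound sits just above 0 so dpois stays finite
fit_wrong <- optim(par = 2, fn = loglik, method = "L-BFGS-B",
                   lower = 0.01, upper = 15)
fit_right <- optim(par = 2, fn = negloglik, method = "L-BFGS-B",
                   lower = 0.01, upper = 15)

fit_wrong$par   # the worst-fitting value allowed, i.e. a bound
fit_right$par   # close to mean(y), the Poisson MLE
```

If the same thing is happening in the SEIR fit, returning -LL from calc_likelihood would be the corresponding change, and it would also be consistent with the estimates running off to very large values when the upper bounds are removed.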
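I am fairly sure my SEIR model specification is correct, but after staring at this code and trying to solve the problem for a week, I am open to any input on how to get the fit to work.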
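One quick way to sanity-check the model specification is to confirm that the compartments conserve the population, meaning the five derivatives sum to zero. The sketch below copies the right-hand side of the seir function into a standalone function (called `seir_rhs` here purely for illustration) so that it runs in base R without deSolve. N is included in the parameter vector to keep the function self-contained, and the state and parameter values are the starting values from the post.

```r
# Standalone copy of the SEIR right-hand side (illustrative name)
seir_rhs <- function(state, parameters) {
  with(as.list(c(state, parameters)), {
    infection0 <- beta0 * S0 * (I / N)   # new exposures from S0
    infection1 <- beta1 * S1 * (I / N)   # new exposures from S1
    c(dS0 = -infection0,
      dS1 = -infection1,
      dE  = infection0 + infection1 - delta * E,
      dI  = delta * E - gamma * I,
      dR  = gamma * I)
  })
}

state <- c(S0 = (1114100 - 100) * 0.7, S1 = (1114100 - 100) * (1 - 0.7),
           E = 0, I = 100, R = 0)
parameters <- c(beta0 = 1/(1.9/7), beta1 = 0.3*(1/(1.9/7)),
                delta = 1/(4.1/7), gamma = 1/(4.68/7), N = 1114100)

derivs <- seir_rhs(state, parameters)
total_change <- sum(derivs)   # should be zero up to rounding error
total_pop <- sum(state)       # should equal N
```

A total change near zero only shows that no one is created or lost between compartments. It does not say whether the deaths term used in the likelihood is the intended one.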
Thanks, JJ