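I am trying to run a simulation of a two-part hurdle model with varying effect sizes. I started with a very naive approach, and was then advised to optimize it by using functions instead of for loops.
When I run the simulation with r = 50 (r is the number of times the data are refit), it runs without raising any errors.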
However, when I change to r = 1000 while keeping the number of iterations at 1000, RStudio raises the following error:
Error in solve.default(as.matrix(fit_zero$hessian)) :
Lapack routine dgesv: system is exactly singular: U[1,1] = 0
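For reference, the same kind of message can be reproduced outside the simulation by calling solve() on a matrix that has no inverse. This is only a minimal sketch: the zero matrix below is a stand-in I made up, not the actual Hessian produced inside hurdle().

```r
# Stand-in for a degenerate Hessian (hypothetical; not taken from hurdle()).
singular_hessian <- matrix(0, nrow = 2, ncol = 2)

# solve() cannot invert a singular matrix, so capture the error message
# instead of letting it stop the script.
msg <- tryCatch(
  {
    solve(singular_hessian)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the error itself only says that the matrix passed to solve() could not be inverted; it does not say why the zero part of the model ended up with such a Hessian.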
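I have tried researching this, but there seem to be many questions on the topic and very little information. My hypothesis is that when I refit the data 1000 times for each of 1000 iterations, the repeated resampling eventually produces a data set that is singular, and that this is where the fit fails to converge. Is that right?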
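To put rough numbers on that hypothesis: if a single bootstrap refit fails with some small probability p, the chance of hitting at least one failure grows quickly with the total number of refits. The value of p below is purely an assumed figure for illustration (I have not measured it), and the fits are treated as independent, which is also an assumption.

```r
# Assumed per-fit failure probability (hypothetical, for illustration only).
p <- 1e-5

# Total number of refits for one parameter set: iterations * r.
fits_small <- 1000 * 50    # r = 50
fits_large <- 1000 * 1000  # r = 1000

# Probability of at least one failing fit among n_fits independent fits.
p_any_failure <- function(p, n_fits) 1 - (1 - p)^n_fits

print(p_any_failure(p, fits_small))
print(p_any_failure(p, fits_large))
```

Under these assumptions the second probability is much closer to 1 than the first, which would be consistent with the run succeeding at r = 50 and failing at r = 1000.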
The part of the code where the problem occurs is shown below. I am using the pscl and bootstrap packages:
  fit4 = hurdle(formula = y ~ m + x, data=data0, dist = "poisson", zero.dist = "binomial")
  a_hat_b0 = summary(fit3)$coef[2,1]
  b1_hat_b0 = summary(fit4)[[1]]$count[2,1]
  b2_hat_b0 = summary(fit4)[[1]]$zero[2,1]
  ab1_hat_b0 = prod(a_hat_b0,b1_hat_b0)
  ab2_hat_b0 = prod(a_hat_b0,b2_hat_b0)
  model_list <- list(model=data.frame(ab1_hat, ab2_hat, ab1_hat_b0, ab2_hat_b0), data=data, data0=data0)
  return(model_list)
}
###
# given the base model data, randomly (seeded) samples rows with replacement and fits the data
bootstrap <- function(model_list){
  data <- as.data.frame(model_list[['data']])
  data0 <- as.data.frame(model_list[['data0']])
  model_df <- as.data.frame(model_list[['model']])
  boot.data = data[sample(nrow(data), replace = TRUE), ]
  boot.data$y[1] = if(prod(boot.data$y) > 0) 0 else boot.data$y[1]
  boot.fit1 = lm(m ~ x, data=boot.data)
  boot.fit2 = hurdle(formula = y ~ m + x, data=boot.data, dist = "poisson", zero.dist = "binomial")
After getting the error, I ran traceback(), which gave me the following:
15: solve.default(as.matrix(fit_zero$hessian))
14: solve(as.matrix(fit_zero$hessian))
13: hurdle(formula = y ~ m + x, data = boot.data0, dist = "poisson",
zero.dist = "binomial") at #23
12: bootstrap(model_list) at #21
11: FUN(X[[i]], ...)
10: lapply(X = X, FUN = FUN, ...)
9: sapply(1:r, function(r.index) {
return(bootstrap(model_list))
}) at #20
8: FUN(X[[i]], ...)
7: lapply(X = X, FUN = FUN, ...)
6: sapply(1:iterations, function(iterations.index) {
model_list <- model(n, a, b, c, i)
boot.fit <- sapply(1:r, function(r.index) {
return(bootstrap(model_list))
})
boot.fit <- matrix(unlist(boot.fit), ncol = 8, byrow = TRUE)
print(paste0(iterations.index, "/", iterations, " iterations"))
return(results(boot.fit, r))
}) at #17
5: FUN(newX[, i], ...)
4: apply(effect, 1, function(parameters) {
print(parameters)
seed <- parameters[1]
n <- parameters[2]
a <- parameters[3]
b <- parameters[4]
c <- parameters[5]
i <- parameters[6]
set.seed(seed)
seed.result <- sapply(1:iterations, function(iterations.index) {
model_list <- model(n, a, b, c, i)
boot.fit <- sapply(1:r, function(r.index) {
return(bootstrap(model_list))
})
boot.fit <- matrix(unlist(boot.fit), ncol = 8, byrow = TRUE)
print(paste0(iterations.index, "/", iterations, " iterations"))
return(results(boot.fit, r))
})
averaged <- t(apply(seed.result, 1, mean))
colnames(averaged) <- c("pow ab1", "pow ab2", "T1E ab1",
"T1E ab2")
print(data.frame(seed, n, a, b, c, i, averaged))
return(data.frame(seed, n, a, b, c, i, averaged))
}) at #6
3: FUN(X[[i]], ...)
2: lapply(getParameters(), function(effect) {
apply(effect, 1, function(parameters) {
print(parameters)
seed <- parameters[1]
n <- parameters[2]
a <- parameters[3]
b <- parameters[4]
c <- parameters[5]
i <- parameters[6]
set.seed(seed)
seed.result <- sapply(1:iterations, function(iterations.index) {
model_list <- model(n, a, b, c, i)
boot.fit <- sapply(1:r, function(r.index) {
return(bootstrap(model_list))
})
boot.fit <- matrix(unlist(boot.fit), ncol = 8, byrow = TRUE)
print(paste0(iterations.index, "/", iterations, " iterations"))
return(results(boot.fit, r))
})
averaged <- t(apply(seed.result, 1, mean))
colnames(averaged) <- c("pow ab1", "pow ab2", "T1E ab1",
"T1E ab2")
print(data.frame(seed, n, a, b, c, i, averaged))
return(data.frame(seed, n, a, b, c, i, averaged))
})
}) at #5
1: main()
I am fairly new to R, but I take this to mean that the main problem occurs in
hurdle(formula = y ~ m + x, data = boot.data0, dist = "poisson",
zero.dist = "binomial") at #23
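Does anyone know what causes this error and whether there is a workaround? I would appreciate any pointers on where to read up on this, or on what other analysis or debugging I should run. I am at a loss here.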
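One debugging step I could imagine, sketched under assumptions: wrap the fitting call in tryCatch() so that a failing resample is handed back for inspection rather than aborting the whole run. Here `fit_or_capture` is a hypothetical helper name of my own, and solve() is used only as a stand-in for the hurdle() call so that the snippet runs without pscl.

```r
# Hypothetical helper: run a fitting function on `data`; on error, return the
# error message together with the data that triggered it.
fit_or_capture <- function(fit_fun, data) {
  tryCatch(
    list(ok = TRUE, fit = fit_fun(data), data = NULL, message = NULL),
    error = function(e) {
      list(ok = FALSE, fit = NULL, data = data, message = conditionMessage(e))
    }
  )
}

# Stand-in for the model fit: inverting a matrix fails when it is singular.
good <- fit_or_capture(solve, diag(2))
bad  <- fit_or_capture(solve, matrix(0, 2, 2))

print(good$ok)
print(bad$ok)
print(bad$message)
```

In the real simulation the idea would be to pass the resampled data frame and a function that calls hurdle(), then look at the stored data from the first failing replicate to see what is unusual about it.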