Resampling from multiple strata using boot()

Asked: 2013-06-01 20:16:34

Tags: r bootstrapping

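I am trying to bootstrap ZIP (zero-inflated Poisson) estimates while resampling from specific populations. Each population (cluster) is fundamentally different in some way, so I would like to represent them proportionally in the bootstrap. The strata argument of boot() does this.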
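For readers unfamiliar with the strata argument, here is a minimal sketch of what it does. The toy data frame and the count_by_pop statistic are made up for illustration and are not part of my real problem; the statistic simply reports how many rows each resample draws from each population.

```r
library(boot)

set.seed(1)
# Hypothetical toy data: 10 populations with 100 rows each
toy <- data.frame(pop = factor(rep(1:10, length.out = 1000)),
                  y   = rpois(1000, 2))

# Hypothetical statistic: rows drawn from each population in the resample
count_by_pop <- function(d, i) as.vector(table(d$pop[i]))

res_strat <- boot(toy, count_by_pop, R = 20, strata = toy$pop)
res_plain <- boot(toy, count_by_pop, R = 20)

# With strata, every replicate keeps exactly 100 rows per population
all(res_strat$t == 100)

# Without strata, the per-population counts vary between replicates
range(res_plain$t)
```

With strata, indices are resampled within each population, so the population sizes are held fixed in every replicate.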

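I sometimes get the following error:

Error in solve.default(as.matrix(fit$hessian)) :   system is computationally singular: reciprocal condition number = 2.02001e-16

Here is a way to reproduce the problem. It should take only about a minute to run, depending on your computer: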
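As far as I understand it, this message comes from solve() refusing to invert a matrix that is singular or nearly so; here that matrix is the Hessian of the fitted model. A small sketch of the same kind of failure in base R, using a made-up matrix H rather than anything from my model:

```r
# Hypothetical 3 x 3 matrix: the third column is the sum of the first two,
# so the matrix has rank 2 and cannot be inverted
H <- cbind(c(2, 1, 0), c(1, 3, 1), c(3, 4, 1))

# Capture the error message instead of stopping
msg <- tryCatch({
  solve(H)
  "inverted"
}, error = function(e) conditionMessage(e))

msg        # an error message reporting that the system is singular
rcond(H)   # reciprocal condition number, at or near zero
```

The exact wording of the message can differ between an exactly singular and a numerically singular matrix, so this only illustrates the type of failure, not its cause in the bootstrap.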

#Load dependencies
library(AER)
library(boot)
library(pscl)
library(sampling)

# Generate some fake data. The seed makes the example reproducible.
set.seed(1)
x1 <- rpois(1000, 1)
set.seed(1)
x2 <- rnorm(1000, 0, 1)
set.seed(1)
e <- round(runif(1000, 0, 1))  # this should add some noise and prevent any multicollinearity
pop <- rep(1:10, length.out = 1000)  # there are 10 populations
y <- x1 * abs(floor(x2 * sqrt(pop))) + e  # each population affects y somewhat differently
fake_data <- data.frame(y, x1, x2, pop)
fake_data$pop <- factor(pop)  # the populations are categories, not simple scalars

# Fit the ZIP model and confirm it works. I understand the model does not match how the data were generated.
system.time(zip <- zeroinfl(y ~ x1 + x2 + pop | x1 + x2 + pop, data = fake_data))

# Store the estimates as starting values to speed up the bootstrapping phase.
# General technique from http://www.ats.ucla.edu/stat/r/dae/zipoisson.htm
count_short <- unname(coef(zip, "count"))  # plain numeric vector of count-model coefficients
zero_short <- unname(coef(zip, "zero"))    # plain numeric vector of zero-model coefficients

# Bootstrapping
# Statistic for boot(): refit the ZIP model on the resampled rows i and return its coefficients
f <- function(fake_data, i) {
  zip_boot <- zeroinfl(y ~ x1 + x2 + pop | x1 + x2 + pop, data = fake_data[i, ],
                       start = list(count = count_short, zero = zero_short))
  coef(zip_boot)
}

set.seed(1)
# 50 replicates, resampled within each population via strata
system.time(res <- boot(fake_data, f, R = 50, strata = fake_data$pop))
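Because the error only appears for some resamples, one way to see how often it happens might be to wrap the statistic in tryCatch() so a failed fit returns NAs instead of aborting the whole run. This is only a sketch: safe_fit and the toy data are hypothetical, and a Poisson glm() stands in for zeroinfl() so that the example runs without pscl.

```r
library(boot)

set.seed(1)
# Hypothetical stand-in data; a Poisson glm replaces zeroinfl() here
toy <- data.frame(x1  = rpois(200, 1),
                  x2  = rnorm(200),
                  pop = factor(rep(1:4, length.out = 200)))
toy$y <- rpois(200, exp(0.2 * toy$x1))

# Number of coefficients the statistic must always return
n_coef <- length(coef(glm(y ~ x1 + x2 + pop, family = poisson, data = toy)))

# Hypothetical wrapper: returns a vector of NAs if the fit throws an error
safe_fit <- function(d, i) {
  tryCatch(
    coef(glm(y ~ x1 + x2 + pop, family = poisson, data = d[i, ])),
    error = function(e) rep(NA_real_, n_coef)
  )
}

res_safe <- boot(toy, safe_fit, R = 25, strata = toy$pop)

# Number of replicates in which the fit failed
sum(!complete.cases(res_safe$t))
```

The same pattern should carry over to f above by replacing the glm() call with the zeroinfl() call, though I have not confirmed that this isolates the failing resamples in my case.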

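There should be enough observations: I have over 900 degrees of freedom, and each population contributes at least 100 observations from which to draw my resampled estimates.

My question: 1) What did I do that causes this multicollinearity?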
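To back up the sample-size claim, the per-population counts can be checked directly. The sketch below regenerates the fake data with base R only and also tabulates the share of zeros in each population, which may be worth a look since the zero part of the model is estimated from those zeros. The name zero_share is just for illustration.

```r
# Regenerate the fake data exactly as above (base R only)
set.seed(1); x1 <- rpois(1000, 1)
set.seed(1); x2 <- rnorm(1000, 0, 1)
set.seed(1); e  <- round(runif(1000, 0, 1))
pop <- rep(1:10, length.out = 1000)
y   <- x1 * abs(floor(x2 * sqrt(pop))) + e
fake_data <- data.frame(y, x1, x2, pop = factor(pop))

# Rows per population; strata keeps these counts fixed in every resample
table(fake_data$pop)

# Share of zero counts within each population
zero_share <- tapply(fake_data$y == 0, fake_data$pop, mean)
round(zero_share, 2)
```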

0 answers:

No answers yet