Bootstrapping multinomial regression in R

Asked: 2015-10-16 12:17:46

Tags: r bootstrapping logistic-regression

I am trying to bootstrap a simple multinomial regression in R, and I am getting an error:

  

Error in is.data.frame(data) : object 'd' not found

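What is really strange is that I am using the same code as the tutorial for the boot package at Quick-R (adapted to this particular problem), and the same code works when I use a different function such as lm(). Surely I am doing something stupid, but I cannot see what. I would really appreciate any help.

Here is an example: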

require(foreign)
require(nnet)
require(boot)

# an example for multinomial logistic regression
ml = read.dta('http://www.ats.ucla.edu/stat/data/hsbdemo.dta')
ml = ml[,c(5,7,3)]

bs <- function(formula, data, indices) {
    d = data[indices,] # allows boot to select sample
    fit = multinom(formula, data=d)
    s = summary(fit)
    return(list(fit$coefficients, fit$standard.errors))
}

# 5 replications
results = list()
results <- boot(
    data=ml, statistic=bs, R=5, parallel='multicore',
    formula=prog~write
)

2 answers:

Answer 0 (score: 0)

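The error occurs in the summary() call. Note also that the object returned by multinom() has no coefficients or standard.errors components; those belong to the summary object. It seems that summary.multinom() in turn computes the Hessian from your data d, which for some reason (probably a scoping issue) cannot be found. A quick fix is to add Hess = TRUE, so that the Hessian is computed while fitting, and to read the values from the summary: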

bs <- function(formula, data, indices) {
  d = data[indices,] # allows boot to select sample
  fit = multinom(formula, data=d, Hess = TRUE)
  s = summary(fit)
  return( cbind(s$coefficients, s$standard.errors) )
}

# 5 replications
results = list()
results <- boot(
  data=ml, statistic=bs, R=5, parallel='multicore',
  formula=prog~write
)
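
For readers who cannot reach the data URL, the same fix can be tried on a built-in data set. The following is only a minimal sketch: iris stands in for the hsbdemo data, and the function name bs_iris and the model Species ~ Sepal.Length are assumptions made for illustration, not part of the original question. It flattens the two matrices into one numeric vector so that each bootstrap replicate becomes one row of res$t.

```r
library(nnet)   # multinom()
library(boot)   # boot()

# Hypothetical statistic function; iris replaces the hsbdemo data here.
bs_iris <- function(formula, data, indices) {
  d <- data[indices, ]                      # rows selected by boot
  # Hess = TRUE stores the Hessian at fit time, so summary() does not
  # have to look up 'd' again later.
  fit <- multinom(formula, data = d, Hess = TRUE, trace = FALSE)
  s <- summary(fit)
  # Flatten coefficients and standard errors into one numeric vector.
  c(s$coefficients, s$standard.errors)
}

set.seed(1)
res <- boot(data = iris, statistic = bs_iris, R = 5,
            formula = Species ~ Sepal.Length)
dim(res$t)   # one row per replicate, one column per statistic
```

With three outcome classes and one predictor there are 2 x 2 coefficients and as many standard errors, so each replicate contributes eight values.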

Answer 1 (score: 0)

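For a multinomial logistic regression, the coef() function returns a matrix of coefficients. This differs from lm or glm models, for which it returns a vector of coefficients.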
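
The difference is easy to check on a small example. This sketch uses the built-in iris data rather than the data from the question, and the object names fit_lm and fit_mn are made up for illustration:

```r
library(nnet)   # multinom()

# Linear model: coef() gives a named numeric vector.
fit_lm <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Multinomial model with three outcome classes: coef() gives a matrix
# with one row per non-reference class and one column per term.
fit_mn <- multinom(Species ~ Sepal.Width, data = iris, trace = FALSE)

is.matrix(coef(fit_lm))
is.matrix(coef(fit_mn))
dim(coef(fit_mn))
```

Because boot stores each replicate as one row of results$t, a matrix returned by the statistic function ends up flattened, which is why the code below rebuilds labels for the columns.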

library(foreign)     # read.dta()
library(nnet)        # multinom()
require(boot)        # boot()

# an example for multinomial logistic regression
ml = read.dta('http://www.ats.ucla.edu/stat/data/hsbdemo.dta')
ml = ml[,c(5,7,3)]

names(ml)

bs <- function(formula, data, indices) {
  d = data[indices,] # allows boot to select sample
  fit = multinom(formula, data=d, maxit=1000, trace=FALSE)
  #s = summary(fit)
  #return(list(fit$coefficients, fit$standard.errors))

  estimates <- coef(fit)
  return(t(estimates))
}

# enable parallel

library(parallel)
cl <- makeCluster(2)
clusterExport(cl, "multinom")

# 10000 replications
set.seed(1984)

results <- boot(
  data=ml, statistic=bs, R=10000, parallel = "snow", ncpus=2, cl=cl,
  formula=prog~write
)

# label the estimates

subModelNames <- colnames(results$t0)
varNames <- rownames(results$t0)

results$t0

estNames <- apply(expand.grid(varNames,subModelNames),1,function(x) paste(x,collapse="_"))

estNames

colnames(results$t) <- estNames

# summary of results

library(car)         # summary(), confint() and hist() methods for boot objects

summary(results)

confint(results, level=0.95, type="norm")
confint(results, level=0.95, type="perc")
confint(results, level=0.95, type="bca")

# plot the results

hist(results, legend="separate")

# release the worker processes
stopCluster(cl)
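
The labelling step in the code above relies on two orderings agreeing: R flattens a matrix column by column, and expand.grid() varies its first argument fastest. A small sketch with a made-up 2 x 2 matrix shows that the generated names line up with the flattened values. The row and column labels here are illustrative placeholders, not output from the model:

```r
# Hypothetical stand-in for results$t0: terms in rows, sub-models in columns.
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("(Intercept)", "write"),
                            c("academic", "vocation")))

# Same construction as in the answer: first factor (rows) varies fastest.
est_names <- apply(expand.grid(rownames(m), colnames(m)), 1,
                   function(x) paste(x, collapse = "_"))

# as.vector() flattens column by column, matching the order of est_names.
flat <- setNames(as.vector(m), est_names)
flat
```

If the statistic function returned coef(fit) without the transpose, the rows and columns would swap roles and the arguments to expand.grid() would need to be swapped accordingly.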