Question

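I am trying to compute a confidence interval in R. For a particular reason, I have to do this with functions from the "bootstrap" package. (This means I cannot use functions from the "boot" package.)

My code is below.

What I am doing is computing the Pearson correlation coefficient and then applying the bootstrap (B = 100) to obtain estimates of the correlation coefficient. But I don't know how to construct the 95% confidence interval.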

library(bootstrap) 
data('law')

set.seed(1)
theta <- function(ind) {
  cor(law[ind, 1], law[ind, 2], method = "pearson")
  }
law.boot <- bootstrap(1:15, 100, theta) 
# sd(law.boot$thetastar)
percent.95 <- function(x) {
  quantile(x,  .95)
  }
law.percent.95 <- bootstrap(1:15, 100, theta, func=percent.95)

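Sorry if I haven't made myself clear or have used the wrong tags. Apologies as well for not providing the dataset at first (it is provided now), and thanks to Prof. Roland for pointing that out. Many thanks!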

Answer 1

You can do this manually.

library(bootstrap) 
data('law')
names(law) <- tolower(names(law))

set.seed(1)
theta <- function(ind) cor(law[ind, 1], law[ind, 2], method = "pearson")
law.boot <- bootstrap(1:15, 1000, theta) 

ci1 <- qnorm(p=c(.025, .975), 
            mean=mean(law.boot$thetastar), 
            sd=sd(law.boot$thetastar))

This gives:

> ci1
[1] 0.5055894 1.0268053

Compare this with a bootstrap written from scratch:

set.seed(1)
FX <- function(x) with(x, cor(lsat, gpa))
boot <- replicate(1000, FX(law[sample(nrow(law), round(nrow(law)), 
                                     replace=TRUE), ]))

ci2 <- qnorm(p=c(.025, .975), 
            mean=mean(boot), 
            sd=sd(boot))

This gives:

> ci2
[1] 0.5065656 1.0298412

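So ci1 and ci2 are quite similar.

Note, however, that I increased the number of bootstrap replications to 1000. With only 100 replications, the difference will naturally be larger.

Note 2: My answer computes the CIs as requested, using a normal approximation. However, percentiles are probably more appropriate: both upper limits above exceed 1, which is impossible for a correlation coefficient. See thothal's answer for how to obtain them.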
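The two intervals above are centred on the mean of the bootstrap replicates. A common alternative centres the normal interval on the estimate from the original sample and uses the standard deviation of the replicates as the standard error. The following is only a minimal sketch of that variant, assuming the "bootstrap" package is installed; theta_hat, se_boot and ci_normal are illustrative names that do not appear in the answer.

```r
library(bootstrap)
data(law)

# Statistic evaluated on resampled row indices, as in the answer.
theta <- function(ind) cor(law[ind, 1], law[ind, 2], method = "pearson")

set.seed(1)
law.boot <- bootstrap(1:15, 1000, theta)

# Correlation on the original, unresampled data (illustrative name).
theta_hat <- theta(1:15)

# Bootstrap standard error: spread of the replicates.
se_boot <- sd(law.boot$thetastar)

# Normal-approximation interval centred on the observed estimate.
ci_normal <- theta_hat + qnorm(c(0.025, 0.975)) * se_boot
```

Like ci1 and ci2, this interval is symmetric, so it can still extend past 1 for a correlation this high.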

Answer 2

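There are several ways to compute CIs for a bootstrap estimator (see this Wikipedia article, for instance).

The simplest is to take the 2.5% and 97.5% quantiles of the bootstrapped coefficients (the Percentile Bootstrap in the Wikipedia article):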

quantile(law.boot$thetastar, c(0.025, 0.975))
#      2.5%     97.5% 
# 0.4528745 0.9454483
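This also shows why the func argument tried in the question does not yield an interval on its own: func is applied to the vector of bootstrap replicates and returns a single summary, for which the package additionally reports a jackknife-after-bootstrap standard error. A minimal sketch, assuming the "bootstrap" package is installed and that the result exposes func.thetastar as described in the package documentation; q025 is a hypothetical helper name.

```r
library(bootstrap)
data(law)

# Statistic evaluated on resampled row indices (same as in the question).
theta <- function(ind) cor(law[ind, 1], law[ind, 2], method = "pearson")

# Hypothetical helper: the 2.5% quantile of the bootstrap replicates.
q025 <- function(x) unname(quantile(x, 0.025))

set.seed(1)
res <- bootstrap(1:15, 1000, theta, func = q025)

# func.thetastar is func applied to the replicates, i.e. one number,
# so it equals the lower percentile limit computed by hand.
lower_func <- res$func.thetastar
lower_hand <- unname(quantile(res$thetastar, 0.025))

# Both limits of the percentile interval come straight from thetastar.
ci_percentile <- unname(quantile(res$thetastar, c(0.025, 0.975)))
```

In other words, passing func is only needed when the standard error of a particular quantile is of interest; for the interval itself, calling quantile on thetastar is enough.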

The basic bootstrap interval, centred here on the mean of the bootstrap replicates, is computed as:

2 * mean(law.boot$thetastar) - quantile(law.boot$thetastar, c(0.975, 0.025))
#     97.5%      2.5% 
# 0.5567887 1.0493625
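Textbook statements of the basic bootstrap reflect the quantiles around the estimate from the original sample rather than around the mean of the replicates. A minimal sketch of that variant, assuming the "bootstrap" package is installed; theta_hat and ci_basic are illustrative names, and the resulting limits will differ somewhat from the output above.

```r
library(bootstrap)
data(law)

# Statistic evaluated on resampled row indices, as in the question.
theta <- function(ind) cor(law[ind, 1], law[ind, 2], method = "pearson")

set.seed(1)
law.boot <- bootstrap(1:15, 1000, theta)

# Correlation on the original, unresampled data (illustrative name).
theta_hat <- theta(1:15)

# Basic bootstrap: reflect the upper and lower quantiles around theta_hat.
# Subtracting the 97.5% quantile gives the lower limit, the 2.5% the upper.
q <- unname(quantile(law.boot$thetastar, c(0.975, 0.025)))
ci_basic <- 2 * theta_hat - q   # c(lower, upper)
```

Because the interval is a reflection, its upper limit can exceed 1 when the bootstrap distribution is skewed towards low values, as it is for this correlation.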

How to compute confidence intervals in R using the "bootstrap" function
