Question

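I am learning to write loops and functions in R, but I still don't fully understand how to do it. Right now I need to write a loop or function (I'm not sure which is better) that produces multiple bootstrap results within the same data frame.

The sample dataset is as follows: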

df <- read.table(text="ID A_d B_d C_d D_d E_D f_D chkgp
M1  10  20  60  30  54  33  Treatment
M1  20  50  40  33  31  44  Placebo
M2  40  80  40  23  15  66  Placebo
M2  30  90  40  67  67  66  Treatment
M3  30  10  20  22  89  77  Treatment
M3  40  50  30  44  50  88  Placebo
M4  40  30  40  42  34  99  Treatment
M4  30  40  50  33  60  80  Placebo",header = TRUE, stringsAsFactors = FALSE)

I have written code to compute the Spearman correlation:

k=cor(df$A_d,df$E_D,method="spearman")
k

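The result is -0.325407.

Now I have to run the bootstrap method, resampling (shuffling) the data in the two variables, to obtain the correlation value 5000 times.

So I used the following code: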

fc <- function(d, i){
    d2 <- d[i,]
    return(cor(df$A_d,df$E_D,method="spearman"))
}

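With the function fc defined, we can call boot(), supplying the name of our dataset, the function, and the number of bootstrap samples to draw.

The bootstrap confidence interval calculation is based on 5000 bootstrap replicates.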

#turn off set.seed() if you want the results to vary
set.seed(626)
bootcorr <- boot(hsb2, fc, R=500)
bootcorr

I then obtained the confidence interval for the 5000 replicates:

boot.ci(boot.out = bootcorr, type =c( "perc"))

Result:

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 500 bootstrap replicates

CALL : 
boot.ci(boot.out = bootcorr, type = c("perc"))

Intervals : 
Level     Percentile     
95%   (-0.3254, -0.3254 )  
Calculations and Intervals on Original Scale

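I need to write a loop to obtain results like the following:

Variable1  Variable2  Confidence interval
A_d        E_D        (-0.3254, -0.3254)
A_d        f_D
B_d        E_D
B_d        f_D
C_d        E_D
C_d        f_D
D_d        E_D
D_d        f_D

Because my dataset has more than 100 variables, doing this by hand each time is impractical, so I need to automate it.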

Answer 1

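We can create a vectorized function and use outer():

# Spearman correlation between columns i and j of df
corpij <- function(i, j, df) cor(df[, i], df[, j], method = "spearman")
# Vectorize over i and j so outer() can pass index vectors
corp <- Vectorize(corpij, vectorize.args = c("i", "j"))

# Columns 2..(ncol - 1) skip ID (first) and chkgp (last)
outer(2:(ncol(df1) - 1), 2:(ncol(df1) - 1), corp, df1)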

#>            [,1]         [,2]         [,3]       [,4]        [,5]
#> [1,]  1.0000000  0.289588955 -0.480042672 0.22663483 -0.32540701
#> [2,]  0.2895890  1.000000000 -0.006379918 0.53614458 -0.35928788
#> [3,] -0.4800427 -0.006379918  1.000000000 0.01913975 -0.13952023
#> [4,]  0.2266348  0.536144578  0.019139754 1.00000000  0.02395253
#> [5,] -0.3254070 -0.359287879 -0.139520230 0.02395253  1.00000000
#> [6,]  0.7680403 -0.120481928 -0.421074589 0.33734940  0.07185758
#>             [,6]
#> [1,]  0.76804027
#> [2,] -0.12048193
#> [3,] -0.42107459
#> [4,]  0.33734940
#> [5,]  0.07185758
#> [6,]  1.00000000
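
For the plain correlation matrix, cor() also accepts a data frame of numeric columns directly and returns every pairwise coefficient, with the variable names as row and column labels. That may be easier to read than the index-based output above. A minimal sketch, assuming the same df1 as in the Data section (repeated here so the block runs on its own); num_vars and spearman_mat are names introduced only for illustration:

```r
df1 <- read.table(text = "ID A_d B_d C_d D_d E_D f_D chkgp
                          M1  10  20  60  30  54  33  Treatment
                          M1  20  50  40  33  31  44  Placebo
                          M2  40  80  40  23  15  66  Placebo
                          M2  30  90  40  67  67  66  Treatment
                          M3  30  10  20  22  89  77  Treatment
                          M3  40  50  30  44  50  88  Placebo
                          M4  40  30  40  42  34  99  Treatment
                          M4  30  40  50  33  60  80  Placebo",
                  header = TRUE, stringsAsFactors = FALSE)

# Keep only the numeric measurement columns (drop ID and chkgp)
num_vars <- setdiff(names(df1), c("ID", "chkgp"))

# cor() on a data frame returns the full pairwise Spearman matrix
spearman_mat <- cor(df1[, num_vars], method = "spearman")
round(spearman_mat, 3)
```

Individual coefficients can then be looked up by name, for example spearman_mat["A_d", "E_D"].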

Another option is psych::corr.test():

library(psych)

corr.test(df1[,-c(1,ncol(df1))], method = "spearman")$r

Data:

df1 <- read.table(text="ID A_d B_d C_d D_d E_D f_D chkgp
                        M1  10  20  60  30  54  33  Treatment
                        M1  20  50  40  33  31  44  Placebo
                        M2  40  80  40  23  15  66  Placebo
                        M2  30  90  40  67  67  66  Treatment
                        M3  30  10  20  22  89  77  Treatment
                        M3  40  50  30  44  50  88  Placebo
                        M4  40  30  40  42  34  99  Treatment
                        M4  30  40  50  33  60  80  Placebo",
header = TRUE,stringsAsFactors = FALSE)
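
The correlation matrices above do not by themselves give the bootstrap intervals the question asks for. One thing worth noting: the fc in the question computes the correlation on the full df rather than on the resampled d2, so every replicate is identical, which would explain the interval collapsing to (-0.3254, -0.3254). Below is a hedged sketch of one way to loop over variable pairs with boot() and boot.ci() from the boot package. The names spearman_stat, xvar, yvar and pairs are illustrative, the choice of pairs is assumed from the table in the question, and R = 5000 follows the question's text rather than its code:

```r
library(boot)

df1 <- read.table(text = "ID A_d B_d C_d D_d E_D f_D chkgp
                          M1  10  20  60  30  54  33  Treatment
                          M1  20  50  40  33  31  44  Placebo
                          M2  40  80  40  23  15  66  Placebo
                          M2  30  90  40  67  67  66  Treatment
                          M3  30  10  20  22  89  77  Treatment
                          M3  40  50  30  44  50  88  Placebo
                          M4  40  30  40  42  34  99  Treatment
                          M4  30  40  50  33  60  80  Placebo",
                  header = TRUE, stringsAsFactors = FALSE)

# Statistic for boot(): it must use the resampled rows d[i, ],
# not the original data frame
spearman_stat <- function(d, i, xvar, yvar) {
  d2 <- d[i, ]
  cor(d2[[xvar]], d2[[yvar]], method = "spearman")
}

# Pairs to evaluate (assumed: A_d..D_d each against E_D and f_D)
pairs <- expand.grid(Variable2 = c("E_D", "f_D"),
                     Variable1 = c("A_d", "B_d", "C_d", "D_d"),
                     stringsAsFactors = FALSE)[, c("Variable1", "Variable2")]
pairs$lower <- NA_real_
pairs$upper <- NA_real_

set.seed(626)  # remove for results that vary between runs
for (k in seq_len(nrow(pairs))) {
  # With only 8 rows some resamples are constant, so cor() warns and
  # returns NA; boot.ci() ignores non-finite replicates
  b <- suppressWarnings(
    boot(df1, spearman_stat, R = 5000,
         xvar = pairs$Variable1[k], yvar = pairs$Variable2[k])
  )
  ci <- boot.ci(b, type = "perc")
  pairs$lower[k] <- ci$percent[4]  # lower percentile limit
  pairs$upper[k] <- ci$percent[5]  # upper percentile limit
}
pairs
```

For a dataset with 100+ variables, pairs could instead be built from combn(names(df), 2) or from two vectors of column names. With a sample this small the intervals will be very wide, so the numbers are only meaningful on the real data.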
