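My goal is to use bootstrapping (1000 replicates) to compute the null distribution, mean, and CI of r (Pearson's correlation coefficient) for a trait (x) across 20 simulated random pairs drawn from my dataset of 600 unique individuals (ID). I recently switched from SAS to R; in SAS I would have used "proc surveyselect" to generate the datasets. The problem:
Simulated starting dataset of 600 individuals and their trait values: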
ID <- 1:600
x <- rnorm(600, mean = 7, sd = 2)
X <- data.frame(ID, x)
I then generated my 1000 replicates of r and computed the 95% CI:
Z <- numeric(1000)  # preallocate; the loop fails if Z does not exist yet
for (i in 1:1000) {
  X.sample <- X[sample(1:nrow(X), 40, replace = FALSE), ]
  X.sample.1 <- X.sample[1:20, ]
  X.sample.2 <- X.sample[21:40, ]
  Y <- as.data.frame(cbind(X.sample.1$ID, X.sample.1$x, X.sample.2$ID, X.sample.2$x))
  cor.results <- cor.test(Y[,2], Y[,4], alternative = c("greater"), method = c("pearson"))
  Z[i] <- cor.results$estimate
}
error <- qt(0.975, df = (length(Z) - 1)) * (sd(Z))/sqrt(length(Z))
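As a minimal sketch, the replicates in Z could be summarised as follows. The names null_mean, ci_mean and ci_null are illustrative and not from the original post. The percentile interval is an addition, included on the assumption that the spread of the null distribution itself is of interest and not only the uncertainty in its mean.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
X <- data.frame(ID = 1:600, x = rnorm(600, mean = 7, sd = 2))

Z <- numeric(1000)
for (i in seq_along(Z)) {
  rows <- sample(nrow(X), 40)  # 40 distinct individuals, no replacement
  Z[i] <- cor(X$x[rows[1:20]], X$x[rows[21:40]])
}

null_mean <- mean(Z)
error <- qt(0.975, df = length(Z) - 1) * sd(Z) / sqrt(length(Z))

# 95% CI for the mean of r over the replicates (what `error` describes)
ci_mean <- c(null_mean - error, null_mean + error)

# Central 95% of the simulated null r values (assumed addition, see above)
ci_null <- quantile(Z, c(0.025, 0.975))
```

The two intervals answer different questions: ci_mean narrows as the number of replicates grows, while ci_null describes how far r typically strays from zero when the pairs are random.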
Answer 0 (score: 1)
Try this on for size:
# generate dataset
set.seed(1)
X <- rnorm(600, 7, 2)
# Create a function that samples 40 elements from X,
# and calculates Pearson's r for the first 20 elements
# against the last 20 elements.
booties <- function(x) {
  X.samp <- sample(x, 40)
  cor(X.samp[1:20], X.samp[21:40])
}
# Replicate this function 1000 times (spits out a vector of cor estimates)
Z <- replicate(1000, booties(X))
error <- qt(0.975, df = length(Z) - 1) * sd(Z) / sqrt(length(Z))
The 1000 replicates take about 0.08 seconds to complete (an order of magnitude faster than the for loop you were experimenting with).
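One way to check the timing on your own machine is system.time(). The sketch below is only illustrative: loop_version is a hypothetical helper that mimics the question's loop (including its cor.test() call), and the numbers will differ between machines and R versions.

```r
set.seed(1)  # arbitrary seed
X <- rnorm(600, 7, 2)

# Function from the answer: Pearson's r for 20 random pairs
booties <- function(x) {
  X.samp <- sample(x, 40)
  cor(X.samp[1:20], X.samp[21:40])
}

# Hypothetical stand-in for the question's for loop, using cor.test()
loop_version <- function(x, n) {
  Z <- numeric(n)
  for (i in seq_len(n)) {
    idx <- sample(length(x), 40)
    res <- cor.test(x[idx[1:20]], x[idx[21:40]],
                    alternative = "greater", method = "pearson")
    Z[i] <- res$estimate
  }
  Z
}

t_rep  <- system.time(Z_rep  <- replicate(1000, booties(X)))
t_loop <- system.time(Z_loop <- loop_version(X, 1000))

print(t_rep[["elapsed"]])   # seconds for the replicate() version
print(t_loop[["elapsed"]])  # seconds for the explicit loop
```

If there is a gap, part of it may come from calling cor() instead of cor.test(), since cor.test() also computes a p-value and confidence interval that are discarded here.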
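Answer 1 (score: 0)
In general, implicit loops are faster than explicit loops. Try putting the code from your loop into a function, then call that function from an lapply or sapply statement.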
myfunction = function(<insert relevant parameters here>)
{
  X.sample <- X[ sample(1:nrow(X), 40, replace = FALSE), ]
  X.sample.1 <- X.sample[1:20, ]
  X.sample.2 <- X.sample[21:40, ]
  Y <- as.data.frame(cbind(X.sample.1$ID, X.sample.1$x, X.sample.2$ID, X.sample.2$x))
  cor.results <- cor.test(Y[,2], Y[,4], alternative = c("greater"), method = c("pearson"))
  cor.results$estimate
}
Z = sapply(x, myfunction)
# Here every element of x contains the arguments you want to pass to myfunction.
# You can pass additional arguments, separated by commas, after the function name.
error <- qt(0.975, df = (length(Z) - 1)) * (sd(Z))/sqrt(length(Z))
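The template above leaves the parameters open. One possible way to fill it in is sketched here; the function name one_replicate and its arguments i, data and n_pairs are assumptions for illustration, not part of the original answer.

```r
set.seed(1)  # arbitrary seed
X <- data.frame(ID = 1:600, x = rnorm(600, mean = 7, sd = 2))

# `i` is only the replicate counter supplied by sapply(); it is not used.
one_replicate <- function(i, data, n_pairs = 20) {
  rows   <- sample(nrow(data), 2 * n_pairs, replace = FALSE)
  first  <- data$x[rows[1:n_pairs]]
  second <- data$x[rows[(n_pairs + 1):(2 * n_pairs)]]
  cor(first, second)
}

# Extra arguments after the function name are passed on to one_replicate()
Z <- sapply(1:1000, one_replicate, data = X)
error <- qt(0.975, df = length(Z) - 1) * sd(Z) / sqrt(length(Z))
```

Passing the data frame as an argument, instead of reading a global X, keeps the function usable with other datasets.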
You could do it this way, but I have found that, when you can, it is best to just use the boot() function from the boot package.
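A rough sketch of how boot() might be applied to this problem is shown below. It assumes the permutation mode (sim = "permutation"), in which the index vector handed to the statistic is a shuffle of 1:n, so the first 40 entries are 40 distinct individuals. Strictly speaking that is a permutation-style resampling and not a with-replacement bootstrap. The name pair_cor and the n_pairs argument are illustrative.

```r
library(boot)  # recommended package, included with standard R installations

set.seed(1001)  # arbitrary seed
X <- rnorm(600, 7, 2)

# boot() calls the statistic as statistic(data, indices).
# Under sim = "permutation", `indices` is a random permutation of
# 1:length(data), so its first 2 * n_pairs entries are all distinct.
pair_cor <- function(data, indices, n_pairs = 20) {
  first  <- data[indices[1:n_pairs]]
  second <- data[indices[(n_pairs + 1):(2 * n_pairs)]]
  cor(first, second)
}

b <- boot(data = X, statistic = pair_cor, R = 1000, sim = "permutation")
Z <- as.vector(b$t)  # b$t is a 1000 x 1 matrix of replicate values
error <- qt(0.975, df = length(Z) - 1) * sd(Z) / sqrt(length(Z))
```

Note that b$t0 is the statistic evaluated on the data in its original order, which has no special meaning for this design.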
As for set.seed(), you need to call it immediately before each random draw that you want to reproduce. See below.
> rnorm(6)
[1] 1.0915017 -0.6229437 -0.9074604 -1.5937133 0.3026445 1.6343924
> set.seed(1001)
> rnorm(6)
[1] 2.1886481 -0.1775473 -0.1852753 -2.5065362 -0.5573113 -0.1435595
> set.seed(1001)
> rnorm(6)
[1] 2.1886481 -0.1775473 -0.1852753 -2.5065362 -0.5573113 -0.1435595
> rnorm(6)
[1] 1.0915017 -0.6229437 -0.9074604 -1.5937133 0.3026445 1.6343924
> set.seed(1001)
> sample(1:5,10,replace=T)
[1] 5 3 3 3 3 5 1 1 2 4
> sample(1:5,10,replace=T)
[1] 3 1 5 3 2 5 1 2 1 4
> set.seed(1001)
> sample(1:5,10,replace=T)
[1] 5 3 3 3 3 5 1 1 2 4
> rnorm(6)
[1] -0.1435595 1.0915017 -0.6229437 -0.9074604 -1.5937133 0.3026445
> set.seed(1001)
> rnorm(6)
[1] 2.1886481 -0.1775473 -0.1852753 -2.5065362 -0.5573113 -0.1435595
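The same behaviour can be checked programmatically with identical(). This is a small sketch with illustrative variable names; it also shows that after a single set.seed() call the whole sequence of draws repeats, not only the first one.

```r
set.seed(1001)
a1 <- rnorm(6)
a2 <- rnorm(6)  # continues the same random stream

set.seed(1001)
b1 <- rnorm(6)
b2 <- rnorm(6)

same_first  <- identical(a1, b1)  # first draw after the same seed
same_second <- identical(a2, b2)  # second draw after the same seed
consecutive <- identical(a1, a2)  # two consecutive draws without re-seeding

print(c(same_first, same_second, consecutive))
```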
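Hope that helps!
While looking into the boot function to give you an example, I ran into trouble: it only returned one row. Weird! I may post a new question about that. In any case, I think the bootstrap() function from the bootstrap package does what you need.
Here is my example: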
library(bootstrap)  # provides bootstrap()
set.seed(1001)
X <- rnorm(600, 7, 2)
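myStat <- function(x, pairs) {
  # Draw 2 * pairs distinct positions, then correlate the first half with the second.
  # Note: the values are read from the global X, not from the resampled x that
  # bootstrap() passes in, so each replicate uses distinct original individuals.
  index = sample(1:length(x), (pairs*2))
  Z = cor(X[index[1:(length(index)/2)]], X[index[((length(index)/2)+1):length(index)]])
  return(Z)
}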
b=bootstrap(X,1000,myStat,pairs=20)
Z <- b$thetastar
error <- qt(0.975, df = length(Z) - 1) * sd(Z) / sqrt(length(Z))