Why do x and y give different results?
[1] 0 0 0 1
[1] 0.06 0.06 0.22 0.19
Here is the code (taken from here):
rands <- list()
set.seed(1)
rands[[1]] <- rnorm(10) + c(1,0,2,0,1)
rands[[2]] <- rnorm(100) + c(1,0,2,0,1)
rands[[3]] <- rnorm(1000) + c(1,0,2,0,1)
rands[[4]] <- rnorm(5000) + c(1,0,2,0,1)
x <- replicate(100, { # generates 100 different tests on each distribution
c(shapiro.test(rands[[1]])$p.value,
shapiro.test(rands[[2]])$p.value,
shapiro.test(rands[[3]])$p.value,
shapiro.test(rands[[4]])$p.value)}) # rnorm gives a random draw from the normal distribution
set.seed(1)
y <- replicate(100, { # generates 100 different tests on each distribution
c(shapiro.test(rnorm(10) + c(1,0,2,0,1))$p.value,
shapiro.test(rnorm(100) + c(1,0,2,0,1))$p.value,
shapiro.test(rnorm(1000) + c(1,0,2,0,1))$p.value,
shapiro.test(rnorm(1000) + c(1,0,2,0,1))$p.value)}) # note: n = 1000 here, whereas rands[[4]] above uses rnorm(5000)
print(rowMeans(x < 0.05)) # the proportion of significant deviations
print(rowMeans(y < 0.05)) # the proportion of significant deviations
I also checked:
class(rands[[1]]) # [1] "numeric"
class(rnorm(10) + c(1,0,2,0,1)) # [1] "numeric"
and, for example:
rands[[1]]
# [1] 0.3735462 0.1836433 1.1643714 1.5952808 1.3295078 0.1795316 0.4874291 2.7383247 0.5757814 0.6946116
set.seed(1)
rnorm(10) + c(1,0,2,0,1)
# [1] 0.3735462 0.1836433 1.1643714 1.5952808 1.3295078 0.1795316 0.4874291 2.7383247 0.5757814 0.6946116
I am worried that I have made a mistake in my research. Have I?
Answer 0 (score: 1)
To simplify the problem, consider the following:
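rands <- list()
set.seed(1)
rands[[1]] <- rnorm(10) + c(1,0,2,0,1)
# x: the sample was drawn once, so the same data are tested 100 times
x <- replicate(100, shapiro.test(rands[[1]])$p.value)
# y: the seed is reset inside the expression, so every replicate redraws the same sample
y <- replicate(100, {set.seed(1); shapiro.test(rnorm(10) + c(1,0,2,0,1))$p.value})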
all.equal(x,y)
[1] TRUE
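So, to make the two identical, you have to force the random numbers in y to be the same on every replicate. In x you computed the samples in advance, which means you ran the same test on the same data 100 times. That is why the proportions for x can only be 0 or 1. In y, by contrast, you actually draw new random numbers on each replicate. The correct approach is therefore clearly your version of y (not my version, in which I set the seed inside the expression).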
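The difference can be checked directly by counting distinct p-values. This is a minimal sketch, not part of the original post: the names fixed_sample, p_fixed and p_fresh are made up for illustration, and it uses only the n = 10 case from the question.

```r
# Sketch: a precomputed sample versus a fresh draw inside replicate().
# fixed_sample, p_fixed and p_fresh are hypothetical names for illustration.
set.seed(1)
fixed_sample <- rnorm(10) + c(1, 0, 2, 0, 1)  # drawn once, before replicate()

# The data never change, so all 100 p-values are identical.
p_fixed <- replicate(100, shapiro.test(fixed_sample)$p.value)

# rnorm() is evaluated inside the expression, so each replicate tests new data.
p_fresh <- replicate(100, shapiro.test(rnorm(10) + c(1, 0, 2, 0, 1))$p.value)

length(unique(p_fixed))  # exactly one distinct p-value
length(unique(p_fresh))  # many distinct p-values
mean(p_fixed < 0.05)     # can only be 0 or 1
mean(p_fresh < 0.05)     # an estimate of the rejection rate
```

Because replicate() re-evaluates its expression on every iteration, only random draws made inside that expression vary between replicates. Anything computed beforehand is reused unchanged.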