Question

Why do x and y give different results?

[1] 0 0 0 1
[1] 0.06 0.06 0.22 0.19

Here is the code (from here):

rands <- list()
set.seed(1)
rands[[1]] <- rnorm(10) + c(1,0,2,0,1)
rands[[2]] <- rnorm(100) + c(1,0,2,0,1)
rands[[3]] <- rnorm(1000) + c(1,0,2,0,1)
rands[[4]] <- rnorm(5000) + c(1,0,2,0,1)

x <- replicate(100, { # generates 100 different tests on each distribution
  c(shapiro.test(rands[[1]])$p.value,
    shapiro.test(rands[[2]])$p.value,
    shapiro.test(rands[[3]])$p.value,
    shapiro.test(rands[[4]])$p.value)}) # rnorm gives a random draw from the normal distribution

set.seed(1)
y <- replicate(100, { # generates 100 different tests on each distribution
  c(shapiro.test(rnorm(10) + c(1,0,2,0,1))$p.value,
    shapiro.test(rnorm(100) + c(1,0,2,0,1))$p.value,
    shapiro.test(rnorm(1000) + c(1,0,2,0,1))$p.value,
    shapiro.test(rnorm(1000) + c(1,0,2,0,1))$p.value)}) # rnorm gives a random draw from the normal distribution

print(rowMeans(x < 0.05)) # the proportion of significant deviations
print(rowMeans(y < 0.05)) # the proportion of significant deviations

I also checked

class(rands[[1]]) # [1] "numeric"
class(rnorm(10) + c(1,0,2,0,1)) # [1] "numeric"

and, for example,

rands[[1]]
# [1] 0.3735462 0.1836433 1.1643714 1.5952808 1.3295078 0.1795316 0.4874291 2.7383247 0.5757814 0.6946116
set.seed(1)
rnorm(10) + c(1,0,2,0,1)
# [1] 0.3735462 0.1836433 1.1643714 1.5952808 1.3295078 0.1795316 0.4874291 2.7383247 0.5757814 0.6946116

I'm worried: have I made a mistake in my research?

Answer 1

To simplify the problem, consider the following:

rands <- list()
set.seed(1)
rands[[1]] <- rnorm(10) + c(1,0,2,0,1)
x <- replicate(100, shapiro.test(rands[[1]])$p.value)

y <- replicate(100, {
  set.seed(1)  # re-seed so every replicate draws the same 10 numbers
  shapiro.test(rnorm(10) + c(1,0,2,0,1))$p.value
})

all.equal(x, y)
# [1] TRUE
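Re-seeding inside replicate() is what makes the two agree. Without it, each call to rnorm() continues the random-number stream, so only the first draw after set.seed(1) reproduces rands[[1]]; that is why the check in the question (comparing only the first draw) looked identical. A minimal sketch, where the names `first` and `second` are illustrative only:

```r
rands <- list()
set.seed(1)
rands[[1]] <- rnorm(10) + c(1, 0, 2, 0, 1)

set.seed(1)
first  <- rnorm(10) + c(1, 0, 2, 0, 1)  # first draw after re-seeding
second <- rnorm(10) + c(1, 0, 2, 0, 1)  # next draw: the RNG stream has moved on

identical(rands[[1]], first)   # TRUE: same seed, same numbers
identical(rands[[1]], second)  # FALSE: what the 2nd replicate of y would see
```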

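So, to make the two identical, you have to force y to use the same random numbers. In x you precomputed the data, which means you ran the same test on the same data 100 times: every replicate returns the same p-value, so each row of x is either always or never significant, which is why x contains only 0s and 1s. In y, by contrast, you draw new random numbers every time. The correct approach is therefore your version y (not my version, which sets the seed inside the expression).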
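A sketch of both versions side by side, assuming the goal is to estimate the rejection rate of shapiro.test() for each sample size. Note that the question's y uses rnorm(1000) for its fourth sample while rands[[4]] uses 5000; this sketch uses 5000 for both (5000 is the largest sample size shapiro.test() accepts). Results are arranged one column per sample size rather than one row per distribution; the names `shift`, `sizes`, `p_fresh`, `p_fixed` are illustrative.

```r
set.seed(1)
shift <- c(1, 0, 2, 0, 1)        # recycled over each sample, as in the question
sizes <- c(10, 100, 1000, 5000)  # 5000 is the upper limit for shapiro.test()

# Version y: draw a fresh sample in every replicate (100 x 4 matrix)
p_fresh <- sapply(sizes, function(n)
  replicate(100, shapiro.test(rnorm(n) + shift)$p.value))

# Version x: test one precomputed sample per size 100 times
fixed <- lapply(sizes, function(n) rnorm(n) + shift)
p_fixed <- sapply(fixed, function(s)
  replicate(100, shapiro.test(s)$p.value))

# Each column of p_fixed repeats a single p-value, so its rate is 0 or 1
apply(p_fixed, 2, function(p) length(unique(p)))  # 1 1 1 1
colMeans(p_fixed < 0.05)

# Only the fresh-sample rates estimate how often the test rejects
colMeans(p_fresh < 0.05)
```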

Why does creating a list change the result of shapiro.test?
