We are using the var.test() function in R, for example:
T1<-rnorm(255,mean=1.432,sd=0.255)
T2<-rnorm(256,mean=1.485,sd=0.251)
var.test(T1,T2)
# F test to compare two variances
#
# data: T1 and T2
# F = 1.1027, num df = 254, denom df = 255, p-value = 0.436
# alternative hypothesis: true ratio of variances is not equal to 1
# 95 percent confidence interval:
# 0.8620164 1.4106568
# sample estimates:
# ratio of variances
# 1.102695
However, when we rerun the test with the same data, we get very different results, for example:
T1<-rnorm(255,mean=1.432,sd=0.255)
T2<-rnorm(256,mean=1.485,sd=0.251)
var.test(T1,T2)
# F test to compare two variances
#
# data: T1 and T2
# F = 0.79853, num df = 254, denom df = 255, p-value = 0.07334
# alternative hypothesis: true ratio of variances is not equal to 1
# 95 percent confidence interval:
# 0.6242396 1.0215441
# sample estimates:
# ratio of variances
# 0.7985297
Why does this happen? Are we doing something wrong?
We have multiple datasets to analyze, and we need to understand what is going on.
Answer 0 (score: 2)
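Each call to rnorm() draws a fresh random sample, so the two runs are not actually testing the same data, and the F statistic and p-value change from run to run. To make the analysis reproducible, you can use set.seed(), which sets the seed of R's random number generator.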
set.seed(42) # set seed
T1 <- rnorm(255, mean=1.432, sd=0.255)
T2 <- rnorm(256, mean=1.485, sd=0.251)
var.test(T1, T2)
# same seed - same result
set.seed(42)
T1 <- rnorm(255, mean=1.432, sd=0.255)
T2 <- rnorm(256, mean=1.485, sd=0.251)
var.test(T1, T2)
# different seed - different result
set.seed(123)
T1 <- rnorm(255, mean=1.432, sd=0.255)
T2 <- rnorm(256, mean=1.485, sd=0.251)
var.test(T1, T2)
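
As a complementary check, here is a minimal sketch showing that var.test() itself is deterministic: if the vectors are generated once and reused, repeated tests agree exactly. The seed value and the names first, second and T1_new are only illustrative assumptions, not part of the original question.

```r
# Illustrative sketch: draw the samples once, then test them twice.
set.seed(42)  # arbitrary seed, chosen only for this example
T1 <- rnorm(255, mean = 1.432, sd = 0.255)
T2 <- rnorm(256, mean = 1.485, sd = 0.251)

first  <- var.test(T1, T2)
second <- var.test(T1, T2)  # same vectors, no new rnorm() call

# Compare the F statistic and p-value of the two runs on identical data.
print(identical(first$statistic, second$statistic))
print(identical(first$p.value, second$p.value))

# Drawing again without resetting the seed yields a new sample,
# which is what happened between the two runs in the question.
T1_new <- rnorm(255, mean = 1.432, sd = 0.255)
print(identical(T1, T1_new))
```

The first two comparisons should report that the results match, because the inputs are the same objects. The last one should report a mismatch, because rnorm() has advanced the generator. For real datasets read from files this issue does not arise, since the data are not regenerated on each run.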