Question

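I'm not very good with R, and I'd really appreciate any help.

I ran a loop that produced a huge result vector of 11,303,044 rows. I have another vector, from a different loop, with 1681 rows.

I want to run chisq.test to compare their distributions, but because they have different lengths, it doesn't work.

I tried taking a sample of size 1681 from the 11,303,044-element vector to match the length of the second vector, but I get a different chisq.test result every time I run it.

I'm thinking of splitting the two vectors into an equal number of intervals.

Let's say

Vector 1: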

temp.mat<-matrix((rnorm(11303044))^2, ncol=1) 
head(temp.mat)
dim(temp.mat)

Vector 2:

temp.mat<-matrix((rnorm(1681))^2, ncol=1) 
head(temp.mat)
dim(temp.mat)

How can I split them into equal intervals so that I end up with vectors of the same length?

Thanks :)

Answer 1

mat1<-matrix((rnorm(1130300))^2, ncol=1) # only one-tenth the size of your vector
smat=sample(mat1, 100000)                #and take only one-tenth of that
mat2<-matrix((rnorm(1681))^2, ncol=1)
qqplot(smat,mat2)                       #and repeat the sampling a few times

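What you see is interesting from a statistical viewpoint. At larger "departures from the mean", the large sample always departs from a "good fit". That is not surprising, because it contains more truly extreme values.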
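To put numbers on what the Q-Q plot shows, here is a minimal sketch that compares a few sample quantiles of the two vectors directly. The names big, small and probs, the seed, and the chosen probability levels are only illustrative assumptions, not anything from the question.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
big   <- rnorm(1130300)^2            # stand-in for the large vector (one-tenth size, as above)
small <- rnorm(1681)^2               # stand-in for the small vector

probs   <- c(0.50, 0.90, 0.99, 0.999)   # hypothetical probability levels to compare
q_big   <- quantile(big,   probs)
q_small <- quantile(small, probs)
print(round(cbind(q_big, q_small), 3))  # quantiles side by side

# the largest value in each sample; the tails are where the two usually differ most
print(c(max_big = max(big), max_small = max(small)))
```

The central quantiles should agree closely, while the upper quantiles and the maxima tend to drift apart, simply because the larger sample has had many more chances to produce an extreme draw.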

Answer 2

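chisq.test is Pearson's chi-square test. It is designed for discrete data. When given two input vectors, it coerces them to factors and tests for independence, not for equality of distributions. This means, for example, that the order of the data makes a difference.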

> set.seed(123)
> x<-sample(5,10,T)
> y<-sample(5,10,T)
> chisq.test(x,y)

    Pearson's Chi-squared test

data:  x and y
X-squared = 18.3333, df = 16, p-value = 0.3047

Warning message:
In chisq.test(x, y) : Chi-squared approximation may be incorrect
> chisq.test(x,y[10:1])

    Pearson's Chi-squared test

data:  x and y[10:1]
X-squared = 16.5278, df = 16, p-value = 0.4168

Warning message:
In chisq.test(x, y[10:1]) : Chi-squared approximation may be incorrect
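The reason order matters is that chisq.test(x, y) cross-tabulates the pairs (x[i], y[i]) into a contingency table and tests that table. A minimal sketch of that equivalence follows; the names tab, r1 and r2 are just illustrative, and the exact numbers depend on your R version's random number generator, so they may not match the output above.

```r
set.seed(123)
x <- sample(5, 10, TRUE)
y <- sample(5, 10, TRUE)

tab <- table(x, y)        # contingency table of the pairs (x[i], y[i])
print(tab)

# suppressWarnings() only hides the small-count warning seen above
r1 <- suppressWarnings(chisq.test(x, y))
r2 <- suppressWarnings(chisq.test(tab))

# same statistic either way: the test only ever sees the table
print(all.equal(unname(r1$statistic), unname(r2$statistic)))
```

Reversing y changes which values are paired together, so the table changes and so does the result.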

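So I don't think chisq.test is what you want, because called this way it does not compare distributions. Maybe try something like ks.test, which works with vectors of different lengths and with continuous data.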

> set.seed(123)
> x<-rnorm(2000)^2
> y<-rnorm(100000)^2
> ks.test(x,y)

    Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.0139, p-value = 0.8425
alternative hypothesis: two-sided

> ks.test(sqrt(x),y)

    Two-sample Kolmogorov-Smirnov test

data:  sqrt(x) and y
D = 0.1847, p-value < 2.2e-16
alternative hypothesis: two-sided
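If you still want a chi-square test based on intervals, as the question proposes, one possible approach (my assumption, not something chisq.test does for you) is to cut both vectors with the same breaks and test the resulting 2 x k table of counts. In this sketch the names v1, v2, k, breaks and counts, the seed, and the choice of 10 bins from pooled quantiles are all hypothetical; the result will depend on how the bins are chosen.

```r
set.seed(123)                        # arbitrary seed
v1 <- rnorm(100000)^2                # stand-in for the large vector (smaller here for speed)
v2 <- rnorm(1681)^2                  # stand-in for the small vector

k <- 10                              # hypothetical number of bins
# shared breaks taken from the pooled data, so both vectors use identical intervals
breaks <- quantile(c(v1, v2), probs = seq(0, 1, length.out = k + 1))
breaks[1]     <- -Inf                # open the outer bins so every value falls in one
breaks[k + 1] <-  Inf

# one row of bin counts per vector; the vectors may have different lengths
counts <- rbind(v1 = table(cut(v1, breaks)),
                v2 = table(cut(v2, breaks)))
print(counts)

res <- chisq.test(counts)            # tests whether the two rows share the same bin proportions
print(res)
```

Here the two vectors never need to be the same length, because the test compares the proportions falling in each interval rather than the raw values.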

Splitting a large vector into intervals in R
