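For a chi-square goodness-of-fit test, you generally want the expected frequency in every bin to be at least 5. Given a vector of expected frequencies, how can I easily merge bins in R so that this holds?
Let's take an example: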
set.seed(1000)  # Seed for reproducibility
y = rpois(150, 4)  # Simulate 150 draws from Poisson(4)
observed = table(c(y, 0:max(y))) - 1  # Observed frequencies, with explicit zero counts for any value in 0..max(y) that never occurs
thetahat = mean(y)  # If Y ~ Poisson(theta) with theta unknown, this is the estimate of theta
expected = dpois(as.numeric(names(observed)), thetahat) * 150  # Expected frequencies under Poisson(thetahat)
expected[length(expected)] = 150 - sum(expected[1:(length(expected) - 1)])  # Make the final bin 150 * P(Y >= max(y))
This gives:
rbind(observed, "expected" = round(expected, 3))
              0      1      2      3      4      5      6      7     8     9    10
observed  1.000 18.000 19.000 24.000 33.000 31.000 12.000  7.000 2.000 0.000 3.000
expected  3.098 12.019 23.316 30.156 29.251 22.699 14.679  8.136 3.946 1.701 0.999
This final matrix of observed/expected frequencies is what I start from. I want to process it so that every bin/cell with an expected frequency < 5 is combined with adjacent bins/cells until the resulting expected frequency is >= 5. In the example above, the result would become
               1      2      3      4      5      6     7     8
observed  19.000 19.000 24.000 33.000 31.000 12.000 7.000 5.000
expected  15.117 23.316 30.156 29.251 22.699 14.679 8.136 6.646
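Once the bins are merged, the test itself is a short computation. The sketch below is a hypothetical helper (`gof_chisq` is not part of base R): it computes Pearson's statistic on already-merged bins and, as an assumption following the common convention, subtracts one degree of freedom per parameter estimated from the data (`n_est`, which would be 1 here because theta is estimated by `thetahat`). That reduction is strictly justified when the parameter is estimated from the binned counts.

```r
# Hypothetical helper: Pearson chi-square goodness-of-fit on already-merged bins.
# Assumption: n_est = number of parameters estimated from the data (1 for thetahat).
gof_chisq <- function(obs, expd, n_est = 0) {
  stopifnot(length(obs) == length(expd), all(expd > 0))
  stat <- sum((obs - expd)^2 / expd)              # Pearson's X^2 statistic
  df <- length(obs) - 1 - n_est                   # k - 1 - (number of estimated parameters)
  p <- pchisq(stat, df = df, lower.tail = FALSE)  # upper-tail p-value
  list(statistic = stat, df = df, p.value = p)
}
```

With the merged matrix `z`, a call would look like `gof_chisq(z["observed", ], z["expected", ], n_est = 1)`. Base R's `chisq.test(x, p = expd, rescale.p = TRUE)` gives the same statistic when the observed and expected totals agree, but it always uses k - 1 degrees of freedom.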
For typical unimodal distributions (e.g. Poisson, normal), it is usually the tails of the distribution that have expected frequencies < 5.
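A quick check of that claim, as a sketch that assumes the true Poisson mean of 4 and n = 150 (chosen only to match the example, rather than using `thetahat`) and looks at the support 0..15: the bins with an expected count below 5 sit at both ends.

```r
# Illustration only: expected counts for n = 150 draws from Poisson(4), support 0..15
n <- 150
support <- 0:15
expected_counts <- n * dpois(support, lambda = 4)
# Values whose expected count is below 5: the left tail (0) and the right tail (8 and up)
low_bins <- support[expected_counts < 5]
print(low_bins)  # [1]  0  8  9 10 11 12 13 14 15
```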
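One possible solution is to write a loop that iterates over the columns and groups adjacent columns whenever the expected frequency in a bin is < 5. I can do this, but is there a more efficient way? When grouping adjacent columns, I want the outer columns to collapse inward, as in the example above.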
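My current code:
z = rbind(observed, "expected" = round(expected, 3))  # Row 1: observed, row 2: expected
# Right tail: fold the last bin into its left neighbour while its expected count is < 5
nbins = ncol(z)
while (nbins > 1 && z[2, nbins] < 5) {
  z[, nbins - 1] = z[, nbins - 1] + z[, nbins]
  z = z[, -nbins, drop = FALSE]  # drop = FALSE keeps z a matrix even if only one column is left
  nbins = nbins - 1
}
# Left tail: fold the first bin into its right neighbour while its expected count is < 5
while (ncol(z) > 1 && z[2, 1] < 5) {
  z[, 2] = z[, 2] + z[, 1]
  z = z[, -1, drop = FALSE]
}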
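For comparison, the same inward folding can be sketched without a loop. `collapse_tails()` is a hypothetical helper, not an existing R function: using cumulative sums from each end, it finds the first and last bins at which the tail's expected total reaches `min_exp`, then maps every bin outside that range onto the nearest boundary bin. This gives the same grouping as the two loops above, and each merged bin keeps the label of the bin that absorbed it.

```r
# Hypothetical helper (not from base R): fold low-expected tail bins inward
# using cumulative sums from each end instead of an explicit loop.
collapse_tails <- function(obs, expd, min_exp = 5) {
  k <- length(expd)
  # First bin at which the left tail's cumulative expected count reaches min_exp
  left <- which(cumsum(expd) >= min_exp)[1]
  # Last bin at which the right tail's cumulative expected count reaches min_exp
  right <- k + 1 - which(cumsum(rev(expd)) >= min_exp)[1]
  if (is.na(left) || is.na(right)) stop("total expected count is below min_exp")
  # Bins left of `left` map onto `left`; bins right of `right` map onto `right`
  groups <- pmin(pmax(seq_len(k), left), right)
  out <- rbind(observed = tapply(as.numeric(obs), groups, sum),
               expected = tapply(as.numeric(expd), groups, sum))
  # Label each merged bin with the name of the bin that absorbed it, as the loops do
  labels <- if (!is.null(names(obs))) names(obs) else names(expd)
  if (!is.null(labels)) colnames(out) <- labels[sort(unique(groups))]
  out
}
```

Called as `collapse_tails(observed, round(expected, 3))`, it returns a two-row matrix shaped like `z`. With only a handful of bins the speed difference is negligible; the main gain is that no matrix is copied on every iteration.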
Thanks