I am using the fitdistrplus package in R. So far, I have identified a suitable distribution (using the vignette and the 'groundbeef' data):
> library(fitdistrplus)
> data("groundbeef")
> str(groundbeef)
'data.frame': 254 obs. of 1 variable:
$ serving: num 30 10 20 24 20 24 40 20 50 30 ...
> descdist(groundbeef$serving, boot = 1000)
summary statistics
------
min: 10 max: 200
median: 79
mean: 73.64567
estimated sd: 35.88487
estimated skewness: 0.7352745
estimated kurtosis: 3.551384
> fw<-fitdist(groundbeef$serving, "weibull")
> fg<-fitdist(groundbeef$serving, "gamma")
> fln<-fitdist(groundbeef$serving, "lnorm")
> gofstat(list(fw, fg, fln), fitnames = c("weibull", "gamma", "lnorm"))
Goodness-of-fit statistics
                             weibull     gamma     lnorm
Kolmogorov-Smirnov statistic 0.1396646 0.1281246 0.1493090
Cramer-von Mises statistic   0.6840994 0.6934112 0.8277358
Anderson-Darling statistic   3.5736460 3.5660192 4.5436542

Goodness-of-fit criteria
                               weibull    gamma    lnorm
Aikake's Information Criterion 2514.449 2511.250 2526.639
Bayesian Information Criterion 2521.524 2518.325 2533.713
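Based on this, I can choose the gamma distribution to describe the data. What I would like to do now is see how different sample sizes affect the goodness of fit (gof). For example, the current gof is based on 254 observations: how does the gof change if I use a random sample of only 50 of them? At some point there must be a threshold below which gamma is no longer the best fit (i.e., none of these distributions can best describe a sample of size 1).

I looked at this, which is more or less what I want to do, except that I have only one data group ('groundbeef$serving'), and I am not interested in computing power but in tracking how the p-value changes as I use different sample sizes.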
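Answer 0 (score: 1)
Draw a random sample of size N from the original dataset 1000 times, run the goodness-of-fit test on each random sample, and look at the average p-value across the 1000 gof tests. The code below uses N = 50 and the chi-squared p-value returned by gofstat.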
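df <- groundbeef
n_rep <- 1000  # number of random subsamples
n_obs <- 50    # observations per subsample
fitnames <- c("weibull", "gamma", "lnorm")
# Preallocate one row per subsample and one column per distribution
results <- matrix(NA_real_, nrow = n_rep, ncol = length(fitnames),
                  dimnames = list(NULL, fitnames))
for (i in seq_len(n_rep)) {
  temp2 <- sample(df$serving, size = n_obs, replace = FALSE)
  # descdist(temp2, boot = 1000)
  fw <- fitdist(temp2, "weibull")
  fg <- fitdist(temp2, "gamma")
  fln <- fitdist(temp2, "lnorm")
  # Chi-squared goodness-of-fit p-value for each fitted distribution
  results[i, ] <- gofstat(list(fw, fg, fln), fitnames = fitnames)$chisqpvalue
}
results <- as.data.frame(results)
summary(results)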
    weibull              gamma               lnorm
 Min.   :0.0000000   Min.   :0.0000000   Min.   :0.000000
 1st Qu.:0.0000001   1st Qu.:0.0000001   1st Qu.:0.000000
 Median :0.0009940   Median :0.0035025   Median :0.003264
 Mean   :0.0380086   Mean   :0.0519209   Mean   :0.058692
 3rd Qu.:0.0383365   3rd Qu.:0.0578076   3rd Qu.:0.056701
 Max.   :0.7309149   Max.   :0.8963196   Max.   :0.855437
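The loop above fixes the subsample size at 50. To track how the p-value changes with sample size, the same idea can be repeated over several sizes. The following is only a rough sketch: the helper name `gof_by_size`, the sizes in `sizes`, the number of repetitions, and the seed are all illustrative assumptions, and it reuses the chi-squared p-value (`chisqpvalue`) from `gofstat` as in the loop above.

```r
library(fitdistrplus)
data("groundbeef")

# Hypothetical helper (not part of fitdistrplus): mean and median
# chi-squared p-value over n_rep random subsamples of size n_obs.
gof_by_size <- function(x, n_obs, n_rep = 100,
                        dists = c("weibull", "gamma", "lnorm")) {
  pvals <- matrix(NA_real_, nrow = n_rep, ncol = length(dists),
                  dimnames = list(NULL, dists))
  for (i in seq_len(n_rep)) {
    s <- sample(x, size = n_obs, replace = FALSE)
    # A fit can fail on an unlucky subsample; that row then stays NA.
    p <- tryCatch({
      fits <- lapply(dists, function(d) fitdist(s, d))
      gofstat(fits, fitnames = dists)$chisqpvalue
    }, error = function(e) NULL)
    if (length(p) == length(dists)) pvals[i, ] <- p
  }
  data.frame(n_obs = n_obs, dist = dists,
             mean_p = colMeans(pvals, na.rm = TRUE),
             median_p = apply(pvals, 2, median, na.rm = TRUE),
             row.names = NULL)
}

set.seed(1)                    # arbitrary seed, for reproducibility only
sizes <- c(25, 50, 100, 200)   # illustrative sizes, each at most 254
res <- do.call(rbind, lapply(sizes, function(n) {
  gof_by_size(groundbeef$serving, n_obs = n)
}))
res
```

Each row of `res` summarises one distribution at one subsample size, so plotting `mean_p` against `n_obs` per distribution shows the trend. Keep in mind that larger p-values at small sizes may simply reflect lower test power rather than a better fit, and that subsamples drawn without replacement overlap heavily as the size approaches 254.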