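Sampling from a known distribution in R

Date: 2015-11-13 03:44:20

Tags: r

I am using the fitdistrplus package in R. So far I have identified suitable candidate distributions (using the vignette and the 'groundbeef' data):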

> library(fitdistrplus)
> data("groundbeef")
> str(groundbeef)
'data.frame':   254 obs. of  1 variable:
 $ serving: num  30 10 20 24 20 24 40 20 50 30 ...
> descdist(groundbeef$serving, boot = 1000)
summary statistics
------
min:  10   max:  200 
median:  79 
mean:  73.64567 
estimated sd:  35.88487 
estimated skewness:  0.7352745 
estimated kurtosis:  3.551384 
> fw<-fitdist(groundbeef$serving, "weibull")
> fg<-fitdist(groundbeef$serving, "gamma")
> fln<-fitdist(groundbeef$serving, "lnorm")
> gofstat(list(fw, fg, fln), fitnames = c("weibull", "gamma", "lnorm"))
Goodness-of-fit statistics
                               weibull     gamma     lnorm
Kolmogorov-Smirnov statistic 0.1396646 0.1281246 0.1493090
Cramer-von Mises statistic   0.6840994 0.6934112 0.8277358
Anderson-Darling statistic   3.5736460 3.5660192 4.5436542

Goodness-of-fit criteria
                                weibull    gamma    lnorm
Aikake's Information Criterion 2514.449 2511.250 2526.639
Bayesian Information Criterion 2521.524 2518.325 2533.713

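Based on this, I can choose the gamma distribution to describe the data. What I want to do now is see how different sample sizes affect the goodness of fit (GOF). For example, the current GOF is based on 254 observations; how does it change if I use a random sample of only 50 of those observations? At some point there must be a threshold below which the gamma no longer fits (i.e., none of these distributions can best describe a sample of size 1). I looked at this, which is more or less what I want to do, except that I have only one data group (groundbeef$serving), and I am not interested in computing power but in tracking how the p-value changes as I use different sample sizes.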

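1 answer:

Answer 0 (score: 1)

Draw a random sample of size N from the original dataset 1000 times, test the goodness of fit of each random sample, and see what the average p-value of the 1000 GOF tests is.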

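library(fitdistrplus)  # provides fitdist(), gofstat() and the groundbeef data
data("groundbeef")
df <- groundbeef

n_rep <- 1000
# One row of chi-squared p-values per subsample, one column per distribution
results <- matrix(NA_real_, nrow = n_rep, ncol = 3,
                  dimnames = list(NULL, c("weibull", "gamma", "lnorm")))

for (i in seq_len(n_rep)) {
    # Random subsample of 50 observations, drawn without replacement
    temp2 <- sample(df$serving, size = 50, replace = FALSE)
    #descdist(temp2, boot = 1000)
    fw <- fitdist(temp2, "weibull")
    fg <- fitdist(temp2, "gamma")
    fln <- fitdist(temp2, "lnorm")
    # Store the chi-squared p-value of each of the three fits
    results[i, ] <- gofstat(list(fw, fg, fln),
                            fitnames = c("weibull", "gamma", "lnorm"))$chisqpvalue
}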

row.names(results) <- c(1:nrow(results))
results <- as.data.frame(results)
summary(results)

 weibull              gamma               lnorm         
Min.   :0.0000000   Min.   :0.0000000   Min.   :0.000000  
1st Qu.:0.0000001   1st Qu.:0.0000001   1st Qu.:0.000000  
Median :0.0009940   Median :0.0035025   Median :0.003264  
Mean   :0.0380086   Mean   :0.0519209   Mean   :0.058692  
3rd Qu.:0.0383365   3rd Qu.:0.0578076   3rd Qu.:0.056701  
Max.   :0.7309149   Max.   :0.8963196   Max.   :0.855437
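
To follow how the p-value changes with sample size, as the question asks, the same resampling idea can be repeated over several values of N. The following is only a minimal sketch under stated assumptions, not part of the original answer: the helper names `one_pvalue` and `pvalue_by_size`, the chosen sizes, the 200 repetitions per size and the seed are all hypothetical, and only the gamma fit is tracked. Fits that fail, or for which no chi-squared p-value is returned, are recorded as NA.

```r
library(fitdistrplus)  # assumed to be installed
data("groundbeef")

# Hypothetical helper: chi-squared p-value of a gamma fit to one random
# subsample of size n; returns NA if the fit fails or no p-value is available.
one_pvalue <- function(x, n) {
  s <- sample(x, size = n, replace = FALSE)
  p <- tryCatch(gofstat(fitdist(s, "gamma"))$chisqpvalue,
                error = function(e) NA_real_)
  if (length(p) == 1) unname(p) else NA_real_
}

# Hypothetical helper: mean and median p-value over n_rep subsamples of size n.
pvalue_by_size <- function(x, n, n_rep = 200) {
  p <- replicate(n_rep, one_pvalue(x, n))
  c(n = n,
    mean_p = mean(p, na.rm = TRUE),
    median_p = median(p, na.rm = TRUE))
}

set.seed(1)                      # arbitrary seed, for reproducibility only
sizes <- c(50, 100, 150, 200)    # must not exceed the 254 observations
by_size <- as.data.frame(
  t(sapply(sizes, function(n) pvalue_by_size(groundbeef$serving, n)))
)
by_size
```

Each row of `by_size` summarises one sample size, so plotting `mean_p` or `median_p` against `n` shows the trend. Because the subsamples are drawn without replacement from the same 254 observations, they overlap heavily at the larger sizes, so the p-values there are far from independent.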