Fitting a 2-parameter Weibull distribution to tabulated data

Date: 2013-05-04 22:15:35

Tags: r

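I am trying to fit a Weibull distribution to tabulated data. After processing my point cloud, I get the number of returns in each 1 m height slice, one column per slice. For example: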

a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23)
colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5,28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5)

In the example above, the height class centered at 13.5 m contains 7 points.

Plotting the matrix a shows the distribution of the data:

barplot(a)


Does anyone have a suggestion for how to fit a 2-parameter Weibull distribution to this tabulated data?

Thanks in advance!

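2 Answers:

Answer 0: (score: 6)

You can do maximum likelihood on the censored data: each count only tells you that a point fell somewhere inside its 1 m bin, not its exact height.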

a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23)
colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5,
                 28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5)


centers <- as.numeric(colnames(a))
low <- centers - .5
up <- centers + .5

ll.weibullCensored <- function(par, dat){
    shape <- par[1]
    scale <- par[2]
    # Get the probability for each 'bin' and take the log
    log.ps <- log(pweibull(up, shape, scale) - pweibull(low, shape, scale))
    # Sum the logs of the bin probabilities as many times
    # as they should be as dictated by the data
    sum(rep(log.ps, dat))
}

# Use optim or any other function to find a set
# of parameters that maximizes the log likelihood
o.optim <- optim(c(9, 28), 
                 ll.weibullCensored, 
                 dat = as.numeric(a), 
                 # this tells it to find max instead of a min
                 control=list(fnscale=-1))  

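This gives essentially the same estimates as AndresT's answer, but their approach assumes that all the data fall at the center of their interval and performs maximum likelihood on that imputed data set. It doesn't make much of a difference, but with this method you don't necessarily need any other packages.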
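To see how close the two approaches are without extra packages, the sketch below maximizes both likelihoods with `optim`: the binned one from above, and a midpoint one that treats every return as lying at its bin center, which is the likelihood that gets maximized on the expanded data. The names `counts`, `ll.binned`, `ll.midpoint`, `fit.binned` and `fit.midpoint` are illustrative and do not come from either answer.

```r
counts  <- c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0)
centers <- seq(13.5, 35.5, by = 1)   # bin centers in metres
low <- centers - 0.5
up  <- centers + 0.5

# Binned likelihood: each return contributes the log probability of its bin.
ll.binned <- function(par, dat) {
    log.ps <- log(pweibull(up, par[1], par[2]) - pweibull(low, par[1], par[2]))
    keep <- dat > 0                  # empty bins contribute nothing
    sum(dat[keep] * log.ps[keep])
}

# Midpoint likelihood: every return is treated as lying at its bin center.
ll.midpoint <- function(par, dat) {
    sum(dat * dweibull(centers, par[1], par[2], log = TRUE))
}

fit.binned   <- optim(c(9, 28), ll.binned,   dat = counts,
                      control = list(fnscale = -1))
fit.midpoint <- optim(c(9, 28), ll.midpoint, dat = counts,
                      control = list(fnscale = -1))

# Rows: method; columns: shape and scale estimates
rbind(binned = fit.binned$par, midpoint = fit.midpoint$par)
```

The two rows of the result should agree closely, which is the point made above: the choice between the methods matters little for bins this narrow.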

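Edit: The fact that AndresT's solution and mine give very similar estimates makes sense if we look at what each method maximizes. In mine, we work with the probability of falling into each "bin". AndresT's solution uses the density of the distribution at the center of the bin as a stand-in for that probability; since the bins are 1 m wide, density times width approximates the bin probability. We can look at the ratio of the probability of falling into a bin to the density at the bin's center (using the shape and scale obtained from my solution), which gives: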

# Probability of each bin
> ps
 [1] 0.0005495886 0.0009989085 0.0017438767 0.0029375471 0.0047912909
 [6] 0.0075863200 0.0116800323 0.0174991532 0.0255061344 0.0361186335
[11] 0.0495572085 0.0656015797 0.0832660955 0.1004801353 0.1139855466
[16] 0.1197890284 0.1144657811 0.0971503491 0.0711370586 0.0433654456
[21] 0.0210758647 0.0077516837 0.0020274896
# Density evaluated at the center of the bin
> ps.cent
 [1] 0.0005418957 0.0009868040 0.0017254545 0.0029103746 0.0047524364
 [6] 0.0075325510 0.0116083397 0.0174078328 0.0253967142 0.0359988789
[11] 0.0494450583 0.0655288551 0.0832789134 0.1006305707 0.1143085230
[16] 0.1202647955 0.1149865305 0.0975322358 0.0712125315 0.0431169222
[21] 0.0206762531 0.0074246320 0.0018651941
# Ratio of the probability and the density
> ps/ps.cent
 [1] 1.0141963 1.0122663 1.0106767 1.0093364 1.0081757 1.0071382 1.0061760
 [8] 1.0052459 1.0043084 1.0033266 1.0022682 1.0011098 0.9998461 0.9985051
[15] 0.9971745 0.9960440 0.9954712 0.9960845 0.9989402 1.0057639 1.0193271
[22] 1.0440495 1.0870127

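All of these ratios are close to 1, so the two methods are essentially maximizing the same likelihood. The ratios stay within 2% of 1 everywhere except the last two bins (1.044 and 1.087), and those bins hold only 2 and 0 points.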
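The code that produced `ps` and `ps.cent` is not shown in this answer. A plausible reconstruction, assuming they are the bin probabilities and the bin-center densities at the fitted shape and scale, is sketched below. It refits the model so that it runs on its own; `counts`, `shape.hat` and `scale.hat` are names introduced here for illustration.

```r
counts  <- c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0)
centers <- seq(13.5, 35.5, by = 1)
low <- centers - 0.5
up  <- centers + 0.5

# Same binned log likelihood as in the answer
ll.weibullCensored <- function(par, dat) {
    log.ps <- log(pweibull(up, par[1], par[2]) - pweibull(low, par[1], par[2]))
    sum(rep(log.ps, dat))
}
o.optim <- optim(c(9, 28), ll.weibullCensored, dat = counts,
                 control = list(fnscale = -1))
shape.hat <- o.optim$par[1]
scale.hat <- o.optim$par[2]

# Probability of each bin under the fitted distribution
ps <- pweibull(up, shape.hat, scale.hat) - pweibull(low, shape.hat, scale.hat)
# Density at each bin center (bin width is 1, so no rescaling is needed)
ps.cent <- dweibull(centers, shape.hat, scale.hat)

round(ps / ps.cent, 4)
```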

Answer 1: (score: 2)

I'm sure there is a better way to do the reshaping, but this should work:

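library('fitdistrplus')
library('reshape2')

a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23)
colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5,
                 28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5)

barplot(a)

# Long format: column 2 holds the bin centers, column 3 the counts
a2 = melt(a)
# Repeat each bin center once per point counted in that bin
a3 = rep(a2[,2], a2[,3])

# Maximum-likelihood Weibull fit on the expanded data
fitdist(a3, "weibull")

# Descriptive statistics and skewness-kurtosis plot with bootstrap
descdist(a3, boot=5000)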
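For a one-row table like this, the reshaping step can also be done in base R by repeating each bin center by its count. This is a sketch of an alternative rather than the answer's own method; `centers` and `heights` are illustrative names, and the result should match the vector built as `a3` above.

```r
a <- matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0), 1, 23)
colnames(a) <- seq(13.5, 35.5, by = 1)

centers <- as.numeric(colnames(a))
# Repeat each bin center once per return counted in that bin
heights <- rep(centers, times = as.numeric(a))

length(heights)   # one entry per return, i.e. sum(a)
```

The `heights` vector can then be passed to `fitdist(heights, "weibull")` in place of `a3`, which removes the dependency on reshape2.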