How can I use fitdistr to determine the Weibull parameters of a left-truncated distribution?

Time: 2015-01-22 17:34:09

Tags: r weibull

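I want to find the Weibull shape and scale parameters of a truncated distribution using R's fitdistr function (MLE). Here is a sample of tree diameter data (minimum value 2.8):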

data<-c(42.7,18.8,30.0,20.3,32.5,18.8,16.0,42.9,18.8,17.3,21.1,23.4,15.0,16.8,15.2,15.0,14.7,17.3,20.1,18.3,16.0,15.7,21.3,
    19.1,17.3,17.0,17.3,17.5,21.6,15.7,12.7,13.2,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,3.6,2.8,2.8,2.8,2.8,2.8,
    2.8,2.8,2.8,2.8,2.8,2.8,2.8,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,12.2,4.3,4.3,4.3,4.3,4.3,4.3,
    4.3,4.3,4.3,4.3,4.3,4.3,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,2.8,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.6,
    5.6,5.6,18.0,16.3,34.8,17.5,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.3,6.3,6.3,6.3,6.3,6.3,6.3,6.3,6.3,
    6.3,6.3,6.3,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,9.4,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8,3.8)

library(MASS)
wb<-fitdistr(data,'weibull',lower=0.1) # MLE for weibull parameter determination
wb

Result:
     shape      scale
1.36605920 9.97356797

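Given how the data are distributed, I would expect a monotonically decreasing curve (i.e. shape < 1). However, these results indicate shape > 1, because fitdistr does not account for the fact that the data are truncated. Elsewhere, the following has been suggested: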

ltwei<-function(x,shape,scale=1,log=FALSE){dweibull(x,shape,scale,log)/pweibull(1,shape,scale,lower=FALSE) } 
ltweifit<-fitdistr(data,ltwei,start=list(shape=1,scale=10)) 
ltweifit 

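But this yields an even larger Weibull shape value. How can I obtain shape and scale parameters for a distribution that takes the left truncation of the data into account? Many thanks in advance.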

1 answer:

Answer 0 (score: 1)

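This is a modification of the advice given, because I thought the denominator was incorrect. I tried changing the denominator to 1-pweibull(trunc, ...) to see whether that would give a better fit: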

> ptwei <- function(x, shape, scale , log = FALSE) 
+               pweibull(x, shape, scale, log)/(1-
+                 pweibull(2.8, shape, scale, lower=FALSE))
> dtwei <- function(x, shape, scale , log = FALSE) 
+               dweibull(x, shape, scale, log)/(1-
+                 pweibull(2.8, shape, scale, lower=FALSE))
> fitdist(data, dtwei, start=list(shape=1.2, scale=4))
Fitting of the distribution ' twei ' by maximum likelihood 
Parameters:
       estimate Std. Error
shape 0.4555414 0.02660275
scale 0.8751571 0.12236256

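I now realize that Professor Ripley was using the lower argument to accomplish what I was attempting above (pweibull(q, ..., lower = FALSE) already returns the upper-tail probability 1 - F(q)), so the original code works as long as ltwei calls pweibull with lower = FALSE.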

-----------

This follows advice that Professor Brian Ripley posted on R-help on 7 October 2008:

ltwei <- function(x, shape, scale = 1, log = FALSE) 
              dweibull(x, shape, scale, log)/
                pweibull(2.8, shape, scale, lower=FALSE)

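This normalizes the Weibull density by dividing it by the upper-tail probability (one minus the CDF) at the truncation point. (With lower = FALSE, pweibull returns the density integrated from the truncation point to Inf.)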
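As a quick sanity check on that normalization, the sketch below (base R only) integrates a left-truncated Weibull density from 2.8 to Inf for arbitrary parameter values. The name `dltweibull` and the argument `a` for the truncation point are my own, not from the thread. The sketch also handles `log = TRUE` by subtracting the log of the tail probability. As far as I can tell from the MASS source, fitdistr evaluates a user-supplied density with `log = TRUE` whenever that density has a `log` argument; if so, dividing a log-density by the tail probability, as ltwei above does, would not give the truncated log-density, so estimates from a log-aware version may differ from the ones shown in this answer.

```r
# Hypothetical helper: density of a Weibull left-truncated at `a`.
dltweibull <- function(x, shape, scale = 1, a = 2.8, log = FALSE) {
  # log of the upper-tail probability P(X > a) of the untruncated Weibull
  log_tail <- pweibull(a, shape, scale, lower.tail = FALSE, log.p = TRUE)
  # truncated log-density: log f(x) - log P(X > a)
  log_dens <- dweibull(x, shape, scale, log = TRUE) - log_tail
  # the truncated density is zero below the truncation point
  log_dens[x < a] <- -Inf
  if (log) log_dens else exp(log_dens)
}

# Arbitrary illustrative parameter values (not estimates from the data)
total <- integrate(dltweibull, lower = 2.8, upper = Inf,
                   shape = 1.5, scale = 10)$value
print(total)  # should be very close to 1
```

Working on the log scale throughout keeps the two code paths consistent: the `log = FALSE` result is simply the exponential of the `log = TRUE` result.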

library(MASS)
wb<-fitdistr(data,ltwei,start=list(shape=1,scale=1) ) 

There were 50 or more warnings (use warnings() to see the first 50)
> wb
      shape         scale   
   1.81253163   12.72912552 
 ( 0.07199877) ( 0.54855731)
> warnings()[1:5]
Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
2: In pweibull(2.8, shape, scale, lower = FALSE) : NaNs produced
3: In dweibull(x, shape, scale, log) : NaNs produced
4: In pweibull(2.8, shape, scale, lower = FALSE) : NaNs produced
5: In dweibull(x, shape, scale, log) : NaNs produced
> library(MASS)
> wb<-fitdistr(data,ltwei,start=list(shape=1.1,scale=10) ) 
> wb
      shape         scale   
   1.81253113   12.72912870 
 ( 0.07199877) ( 0.54855782)
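
The NaN warnings in the first run most likely come from the optimizer trying non-positive shape or scale values along the way. A minimal sketch of one way around this, assuming simulated data in place of the tree diameters and using base R's optim directly rather than fitdistr: optimize over log(shape) and log(scale), so every trial value is positive. The names `nll`, `theta` and `a` are illustrative.

```r
set.seed(1)
a <- 2.8                                  # truncation point from the question
# Simulated stand-in data: Weibull draws, keeping only values above `a`
x <- rweibull(5000, shape = 1.5, scale = 10)
x <- x[x > a]

# Negative log-likelihood of the left-truncated Weibull, parameterized by
# theta = c(log(shape), log(scale)) so both stay positive during the search
nll <- function(theta, x, a) {
  shape <- exp(theta[1])
  scale <- exp(theta[2])
  -sum(dweibull(x, shape, scale, log = TRUE) -
       pweibull(a, shape, scale, lower.tail = FALSE, log.p = TRUE))
}

# Start at shape = 1, scale = 1 (the start that produced warnings above)
fit <- optim(c(log(1), log(1)), nll, x = x, a = a)
est <- exp(fit$par)                       # back-transform to shape, scale
names(est) <- c("shape", "scale")
print(est)
```

Because the data here are simulated with shape 1.5 and scale 10, the printed estimates should land near those values; nothing in this sketch says what the tree-diameter data would give.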

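You can see that some starting values generate warnings, but the algorithm still succeeds. If you start from values closer to the "true" values, no warnings appear. I am afraid your expectation was not met. There is sometimes confusion about the parameterization of the Weibull distribution, because different conventions exist for it. This is from Terry Therneau's survival::survreg help page: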

# There are multiple ways to parameterize a Weibull distribution. The survreg 
# function embeds it in a general location-scale family, which is a 
# different parameterization than the rweibull function, and often leads
# to confusion.
#   survreg's scale  =    1/(rweibull shape)
#   survreg's intercept = log(rweibull scale)
#   For the log-likelihood all parameterizations lead to the same value.
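
In code, the two relations quoted above amount to the following conversion. The helper name `survreg_to_rweibull` is hypothetical, and the inputs are made-up numbers for illustration, not values from a fitted model.

```r
# Hypothetical helper: convert survreg-style (intercept, scale) to the
# (shape, scale) parameterization used by rweibull/dweibull
survreg_to_rweibull <- function(intercept, sr_scale) {
  c(shape = 1 / sr_scale,      # survreg's scale = 1/(rweibull shape)
    scale = exp(intercept))    # survreg's intercept = log(rweibull scale)
}

# Made-up inputs for illustration
p <- survreg_to_rweibull(intercept = log(10), sr_scale = 0.5)
print(p)
```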

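Since you seemed unhappy with the fitdistr result, I also ran fitdist from the 'fitdistrplus' package and got the same answer. I still think you need to check how the parameters are interpreted, and perhaps also test whether these data really follow a Weibull distribution: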

 dtwei <- function(x, shape, scale , log = FALSE) 
               dweibull(x, shape, scale, log)/
                 pweibull(2.8, shape, scale, lower=FALSE)

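# Left-truncated CDF: (F(q) - F(2.8)) / (1 - F(2.8)).
# The MLE estimates below depend only on dtwei.
ptwei <- function(q, shape, scale) 
               pmax(0, pweibull(q, shape, scale) - pweibull(2.8, shape, scale))/
                 pweibull(2.8, shape, scale, lower=FALSE)
library(fitdistrplus)
fitdist(data, dtwei, start=list(shape=1.2, scale=4))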
#----------------
Fitting of the distribution ' twei ' by maximum likelihood 
Parameters:
       estimate Std. Error
shape  1.812706 0.07199987
scale 12.728087 0.54839034
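
One rough way to examine whether a left-truncated Weibull is plausible is to compare the empirical CDF with the truncated model CDF. The sketch below does this on simulated data, using the hypothetical names `pltweibull` and `gap`. For the tree diameters, which contain many tied values, one would plug in the real data and the fitted estimates instead; a large gap would suggest a poor fit.

```r
set.seed(2)
a <- 2.8                                        # truncation point
x <- rweibull(2000, shape = 1.5, scale = 10)    # simulated stand-in data
x <- x[x > a]

# CDF of the Weibull left-truncated at `a`: (F(q) - F(a)) / (1 - F(a))
pltweibull <- function(q, shape, scale, a) {
  pmax(0, pweibull(q, shape, scale) - pweibull(a, shape, scale)) /
    pweibull(a, shape, scale, lower.tail = FALSE)
}

# Largest gap between the empirical CDF and the model CDF at the data points
# (evaluated at the simulation's own parameters, not at fitted values)
gap <- max(abs(ecdf(x)(x) - pltweibull(x, 1.5, 10, a)))
print(gap)
```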