使用'mclust'软件包进行混合模型估计时出错

时间:2014-09-04 11:25:41

标签: r eda

尝试执行单变量混合模型估算时,Rmclust会生成以下带有错误的输出

----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm 
----------------------------------------------------

Mclust V (univariate, unequal variance) model with 9 components:

    log.likelihood     n               df               BIC              ICL
 -25576.3596860173 53940 26.0000000000001 -51436.0056895487 -84761.892259584

Clustering table:
    1     2     3     4     5     6     7     8     9 
 3081  8923  8177  3017  8742  6568 10378  3650  1404 
Error in object[as.character(G), modelNames, drop = FALSE] : 
  incorrect number of dimensions

可重复示例:

library(mclust)

set.seed(12345) # for reproducibility

data(diamonds, package='ggplot2')  # use built-in data
myData <- log10(diamonds$price)

# determine number of components
mc <- Mclust(myData)
print(summary(mc))

其他诊断信息:

> str(mc)
List of 15
 $ call          : language Mclust(data = myData)
 $ data          : num [1:53940, 1] 2.51 2.51 2.51 2.52 2.53 ...
 $ modelName     : chr "V"
 $ n             : int 53940
 $ d             : num 1
 $ G             : int 9
 $ BIC           : num [1:9, 1:2] -64689 -57330 -54802 -52807 -52721 ...
 ..- attr(*, "dimnames")=List of 2
 .. ..$ : chr [1:9] "1" "2" "3" "4" ...
 .. ..$ : chr [1:2] "E" "V"
 ..- attr(*, "G")= num [1:9] 1 2 3 4 5 6 7 8 9
 ..- attr(*, "modelNames")= chr [1:2] "E" "V"
 ..- attr(*, "oneD")= logi TRUE
 $ bic           : num -51436
 $ loglik        : num -25576
 $ df            : num 26
 $ hypvol        : num NA
 $ parameters    :List of 4
 ..$ Vinv    : NULL
 ..$ pro     : num [1:9] 0.0606 0.1558 0.1628 0.0411 0.1655 ...
 ..$ mean    : Named num [1:9] 2.7 2.86 3.03 3.24 3.39 ...
 .. ..- attr(*, "names")= chr [1:9] "1" "2" "3" "4" ...
 ..$ variance:List of 5
 .. ..$ modelName: chr "V"
 .. ..$ d        : num 1
 .. ..$ G        : int 9
 .. ..$ sigmasq  : num [1:9] 0.004915 0.006932 0.009595 0.000833 0.008896 ...
 .. ..$ scale    : num [1:9] 0.004915 0.006932 0.009595 0.000833 0.008896 ...
 $ classification: num [1:53940] 1 1 1 1 1 1 1 1 1 1 ...
 $ uncertainty   : num [1:53940] 0.0132 0.0132 0.0134 0.015 0.0152 ...
 $ z             : num [1:53940, 1:9] 0.987 0.987 0.987 0.985 0.985 ...
 ..- attr(*, "dimnames")=List of 2
 .. ..$ : NULL
 .. ..$ : NULL
 - attr(*, "class")= chr "Mclust"

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C           
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rebmix_2.6.1       ddst_1.03          evd_2.3-0          orthopolynom_1.0-5
 [5] polynom_1.3-8      mixtools_1.0.2     segmented_0.4-0.0  boot_1.3-11       
 [9] mclust_4.3         MASS_7.3-34       

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.1      colorspace_1.2-4 digest_0.6.4     ggplot2_1.0.0   
 [5] grid_3.1.1       gtable_0.1.2     munsell_0.4.2    plyr_1.8.1      
 [9] proto_0.3-10     reshape2_1.4     scales_0.2.4     stringr_0.6.2   
[13] tools_3.1.1     

1 个答案:

答案 0 :(得分:1)

我已经弄明白自己出了什么问题。 错误消息由另一个mclust函数mclustModel()生成,我的代码在Mclust()调用后执行了该函数。该错误是由于mclustModel()函数需要将不同类型的对象作为第二个参数传递的事实。使用mclustModel()(而不是Mclust())时,正确的通话顺序如下所示。

mc <- mclustBIC(myData)
print(summary(mc, myData, parameters = TRUE))
plot(mc)

bestModel <- mclustModel(myData, mc)
print(summary(bestModel, myData))