Using the code below, I am trying to fit a mixture of 3 Poisson distributions in R with the flexmix package:
library(flexmix)
freq<- c(222950,111682,72429,48126,34515,25801,19199,15033,11859, 9226,
7363, 5910, 4659, 3723, 2985, 2291,1907, 1447,
1265,891,722,620,546,439,359,286,255,236,208,148,176,145,
151,135,102, 92,136, 99,102, 92, 81, 85, 71, 84, 58, 78, 59, 66 , 47, 48,
42, 58, 43, 38, 34, 45, 21, 28, 32, 36, 27, 22, 26, 31 ,
20, 16, 12, 19, 19, 15, 18, 17,8,8, 12, 18, 10,6,5,8,9,4,7,5,8,
10,6,3,7,2,4,3,4,6,6,5,
3,3,6,4,4,4,2,2,3,1,2,1,3,4,2,3)
zz <- as.data.frame(rep(0:111, times = freq))
colnames(zz) <- "xx"
flexfit1 <- flexmix(xx ~ 1, data = zz, k = 3,
                    model = FLXglm(family = "poisson"))
summary(flexfit1)
I get the following result:
Call:
flexmix(formula = xx ~ 1, data = zz, k = 3, model = FLXglm(family =
"poisson"))
       prior   size post>0 ratio
Comp.1 0.796 489702 561594 0.872
Comp.2 0.204 120115 386867 0.310
'log Lik.' -1465654 (df=3)
AIC: 2931315 BIC: 2931349
I am passing k = 3 in the call to flexmix. Why are there only two components in the summary? Also, why is the df of the logLik object only 3?
I tried to replicate the above with some random data:
zz <- as.data.frame(sample(0:111, 10^4, replace = TRUE))
colnames(zz) <- "xx"
flexfit1 <- flexmix(xx ~ 1, data = zz, k = 3,
                    model = FLXglm(family = "poisson"))
summary(flexfit1)
I get the following output:
       prior size post>0 ratio
Comp.1 0.458 4614   5737 0.804
Comp.2 0.332 3259   5117 0.637
Comp.3 0.211 2127   2759 0.771
'log Lik.' -53488.65 (df=5)
AIC: 106987.3 BIC: 107023.4
Here I can see 3 components, and the degrees of freedom are 5. So why don't I get a similar result in the first case?
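One way to read the two df values is as a count of free parameters. This is only a sketch, and it assumes that the reported df is one intercept per component plus k - 1 free mixing weights. `mixture_df` is a hypothetical helper written for illustration, not a flexmix function.

```r
# Free parameters in a k-component Poisson mixture fitted with xx ~ 1:
# one intercept (log-mean) per component, plus k - 1 free mixing weights,
# because the weights must sum to 1.
# mixture_df is a hypothetical helper, not part of flexmix.
mixture_df <- function(k, coefs_per_component = 1) {
  k * coefs_per_component + (k - 1)
}

mixture_df(2)  # 3, the df reported for the frequency data
mixture_df(3)  # 5, the df reported for the uniform random data
```

If that assumption holds, df = 3 is what a two-component fit would report, which agrees with the summary listing only Comp.1 and Comp.2 for the first data set.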
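In the first case, I compared the mean and variance of the fitted model with those of the data. The following code gives the model parameters: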
refit1 <- refit(flexfit1)
summary(refit1)
This gives the following output:
$Comp.1
             Estimate Std. Error z value  Pr(>|z|)
(Intercept) 0.0962269  0.0019745  48.735 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
$Comp.2
             Estimate Std. Error z value  Pr(>|z|)
(Intercept) 2.2364497  0.0013964  1601.5 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
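Now we can compute the mean of the fitted model, where p1, p2 are the component priors and m1, m2 are the component means (the exponentials of the intercepts): p1 * m1 + p2 * m2 = 0.796 * exp(0.0962269) + 0.204 * exp(2.2364497) = 2.785851, which is very close to the data mean, mean(zz[, 1]) = 2.778745.
The variance of the fitted model is p1 * m1 + p2 * m2 + p1 * p2 * (m1 - m2)^2 = 0.796 * exp(0.0962269) + 0.204 * exp(2.2364497) + 0.796 * 0.204 * (exp(0.0962269) - exp(2.2364497))^2 = 13.86233, which is much smaller than the variance of the data, var(zz[, 1]) = 23.4328.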
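The arithmetic above can be reproduced in a few lines of base R. This is a minimal sketch that uses the rounded priors and the intercepts printed in the output. The names `prior`, `lambda`, `mix_mean` and `mix_var` are illustrative. The variance is written with the law of total variance, which reduces to the p1 * p2 * (m1 - m2)^2 form when there are two components.

```r
# Estimates copied from the summary() and refit() output above.
prior  <- c(0.796, 0.204)                # mixing weights (rounded in the output)
lambda <- exp(c(0.0962269, 2.2364497))   # Poisson means: exp of the intercepts

# Mean of the mixture: weighted average of the component means.
mix_mean <- sum(prior * lambda)

# Variance by the law of total variance:
#   within-component part:  sum(prior * lambda), since a Poisson's
#                           variance equals its mean
#   between-component part: sum(prior * (lambda - mix_mean)^2)
mix_var <- sum(prior * lambda) + sum(prior * (lambda - mix_mean)^2)

mix_mean             # about 2.785851
mix_var              # about 13.86233
mix_var / mix_mean   # dispersion index; equals 1 for a single Poisson
```

Because the code works on vectors, the same lines apply unchanged to a fit with three or more components once `prior` and `lambda` are extended.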
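So intuitively, it seems that to account for the variance we need a mixture of more than two Poisson distributions.
Please let me know your thoughts.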