NA/NaN/Inf error when fitting an HMM with depmixS4 in R

Time: 2014-08-18 12:57:25

Tags: r machine-learning statistics

I am trying to fit a simple hidden Markov model in R using depmix, but I sometimes get an obscure error (NA/NaN/Inf in foreign function call). For example:

 require(depmixS4)
 t = data.frame(v=c(0.0622031327669583,-0.12564002739468,-0.117354660120178,0.0115062213361335,0.122992418345013,-0.0177816909620965,0.0164821157439354,0.161981367176501,-0.174367935386872,0.00429417498601576,0.00870091566593177,-0.00324734222267713,-0.0609817740148078,0.0840679943325736,-0.0722982123741866,0.00309386232501072,0.0136237132601905,-0.0569072400881981,0.102323872007477,-0.0390675463642003,0.0373248728294635,-0.0839484669503484,0.0514620475651086,-0.0306598076180909,-0.0664992242224042,0.826857872461293,-0.172970803143762,-0.071091459861684,-0.0128631184461384,-0.0439382422065227,-0.0552809574423446,0.0596321725192134,-0.06043926984848,0.0398700063815422))
 mod = depmix(response=v~1, data=t, nstates=2)
 fit(mod)
 ...
 NA/NaN/Inf in foreign function call (arg 10)

Inputs of almost identical size and complexity work fine... Is there a tool better suited to this than depmixS4?

1 answer:

Answer 0: (score: 2)

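There is no guarantee that the EM algorithm will find a fit for every dataset given an arbitrary number of states. For example, if you try to fit a 2-state Gaussian model to data generated from a Poisson distribution with λ = 1, you will receive the same error.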

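set.seed(3)
# 100 draws from a Poisson distribution with lambda = 1
ydf <- data.frame(y=rpois(100,1))
# 2-state Gaussian HMM; nstates is spelled out in full instead of the abbreviation ns
m1 <- depmix(y~1,nstates=2,family=gaussian(),data=ydf)
fit(m1)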

iteration 0 logLik: -135.6268 
iteration 5 logLik: -134.2392 
iteration 10 logLik: -128.7834 
iteration 15 logLik: -111.5922 
Error in fb(init = init, A = trDens, B = dens, ntimes = ntimes(object),  : 
  NA/NaN/Inf in foreign function call (arg 10)
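
One plausible mechanism behind this error, offered as an assumption rather than a diagnosis of depmixS4 internals: during EM, the standard deviation of one state can shrink toward zero around a repeated value (Poisson draws are integers with many ties), so that state's Gaussian density becomes infinite at that value and zero elsewhere. The base R sketch below only illustrates how such densities turn into Inf and NaN; all variable names are made up for the illustration.

```r
# Illustrative only: what happens to Gaussian densities when sd collapses.
y <- c(0, 1, 1, 2, 1)   # integer-valued data with ties, similar to rpois() output

# A well-behaved state: every density is finite.
dens_ok <- dnorm(y, mean = 1, sd = 1)

# A collapsed state: sd = 0 gives infinite density where y equals the mean
# and zero density everywhere else.
dens_collapsed <- dnorm(y, mean = 1, sd = 0)

# Forward-type recursions multiply densities by probabilities.
# If such a probability is 0, Inf * 0 evaluates to NaN.
state_prob <- 0
weighted <- dens_collapsed * state_prob

print(dens_ok)
print(dens_collapsed)
print(weighted)
```

If this is what happens in a given fit, the non-finite values would reach the compiled forward-backward routine, which would be consistent with the "foreign function call" wording of the error.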

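As for your data, a 1-state model can be fitted to it. With 2 states, the algorithm cannot find a solution (even with 10000 random starts). With 3 states, the problem appears to be how the model's starting states are initialized. If you run the same model 100 times on the data you provided, some of those 100 runs converge. An example follows: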

 >require(depmixS4)
 >t = data.frame(v=c(0.0622031327669583,-0.12564002739468,-0.117354660120178,0.0115062213361335,0.122992418345013,-0.0177816909620965,0.0164821157439354,0.161981367176501,-0.174367935386872,0.00429417498601576,0.00870091566593177,-0.00324734222267713,-0.0609817740148078,0.0840679943325736,-0.0722982123741866,0.00309386232501072,0.0136237132601905,-0.0569072400881981,0.102323872007477,-0.0390675463642003,0.0373248728294635,-0.0839484669503484,0.0514620475651086,-0.0306598076180909,-0.0664992242224042,0.826857872461293,-0.172970803143762,-0.071091459861684,-0.0128631184461384,-0.0439382422065227,-0.0552809574423446,0.0596321725192134,-0.06043926984848,0.0398700063815422))
 >mod = depmix(response=v~1, data=t, nstates=3)  # 3 states, as discussed above; df=14 in the output below matches a 3-state Gaussian model
 >fit(mod)
 ...
 NA/NaN/Inf in foreign function call (arg 10)

 >replicate(100, try(fit(mod, verbose = FALSE)))

[[1]]
[1] "Error in fb(init = init, A = trDens, B = dens, ntimes = ntimes(object),  : \n  NA/NaN/Inf in foreign function call (arg 10)\n"

[[2]]
[1] "Error in fb(init = init, A = trDens, B = dens, ntimes = ntimes(object),  : \n  NA/NaN/Inf in foreign function call (arg 10)\n"

[[3]]
Convergence info: Log likelihood converged to within tol. (relative change) 
'log Lik.' 34.0344 (df=14)
AIC:  -40.0688 
BIC:  -18.69975 
... output truncated
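
The replicate() call above returns a mix of error strings and fitted models. A minimal sketch of how one might instead keep only the best successful run is shown below. `fit_with_restarts` is a hypothetical helper written for this illustration, not a depmixS4 function, and the Poisson data in the usage part are just a stand-in for your own data frame. The depmixS4 part runs only if the package is installed, and how many runs fail will vary.

```r
# Hypothetical helper (not part of depmixS4): call fit_once() repeatedly,
# skip runs that throw an error, and keep the result with the highest score.
fit_with_restarts <- function(fit_once, score, n_tries = 100) {
  best <- NULL
  best_score <- -Inf
  n_failed <- 0
  for (i in seq_len(n_tries)) {
    # A failed fit (e.g. NA/NaN/Inf in foreign function call) becomes NULL.
    res <- tryCatch(fit_once(), error = function(e) NULL)
    if (is.null(res)) {
      n_failed <- n_failed + 1
      next
    }
    s <- as.numeric(score(res))
    if (is.finite(s) && s > best_score) {
      best <- res
      best_score <- s
    }
  }
  list(best = best, best_score = best_score, n_failed = n_failed)
}

# Usage with depmixS4, executed only if the package is available.
if (requireNamespace("depmixS4", quietly = TRUE)) {
  library(depmixS4)
  set.seed(1)
  ydf <- data.frame(y = rpois(100, 1))   # stand-in data, assumption
  m3 <- depmix(y ~ 1, nstates = 3, family = gaussian(), data = ydf)
  out <- fit_with_restarts(
    fit_once = function() fit(m3, verbose = FALSE),
    score = function(m) logLik(m),       # higher log-likelihood is better
    n_tries = 20
  )
  cat("failed runs:", out$n_failed, "of 20\n")
  cat("best log-likelihood:", out$best_score, "\n")
}
```

Selecting by log-likelihood is only one possible choice; a very high log-likelihood can itself be a sign of a nearly collapsed state, so the retained model is still worth inspecting.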