HMMlearn GMMHMM error

Time: 2016-05-08 02:10:14

Tags: python debugging hmmlearn

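I am trying to initialize several GMMs for use with the gmms_ attribute of GMMHMM. Each GMM instance has its own means, weights and covariances, and serves as one component of a 5-component mixture for the GMMHMM. The means, weights and covariances come from a (5-cluster) k-means run on the dataset I want to fit: the means are the cluster centers, the weights are the relative sizes of the clusters, and the covariances are, you guessed it, the covariances of each cluster.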

Here is the code snippet:

X_clusters = cls.KMeans(n_clusters=5)
fitted_X = X_clusters.fit(X)
means = fitted_X.cluster_centers_
cluster_arrays = extract_feat(X, fitted_X.labels_)
print ('Means: {0}'.format(means))

total_cluster = float(len(X)) 
all_GMM_params = []
for cluster in cluster_arrays:
    GMM_params = []
    weight = float(len(cluster))/total_cluster
    covar = np.cov(cluster)
    GMM_params.append(weight)
    GMM_params.append(covar)
    all_GMM_params.append(GMM_params)

for i in range(len(means)):
    all_GMM_params[i].append(means[i])


model = GMMHMM(n_components=4, covariance_type="diag", n_iter=1000,
            n_mix = 5, algorithm='map')

for i in range(len(all_GMM_params)):
    GMM_n = mix.GMM(init_params = '')
    GMM_n.weights_ = np.array(all_GMM_params[i][0])
    GMM_n.covars_ = np.array(all_GMM_params[i][1])
    GMM_n.means_ = np.array(all_GMM_params[i][2])
    model.gmms_.append(GMM_n)

model.fit(X)

However, when I try to fit the model, I get the following error:

fitting to HMM and decoding ...Traceback (most recent call last):
  File "HMM_stock_sim.py", line 156, in <module>
    model.fit(X)
  File "C:\Python27\lib\site-packages\hmmlearn\base.py", line 436, in fit
    bwdlattice)
  File "C:\Python27\lib\site-packages\hmmlearn\hmm.py", line 590, in _accumulate_sufficient_statistics
    stats, X, framelogprob, posteriors, fwdlattice, bwdlattice)
  File "C:\Python27\lib\site-packages\hmmlearn\base.py", line 614, in _accumulate_sufficient_statistics
    stats['start'] += posteriors[0]
ValueError: operands could not be broadcast together with shapes (4,) (9,) (4,)

I have never seen an error like this before, and this is my first time using sklearn and HMMlearn. How can I fix this error?

1 answer:

Answer 0 (score: 1)

I was able to reproduce the issue using random samples from a two-component Gaussian mixture:

import numpy as np

X = np.append(np.random.normal(0, size=1024),
              np.random.normal(4, size=1024))[:, np.newaxis]

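So here is my take on why your code doesn't work. np.cov treats each row of the given array as a variable. Therefore, for an array of shape (N, 1), the output must have shape (N, N). Clearly this is not what you want, because the covariance matrix of a 1-D Gaussian is just a scalar.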

The solution is to transpose cluster before passing it to np.cov:

np.cov(cluster.T)  # has shape () aka scalar
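
The shape behaviour of np.cov described above can be checked with a small sketch. The array sizes and the variable names (cluster_1d, cluster_3d) are made up for illustration and are not taken from the question's data; rowvar=False is shown as an equivalent alternative to transposing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cluster: 8 observations of a single feature, shape (8, 1).
cluster_1d = rng.normal(size=(8, 1))

# Rows are treated as variables, so this yields an (8, 8) matrix.
# With one observation per "variable" NumPy also emits RuntimeWarnings
# and the entries are not meaningful.
wrong = np.cov(cluster_1d)

# Transposing gives one variable with 8 observations: a 0-d (scalar) result.
right = np.cov(cluster_1d.T)

# Equivalent: tell np.cov that columns are the variables.
also_right = np.cov(cluster_1d, rowvar=False)

# The same fix with 3 features gives a 3 x 3 covariance matrix.
cluster_3d = rng.normal(size=(8, 3))
cov_3d = np.cov(cluster_3d.T)

print(wrong.shape, right.shape, also_right.shape, cov_3d.shape)
```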

After switching to 3-D X, I spotted two more issues:

  • n_mix is the number of components in each GMM, while n_components refers to the number of Markov chain states (or, equivalently, the number of mixtures). Note that you pass n_components=4 to the GMMHMM constructor and then append 5 GMM instances to model.gmms_.
  • Moreover, model.gmms_ is pre-populated with n_components GMM instances, so you end up with 9 instead of 4 mixtures, which explains the (9, ) vs. (4, ) mismatch.

Updated code:

#              the updated parameter value.
#              vvvvvvvvvvvvvv
model = GMMHMM(n_components=5, covariance_type="diag", n_iter=1000,
               n_mix=5, algorithm='map')
#              ^^^^^^^
#              doesn't have to match n_components

for i, GMM_n in enumerate(model.gmms_):
    GMM_n.weights_ = ...
    # Change the attributes of an existing instance
    # instead of appending a new one to ``model.gmms_``.
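
The broadcast failure from the traceback can be imitated with plain NumPy, without hmmlearn. This is only a sketch of the shape arithmetic, assuming an accumulator sized by n_components=4 and posteriors with one column per entry of a 9-element mixture list (4 existing plus 5 appended). The names start_stats and posteriors are stand-ins for the library's internals, not its actual code.

```python
import numpy as np

n_components = 4   # number of states passed to the constructor
n_gmms = 4 + 5     # pre-populated GMMs plus the 5 appended ones

# Stand-in for stats['start']: one slot per declared state.
start_stats = np.zeros(n_components)

# Stand-in for the posteriors: one column per GMM actually present.
posteriors = np.full((10, n_gmms), 1.0 / n_gmms)

try:
    start_stats += posteriors[0]   # shape (4,) += shape (9,)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

Once the number of mixture objects matches n_components, both arrays have the same length and the in-place addition succeeds.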