Learning character sequences with hmmlearn in Python

Time: 2017-08-14 18:31:45

Tags: python hidden-markov-models hmmlearn

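Here is my problem: I am trying to train a hidden Markov model with hmmlearn. I am new to this language and have trouble understanding the difference between lists and arrays. Here is my code: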

from hmmlearn import hmm
from babel import lists
import numpy as np
import unidecode as u
from numpy import char

l = []
data = []
gods_egypt = ["Amon","Anat","Anouket","Anubis","Apis","Atoum","Bastet","Bès","Gheb","Hâpy","Harmachis","Hathor","Heh","Héket","Horus","Isis","Ka","Khepri","Khonsou","Khnoum","Maât","Meresger","Mout","Nefertoum","Neith","Nekhbet","Nephtys","Nout","Onouris","Osiris","Ouadjet","Oupaout","Ptah","Rê","Rechef","Renenoutet","Satet","Sebek","Sekhmet","Selkis","Seth","Shou","Sokaris","Tatenen","Tefnout","Thot","Thouéris"]
for i in range(0, len(gods_egypt)):
    data.append([])
    for j in range(0, len(gods_egypt[i])):
        data[i].append([u.unidecode(gods_egypt[i][j].lower())])
    l.append(len(data[i]))
data = np.asarray(data).reshape(-1,1)
model = hmm.MultinomialHMM(20, verbose=True)
model = model.fit(data, l)

And the resulting output:

Traceback (most recent call last):
  File "~~~\HMM_test.py", line 17, in <module>
    model = model.fit(data, l)
  File "~~~\Python\Python36\site-packages\hmmlearn\base.py", line 420, in fit
    X = check_array(X)
  File "~~~\Python36-32\lib\site-packages\sklearn\utils\validation.py", line 402, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.

I saw in "ValueError: setting an array element with a sequence" that it may be a problem of arrays having different lengths, but I can't figure out how to solve it.

Any suggestions?

1 Answer:

Answer 0: (score: 0)

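The error itself comes from the fact that model.fit() expects an array of numerical values. Right now your input data is an array of arrays containing lists of strings. The function raises the error because, where it expects an array element, it finds a sequence, i.e. a list (of strings).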
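To make the failure concrete, here is a minimal sketch that rebuilds the same kind of structure with two made-up short names and then attempts the float conversion that check_array performs in the traceback. The names `names`, `nested`, and `obj` are illustrative only, and the object array is built by hand so the sketch does not depend on how a particular NumPy version treats ragged lists.

```python
import numpy as np

# Two illustrative names of different lengths (hypothetical sample data).
names = ["amon", "ka"]

# Same kind of structure as in the question: one list per name,
# each letter wrapped in its own single-element list.
nested = [[[ch] for ch in name] for name in names]

# Build the object array explicitly: each cell holds a whole Python list.
obj = np.empty(len(nested), dtype=object)
for i, seq in enumerate(nested):
    obj[i] = seq
obj = obj.reshape(-1, 1)

print(obj.shape, obj.dtype)  # every cell is a list, not a number

# Asking NumPy for a numeric array from these cells fails, because every
# "element" is itself a sequence.
error = None
try:
    np.array(obj, dtype=float)
except (TypeError, ValueError) as exc:
    error = exc
print(type(error).__name__, error)
```

The point is that a list can hold anything, including other lists of different lengths, while a numeric array needs one number per cell.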

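However, even once you fix the list problem, another issue arises: training an HMM means computing numerical values through a set of equations. The input data for training an HMM should therefore be numeric, not a set of letters. (Unless hmmlearn has very specific options for characters that I am not aware of.)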

If you want to use an HMM, you first need to convert the letters to numbers.
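One possible way to do that conversion is sketched below, assuming the names are already lower-cased and reduced to ASCII (the question does this with unidecode and lower()). The helper names `alphabet`, `letter_to_code`, and `sequences` are made up for illustration, and only a few of the names are used to keep the sketch short.

```python
import numpy as np

# A few names from the question, already lower-cased and ASCII-only.
names = ["amon", "anat", "ka", "isis"]

# Map every distinct letter to an integer code. The order is arbitrary,
# but the same mapping must be reused whenever the model is used later.
alphabet = sorted(set("".join(names)))
letter_to_code = {letter: code for code, letter in enumerate(alphabet)}

# One integer sequence per name.
sequences = [[letter_to_code[ch] for ch in name] for name in names]

# Concatenate everything into a single column of observations and keep the
# per-sequence lengths, mirroring the (data, l) pair passed to model.fit()
# in the question.
X = np.concatenate(sequences).reshape(-1, 1)
lengths = [len(seq) for seq in sequences]

print(X.shape, X.dtype.kind, lengths)
```

Here X is a plain two-dimensional integer array with one symbol per row, so the conversion inside check_array no longer meets a sequence where it expects a number. Whether a given model class accepts such integer symbols directly depends on the hmmlearn version, so check its documentation before fitting.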

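I don't know what your goal is. HMMs are meant for modeling data or for classification (if several HMMs are trained). Once you have a trained model built from the letters that make up the words, what do you intend to do with it?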

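As for the format in which data should be supplied to the different functions, I suggest you look at the documentation. It includes tutorials on using the library.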