Error when saving an NLTK HMM tagger with pickle

Time: 2016-03-09 04:01:34

Tags: python python-2.7 nltk hidden-markov-models

I am trying to save NLTK's HMM tagger with pickle as shown below, but it gives me the following error. Please suggest a solution.

>>> import nltk
>>> import pickle
>>> brown_a = nltk.corpus.brown.tagged_sents()[:300]
>>> hmm_tagger=nltk.HiddenMarkovModelTagger.train(brown_a)
>>> sent = nltk.corpus.brown.sents()[400]
>>> hmm_tagger.tag(sent)
[(u'He', u'PPS'), (u'is', u'BEZ'), (u'not', u'*'), (u'interested', u'VBN'), (u'in', u'IN'), (u'being', u'NN'), (u'named', u'IN'), (u'a', u'AT'), (u'full-time', u'JJ'), (u'director', u'NN'), (u'.', u'.')]
>>> f = open('my_tagger.pickle', 'wb')
>>> pickle.dump(hmm_tagger, f)

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    pickle.dump(hmm_tagger, f)
  File "C:\Python27\lib\pickle.py", line 1376, in dump
    Pickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 669, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 669, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 754, in save_global
    (obj, module, name))
PicklingError: Can't pickle <function estimator at 0x0575F6F0>: it's not found as nltk.tag.hmm.estimator
>>> 

I am using NLTK 3.1 with Python 2.7.11 on MS Windows 10.

Thanks in advance.

1 answer:

Answer 0 (score: 0)

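Why do you want to pickle the model? Training on the Brown corpus is very fast. If you want a better part-of-speech tagger, consider the easy-to-use https://spacy.io/, which has good pickling support and produces state-of-the-art results. Frankly, HMM taggers perform quite poorly by today's standards.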

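Anyway, this is an NLTK bug. Three options:

  1. Report the bug to NLTK and/or fix it by moving the estimator function out of the _train function and into the module, so that pickle can find it as nltk.tag.hmm.estimator.
  2. Provide your own estimator function so that pickle finds it in your own module.
  3. Use a pickle alternative such as dill or cloudpickle: they may be able to handle this estimator function.

Here is how to dump the tagger with dill: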

    import nltk
    import dill  # third-party package: pip install dill
    
    # Train on the first 300 tagged sentences of the Brown corpus
    brown_a = nltk.corpus.brown.tagged_sents()[:300]
    hmm_tagger = nltk.HiddenMarkovModelTagger.train(brown_a)
    sent = nltk.corpus.brown.sents()[400]
    hmm_tagger.tag(sent)
    # [(u'He', u'PPS'), (u'is', u'BEZ'), (u'not', u'*'), (u'interested', u'VBN'), (u'in', u'IN'), (u'being', u'NN'), (u'named', u'IN'), (u'a', u'AT'), (u'full-time', u'JJ'), (u'director', u'NN'), (u'.', u'.')]
    
    # Unlike pickle, dill can serialize the nested estimator function by value
    with open('my_tagger.dill', 'wb') as f:
        dill.dump(hmm_tagger, f)
    

Now you can load the tagger:

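    import dill
    import nltk  # needed here only to rebuild the test sentence
    
    # Redefine the test sentence, since this may run in a fresh session
    sent = nltk.corpus.brown.sents()[400]
    
    with open('my_tagger.dill', 'rb') as f:
        hmm_tagger = dill.load(f)
    
    hmm_tagger.tag(sent)
    # [(u'He', u'PPS'), (u'is', u'BEZ'), (u'not', u'*'), (u'interested', u'VBN'), (u'in', u'IN'), (u'being', u'NN'), (u'named', u'IN'), (u'a', u'AT'), (u'full-time', u'JJ'), (u'director', u'NN'), (u'.', u'.')]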
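For option 2, here is a minimal sketch that keeps the standard pickle module. It assumes NLTK 3.x behavior: HiddenMarkovModelTagger.train forwards an estimator keyword argument to _train, and _train only defines its nested estimator when none is passed. The nested function ends up stored inside the tagger's probability distributions, which is why pickle fails. Passing a module-level function avoids the problem, because pickle saves functions by their qualified name. The name lidstone_estimator, the toy corpus and the file name are illustrative. Lidstone smoothing with gamma 0.1 mirrors what NLTK's default estimator appears to use, so check nltk/tag/hmm.py in your installed version.

```python
import pickle

from nltk.probability import LidstoneProbDist
from nltk.tag.hmm import HiddenMarkovModelTagger


# Module-level estimator (hypothetical name). pickle stores functions by
# qualified name, so it can look this one up again when loading.
# Lidstone smoothing with gamma=0.1 is assumed to match NLTK's default.
def lidstone_estimator(fd, bins):
    return LidstoneProbDist(fd, 0.1, bins)


# Tiny hand-made corpus so the sketch runs without downloading Brown.
train_sents = [
    [("the", "AT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "AT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("the", "AT"), ("cat", "NN"), ("barks", "VBZ")],
]

# Passing estimator= stops _train from creating its unpicklable nested one.
hmm_tagger = HiddenMarkovModelTagger.train(train_sents, estimator=lidstone_estimator)

sent = ["the", "dog", "sleeps"]
before = hmm_tagger.tag(sent)

# Plain pickle now works, because every function in the tagger is importable.
with open("my_tagger.pickle", "wb") as f:
    pickle.dump(hmm_tagger, f)

with open("my_tagger.pickle", "rb") as f:
    restored = pickle.load(f)

after = restored.tag(sent)
print(after)
```

To load the pickle in a different script, lidstone_estimator must be importable under the same module path it had when the tagger was saved. In practice, put it in a module of your own (for example a hypothetical my_estimators.py), train with that import, and import the same module before calling pickle.load.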